ASAP-CRN / pmdbs-sc-rnaseq-wf

Repo for testing and developing a common postmortem-derived brain sequencing (PMDBS) workflow harmonized across ASAP
Apache License 2.0

metadata "NA" encoding #78

Open ergonyc opened 2 weeks ago

ergonyc commented 2 weeks ago

The ASAP CRN convention is to encode NULL values in metadata as "NA". We have wrappers in crn-utils (read_meta_table) to handle this, but the analysis scripts do NOT.

When you call pd.read_csv(), pandas automatically interprets certain strings as missing values (NaN). These include: '' (empty string), 'NA', 'N/A', '#N/A', '#N/A N/A', 'NaN', 'nan', 'null', 'NULL', '-1.#IND', '-1.#QNAN', '1.#IND', '1.#QNAN', '<NA>', and 'None'.

To control this behavior:

na_values: specify additional strings to be treated as NaN.
keep_default_na: set keep_default_na=False to disable the default NaN recognition.
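For reference, here is a minimal sketch of how these two parameters change what pd.read_csv() does with "NA". The table contents and column names (sample_id, brain_region, age_at_death) are made up for illustration and are not taken from the actual ASAP CRN metadata. The last step, writing with na_rep="NA", is only an assumption about how a script might keep the convention on output; it is not necessarily what read_meta_table in crn-utils does.

```python
import io

import pandas as pd

# Hypothetical metadata table; column names and values are illustrative only.
csv_text = "sample_id,brain_region,age_at_death\nS1,NA,71\nS2,PFC,NA\n"

# Default behaviour: the string "NA" is parsed as a missing value (NaN).
default_df = pd.read_csv(io.StringIO(csv_text))

# Keep "NA" as a literal string: disable the default NaN markers
# and read every column as text.
literal_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, dtype=str)

# Treat only "NA" as missing and ignore the other default markers.
only_na_df = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=["NA"]
)

# Assumption: write missing values back out as "NA" to preserve the convention.
round_trip_csv = only_na_df.to_csv(index=False, na_rep="NA")

print(default_df["brain_region"].isna().tolist())
print(literal_df["brain_region"].tolist())
print(round_trip_csv)
```

Note that a numeric column containing a missing value (age_at_death here) is read as float, so values may be written back as 71.0 rather than 71 unless the column is read with dtype=str or a nullable integer dtype.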

See also: https://github.com/ASAP-CRN/pmdbs-bulk-rnaseq-wf/issues/9#issue-2657377065