AMP-SCZ / utility

Storehouse for all utility scripts
Apache License 2.0

Pandas write out string `NA` as NaN #79

Open tashrifbillah opened 1 year ago

tashrifbillah commented 1 year ago

By default, pandas interprets strings like NA as NaN when reading a CSV. An example from the RPMS data:

In [20]: df1.loc['ME04106']['visit chrfigs_mother_d04'.split()]
Out[20]: 
visit                   1
chrfigs_mother_d04    NaN
Name: ME04106, dtype: object

So the files should be read as:

df1 = pd.read_csv('PrescientStudy_Prescient_family_interview_for_genetic_studies_figs_01.06.2023.csv', dtype=str, keep_default_na=False)

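With those two arguments, NA is retained as a literal string: keep_default_na=False stops pandas from treating NA as a missing value, and dtype=str keeps every column as text. We should modify rpms_to_redcap.py accordingly.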
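
A minimal sketch of the difference, for anyone who wants to reproduce it without the real RPMS export. The in-memory CSV, the subject IDs, and the `subjectkey` index column are made up for illustration; only `chrfigs_mother_d04`, `visit`, `dtype=str`, and `keep_default_na=False` come from the session above. This is not the code in rpms_to_redcap.py, just a way to check the behavior.

```python
import io

import pandas as pd

# Hypothetical stand-in for an RPMS export; IDs and the index column name are invented.
CSV_TEXT = (
    "subjectkey,visit,chrfigs_mother_d04\n"
    "ME00001,1,NA\n"
    "ME00002,1,3\n"
)


def read_default(text):
    """Read with pandas defaults, so the string NA is parsed as a missing value."""
    return pd.read_csv(io.StringIO(text), index_col="subjectkey")


def read_literal(text):
    """Read every column as text and disable the default NA markers."""
    return pd.read_csv(
        io.StringIO(text),
        index_col="subjectkey",
        dtype=str,
        keep_default_na=False,
    )


default_df = read_default(CSV_TEXT)
literal_df = read_literal(CSV_TEXT)

# With defaults the cell is missing; with the two arguments it is the string "NA".
print(pd.isna(default_df.loc["ME00001", "chrfigs_mother_d04"]))
print(repr(literal_df.loc["ME00001", "chrfigs_mother_d04"]))
```

One side effect to keep in mind: with `keep_default_na=False` and no `na_values` given, empty cells are read as empty strings rather than NaN, so any downstream check that relies on `isna()` would need to be revisited.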

Credited to @owenborders @nickckim

tashrifbillah commented 1 year ago

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html