AMP-SCZ / utility

Storehouse for all utility scripts
Apache License 2.0

Store 0 instead of ' ' whenever possible #57

Open tashrifbillah opened 1 year ago

tashrifbillah commented 1 year ago

Hi @speroNCIRE, when subject-level EEG QC files are combined, each column in the combined file inherits its data type from the subject-level columns, with float taking precedence over int. NaNs, however, are interpreted as float. This means that a single NaN (an empty cell) in a column of integers converts the whole column to float, so DPdash can display integers with a trailing .0. See the effect of PA00705 on the entire nBridgedChan column:

nBridgedChan column

```python
0     12.0
1      2.0
2     13.0
3      6.0
4     10.0
5      2.0
6      0.0
7      0.0
8      0.0
9      0.0
10     0.0
11     0.0
12     0.0
13     0.0
14     0.0
15     0.0
16    17.0
17     0.0
18    32.0
19    12.0
20     9.0
21    13.0
22     9.0
23     8.0
24     8.0
25    18.0
26     0.0
27     NaN
28    10.0
29     0.0
30     6.0
31     2.0
32     2.0
33     0.0
34     0.0
35     2.0
36     0.0
37     7.0
38     2.0
39     0.0
40     0.0
41     NaN
42     0.0
Name: nBridgedChan, dtype: float64
```

PA00705 file

```python
reftime               NaN
day                    28
timeofday             NaN
weekday               NaN
dTrialsMMN           -640
dTrialsVOD           -160
dTrialsAOD           -200
dTrialsASSR          -200
dTrialsRestEO        -180
dTrialsRestEC        -180
dZRangeLo               0
dZRangeHi               0
nHighZChan              0
nHighNoiseChan          0
nBridgedChan          NaN
dHitRateVOD             0
dHitRateAOD           0.0
FARateNovVOD            0
FARateNovAOD           18
FARateStdVOD      1.17188
FARateStdAOD          0.0
RTmedVOD            425.0
RTmedAOD              368
subject           PA00705
Name: 27, dtype: object
```

To prevent the above, can you write 0 instead of an empty value ( ) whenever possible in the subject-level files?
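The dtype change described in this comment can be reproduced with a minimal sketch. The two in-memory "files" and their values are hypothetical stand-ins for subject-level QC files, and the use of `pandas.concat` is an assumption about how the combining step works. The last part shows pandas' nullable `Int64` dtype as one possible way to keep integers intact at the combining step; whether DPdash handles the resulting empty cells well is not established here.

```python
import io
import pandas as pd

# Hypothetical subject-level QC files; the second has one missing nBridgedChan.
subject_a = pd.read_csv(io.StringIO("day,nBridgedChan\n1,12\n"))
subject_b = pd.read_csv(io.StringIO("day,nBridgedChan\n27,10\n28,\n"))

# subject_a's column is int64, subject_b's is float64 because of the NaN.
combined = pd.concat([subject_a, subject_b], ignore_index=True)
print(combined["nBridgedChan"].dtype)  # the float column wins

# Possible workaround at the combining step: a nullable integer dtype,
# which stores missing values as <NA> without upcasting to float.
nullable = combined["nBridgedChan"].astype("Int64")
csv_text = combined.assign(nBridgedChan=nullable).to_csv(index=False)
print(csv_text)
```

With this approach the subject-level files would not need to change, since the missing cell is written back as empty while the remaining values stay integers.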

tashrifbillah commented 1 year ago

The same problem is created by:

PHOENIX/PROTECTED/PronetLA/processed/LA07315/eeg/LA-LA07315-EEGquick-day1to8.csv
reftime,day,timeofday,weekday,Status,Rating
,8,,,1,NaN

For non-existent ratings, can we use -9, matching the Unchecked: -9 score we already have in the web app? https://github.com/AMP-SCZ/eeg-qc-dash/blob/6ab403a44e0fa77d6449a7cf9a96b62cd9795da6/app.py#L50-L56
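As a rough illustration of the -9 proposal, the sketch below reads the row quoted above and replaces the missing Rating with a sentinel. The constant name `UNCHECKED` is hypothetical, and doing the replacement in pandas is an assumption; the actual files are produced by other programs.

```python
import io
import pandas as pd

# The EEGquick row quoted above, with Rating written as NaN.
raw = "reftime,day,timeofday,weekday,Status,Rating\n,8,,,1,NaN\n"
df = pd.read_csv(io.StringIO(raw))

UNCHECKED = -9  # hypothetical name; value taken from the proposal above

# Replace the missing rating with the sentinel and restore an integer dtype.
df["Rating"] = df["Rating"].fillna(UNCHECKED).astype(int)

print(df[["day", "Status", "Rating"]].to_string(index=False))
```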

speroNCIRE commented 1 year ago

Hi Tashrif,

I’m looping back to this. I understand the problem. My supporting programs generally use numeric NaNs for missing or unavailable data rather than assigning some arbitrary code number. As a result, “NaN” (not empty) was actually getting written to my dpdash CSV files, e.g. in the PA00705 nBridgedChan instance you discovered. In this case, I can change these to -1, since a negative number of bridged channels is impossible. Zero is a perfect score, so I don’t want to use zeros when data is unavailable. For the dTrials* integer variables I’ll need some other solution, as these can be legitimately negative. I could set them to an impossibly large negative value like -9999, but that seems awkward.
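The column-by-column reasoning in this comment could be sketched as a lookup of "unavailable" codes, applied only where the code cannot be a valid value. The names `MISSING_CODES` and `encode_missing` are hypothetical, the sample row is illustrative, and leaving uncoded columns blank is an assumption rather than the behavior of the actual programs.

```python
import csv
import io
import math

# Hypothetical per-column codes for "data unavailable".
# Only columns whose valid range excludes the code are listed;
# dTrials* columns can be legitimately negative, so they get no code here.
MISSING_CODES = {"nBridgedChan": -1}


def encode_missing(row):
    """Return a copy of row with NaN replaced by the column's code, if any."""
    out = {}
    for name, value in row.items():
        is_nan = isinstance(value, float) and math.isnan(value)
        # Columns without a code are left blank (an assumption).
        out[name] = MISSING_CODES.get(name, "") if is_nan else value
    return out


row = {"day": 28, "dTrialsMMN": -640, "nBridgedChan": float("nan")}
encoded = encode_missing(row)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(encoded)
print(buffer.getvalue())
```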

tashrifbillah commented 1 year ago
> I can change these to -1 since a negative number of bridged channels is impossible.

> Zero is a perfect score so I don’t want to use zeros when data is unavailable.

> I could set them to an impossibly large negative value like -9999 but that seems awkward.

Hi Spero, I understand the issue with my proposed 0. Since no single integer can stand in for the empty value ( ) across all columns, we are probably better off retaining it. This is not urgent. We can tolerate seeing an extra .0 after each integer that gets converted to float this way.

speroNCIRE commented 1 year ago

I already fixed that, and am going through the other integer-valued variables to eliminate NaNs. There aren’t very many.