minor tweaks needed to recover_zooniverse_metadata.py & filename_links.csv

alecristia commented 3 years ago

I think what you did works.

The only odd thing is that the onset/offset values (as well as age) seem to be transformed, perhaps because they are read as floats. I get this warning (no error): sys:1: DtypeWarning: Columns (7,8) have mixed types.Specify dtype option on import or set low_memory=False.
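For reference, this kind of warning can usually be silenced by declaring the column types when reading the file. The snippet below is only a minimal sketch: the in-memory CSV and the column names (`chunk`, `onset`, `offset`) are made up for illustration and are not taken from the actual script or data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; the column names are assumptions.
raw = io.StringIO(
    "chunk,onset,offset\n"
    "a.wav,1000,1500\n"
    "b.wav,2000.5,NA\n"
)

# Declaring dtypes up front means pandas does not have to guess the type
# of each column chunk by chunk, which is what produces DtypeWarning on
# large files. Passing low_memory=False instead is the other option the
# warning message suggests.
df = pd.read_csv(raw, dtype={"chunk": str, "onset": float, "offset": float})

print(df.dtypes)
```

With explicit dtypes, onset and offset are always read as floats, so integer-looking values such as 1000 come back as 1000.0, which may be part of the "transformed" values mentioned above.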

Also, filename_links.csv only has 10 children. We are missing the other 10 children, with ids like 1111_1 (see demo_data.tsv). I wonder where you got the child-recording correspondence from? Perhaps from result_final_lisa.csv (currently in files_from_elsewhere)? That would be the best place to get it from, since it has all the children and it's a file that cannot be generated from others. In any case, if that's where you took it from, I can generate a new version of filename_links.csv with ALL children. Even better, I could integrate the recording info into demo_data.tsv, or the age and diagnosis info into filename_links.csv, so that all of that overlapping info is together.
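A quick way to list which children are missing from one table relative to the other is a left merge with an indicator column. This is a rough sketch on toy data: the column names (`child_id`, `age`, `recording`) and every value except the id 1111_1 are hypothetical, and the real files may be laid out differently.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; real column names may differ.
demo = pd.DataFrame(
    {"child_id": ["1111_1", "2222_1", "3333_1"], "age": [10, 12, 14]}
)
links = pd.DataFrame(
    {"child_id": ["2222_1", "3333_1"], "recording": ["r2.wav", "r3.wav"]}
)

# A left merge keeps every child from the demographic table; the indicator
# column marks rows that found no match in the links table as "left_only".
merged = demo.merge(links, on="child_id", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "child_id"].tolist()

print(missing)
```

The same merge, without the filter, would also give a single table holding age and recording info together.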

lucasgautheron commented 3 years ago

I'll look into the warning!
The onset/offset values are changed to make them consistent with the format output by child-project zooniverse extract-chunks, which defines them as the actual onset and offset of a given chunk (not those of the vocalization event the chunk was extracted from).
filename_links.csv is indeed the file I used, thanks for pointing that out. Don't bother: if it is not exhaustive, I'll update it myself.

lucasgautheron commented 3 years ago

8437612f3527a6ffcc21231f518cdccb4b428a31

LAAC-LSCP / zoo-babble-validation

minor tweaks needed to recover_zooniverse_metadata.py & filename_links.csv #3