rien333 opened 3 years ago
Another inconsistency is that the columns of some files are not wrapped in quotes, while most are.
Take, for instance, dunbar2004gossip.csv:
Var1,Relation,Var2,Cor,Topic,Stage,Type,Confirmed,Notes,bibref
Again, this makes it somewhat more difficult and annoying to process the raw data.
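If the files are read with Python's standard `csv` module rather than split on commas by hand, the quoting difference should not matter, because the default dialect treats the double quotes as optional. A minimal check, using made-up rows (the values `a`, `b`, `c` are placeholders, not CHIELD data):

```python
import csv
import io

# The same rows written two ways: every field quoted (as in most files)
# and unquoted (as in dunbar2004gossip.csv). Values are placeholders.
quoted = '"Var1","Relation","Var2"\n"a","b","c"\n'
unquoted = 'Var1,Relation,Var2\na,b,c\n'

def parse(text):
    """Parse CSV text with the standard library's default (excel) dialect."""
    return list(csv.reader(io.StringIO(text)))

# The reader strips the optional quotes, so both inputs yield identical rows.
print(parse(quoted) == parse(unquoted))  # True
```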
Thanks for this note, you're correct. I'm using R's default csv reader, which is more tolerant of this inconsistency. But I can run a script to normalise everything.
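A normalising script could be as small as the sketch below, which rewrites every `.csv` file in a directory with all fields quoted. The function names are hypothetical, the choice of `QUOTE_ALL` (rather than stripping quotes everywhere) is an assumption, and `lineterminator="\n"` is chosen on the assumption that the repository uses Unix line endings. The demo runs on a throwaway file, not the real data:

```python
import csv
import tempfile
from pathlib import Path

def normalise_quoting(path):
    """Rewrite one CSV file in place so that every field is double-quoted."""
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)

def normalise_directory(directory):
    """Apply normalise_quoting to every .csv file in a directory (non-recursive)."""
    for path in sorted(Path(directory).glob("*.csv")):
        normalise_quoting(path)

# Demo on a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "example.csv"
    demo.write_text("Var1,Relation,Var2\na,b,c\n", encoding="utf-8")
    normalise_directory(tmp)
    result = demo.read_text(encoding="utf-8")

print(result)
```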
By the way, there are combined versions of the data as single csv files available here: https://github.com/CHIELDOnline/CHIELD/tree/master/data/db They are updated after every rebuild.
Thanks for all the work so far!
I've been playing around with the csv files in Python, and some of my scripts produced weird results. It turns out my scripts tripped over the fact that some csv files have extra (generally empty) columns that are not present in all csv files.
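One way to find such files is to compare headers: take the intersection of all header rows and report, per file, any column outside it. The sketch below assumes UTF-8 files; `read_header` and `column_report` are hypothetical helpers, and the column name `ExtraColumn` is invented for the demo because the actual extra columns in the examples below are not reproduced here:

```python
import csv
import tempfile
from pathlib import Path

def read_header(path):
    """Return the header row (first line) of a CSV file, or [] if it is empty."""
    with path.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])

def column_report(paths):
    """Return (columns shared by every file, {file name: its extra columns})."""
    headers = {p.name: set(read_header(p)) for p in paths}
    if not headers:
        return set(), {}
    shared = set.intersection(*headers.values())
    extras = {name: sorted(cols - shared)
              for name, cols in headers.items() if cols - shared}
    return shared, extras

# Demo with two temporary files; "ExtraColumn" is a made-up name.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "a.csv").write_text("Var1,Relation,Var2\n", encoding="utf-8")
    (d / "b.csv").write_text("Var1,Relation,Var2,ExtraColumn\n", encoding="utf-8")
    shared, extras = column_report(sorted(d.glob("*.csv")))

print(sorted(shared))  # ['Relation', 'Var1', 'Var2']
print(extras)          # {'b.csv': ['ExtraColumn']}
```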
Take for example evolang11_41.csv:
Or atkinson2015speaker.csv:
Given that these columns are neither used nor discussed in the CHIELD paper, it seems best to delete them.
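A cautious version of that cleanup keeps only a known set of columns and refuses to drop a column that actually holds data. The sketch assumes the header shown above for dunbar2004gossip.csv is the canonical column set; `drop_extra_columns` and the demo column `Unused` are hypothetical:

```python
import csv
import io

# Assumption: the dunbar2004gossip.csv header is the canonical column set.
CANONICAL = ["Var1", "Relation", "Var2", "Cor", "Topic", "Stage",
             "Type", "Confirmed", "Notes", "bibref"]

def drop_extra_columns(text):
    """Return CSV text without non-canonical columns.

    Raises ValueError if a column to be dropped contains any non-empty
    value, so no data is discarded silently.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    extra = [c for c in fieldnames if c not in CANONICAL]
    for row in rows:
        for col in extra:
            if (row.get(col) or "").strip():
                raise ValueError(f"column {col!r} is not empty; refusing to drop it")
    kept = [c for c in fieldnames if c in CANONICAL]  # preserve the file's order
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(drop_extra_columns("Var1,Relation,Var2,Unused\na,b,c,\n"))
```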
My problem stems from the fact that I want to process the csv files as Python dictionaries, with keys corresponding to particular columns, like so:
Specifying the `fieldnames` parameter of `DictReader` in this way fails, however, since it assumes that the columns are always the same across files.

EDIT: my way around this is to not make any assumptions about the columns present in a file.