Open ablack3 opened 1 week ago
I need to investigate and fix the failing tests.
Thanks for looking into this. It's another reason the duckdb-based data examples are a nice direction to go in.
For the column order, I would suggest that the data files match the order of the columns defined by the CDM specification. Would we rather update the data files to follow that column order as a fix?
That would be my preference as well. So we would require CSV files to have their columns in the same order as the CommonDataModel specification.
We have Eunomia CDM datasets stored in CSV files. Currently the datatype of each column is not explicitly specified when reading the data from CSV, which is causing #65.
In this PR I'm using the specification in the CommonDataModel package to be explicit about the datatypes when we read the CSV files, which should fix the issue. However, this does mean that the column order matters.
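To illustrate why column order matters once types are applied by position, here is a minimal Python sketch. It is not the code in this PR: the `SPEC` list, the column names, and the `read_positional` helper are all hypothetical and only stand in for the CommonDataModel specification.

```python
import csv
import io

# Hypothetical spec: (column name, converter) pairs in specification order.
SPEC = [("person_id", int), ("gender_concept_id", int), ("birth_datetime", str)]

def read_positional(text):
    """Apply the spec's converters to columns by position, ignoring header names."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [[conv(value) for (_, conv), value in zip(SPEC, row)] for row in reader]

# Columns in spec order parse cleanly.
ok = "person_id,gender_concept_id,birth_datetime\n1,8507,1980-01-01\n"
rows = read_positional(ok)

# Same data with two columns swapped: the int converter receives a date string.
swapped = "person_id,birth_datetime,gender_concept_id\n1,1980-01-01,8507\n"
try:
    read_positional(swapped)
    failed = False
except ValueError:
    failed = True
```

With positional typing, a file whose columns are out of order either fails to parse or, worse, parses silently with the wrong types if the converters happen to accept the values.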
I'm not sure if we consider column order (first, second, etc.) part of the CDM specification, but I noticed that in the GiBleed dataset the column order does not match the order in the CommonDataModel specification CSV. We can work around it and/or fix the file. It's a bit trickier if we want to allow columns to be in any order, but it is possible.
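One possible way to allow columns in any order is to look up each column's type by header name instead of by position. The Python sketch below shows the idea only. The `SPEC` mapping, the column names, and the `read_by_name` helper are hypothetical and are not taken from this PR or from the CommonDataModel package.

```python
import csv
import io

# Hypothetical spec: column name -> converter, listed in specification order.
SPEC = {"person_id": int, "gender_concept_id": int, "birth_datetime": str}

def read_by_name(text):
    """Type columns by header name and return rows in specification order."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(SPEC) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Building each row from SPEC also reorders the columns to match the spec.
    return [{name: conv(row[name]) for name, conv in SPEC.items()} for row in reader]

# The file's columns are not in spec order, but the lookup is by name.
swapped = "person_id,birth_datetime,gender_concept_id\n1,1980-01-01,8507\n"
rows = read_by_name(swapped)
```

The trade-off is that the header row becomes load-bearing: a misspelled or missing column name has to be reported explicitly, which the sketch does by raising an error.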