OpenDataServices / flatten-tool

Tools for generating CSV and other flat versions of the structured data
http://flatten-tool.readthedocs.io/en/latest/
MIT License

Discussion: Custom delimiters for unflatten #454

Open jpmckinney opened 3 months ago

jpmckinney commented 3 months ago

I notice that some CSVs uploaded to the OCDS Data Review Tool use semicolons as the delimiter.

With commas:

ocid,id,date,tag,initiationType,tender/id
ocds-1234567-abc,ocds-1234567-abc-1,2000-01-02T00:00:00Z,tender,tender,abc

With semicolons:

ocid;id;date;tag;initiationType;tender/id
ocds-1234567-abc;ocds-1234567-abc-1;2000-01-02T00:00:00Z;tender;tender;abc

Some possible behaviors:

  1. Leave as is. With the above example, the whole header row is treated as a single column, and the field is read in as "ocid;id;date;tag;initiationType;tender", which shows up under additional fields.
  2. Allow a dialect to be passed in. This defers all responsibility to the calling code.
  3. Add a boolean sniff argument. If enabled, flatten-tool sniffs the dialect. The sample size and/or the candidate delimiters could also be passed in.
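
To make the three behaviors concrete, here is a rough sketch using only Python's standard csv module. The `read_rows` helper and its `dialect`, `sniff` and `delimiters` parameters are hypothetical and only illustrate the options above; they are not flatten-tool's current API.

```python
import csv
import io

SEMICOLON_CSV = (
    "ocid;id;date;tag;initiationType;tender/id\n"
    "ocds-1234567-abc;ocds-1234567-abc-1;2000-01-02T00:00:00Z;tender;tender;abc\n"
)


class Semicolon(csv.excel):
    """The default 'excel' dialect, but with ';' as the delimiter."""
    delimiter = ";"


def read_rows(text, dialect="excel", sniff=False, delimiters=",;\t"):
    """Hypothetical helper (not part of flatten-tool) showing options 1-3."""
    if sniff:
        # Option 3: guess the dialect from a sample, restricted to the
        # candidate delimiters supplied by the caller.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=delimiters)
    # Options 1 and 2: use the default or the caller-supplied dialect.
    return list(csv.reader(io.StringIO(text), dialect=dialect))


# Option 1: leave as is. Each row is read as a single column.
default_rows = read_rows(SEMICOLON_CSV)

# Option 2: the calling code passes in a dialect.
explicit_rows = read_rows(SEMICOLON_CSV, dialect=Semicolon)

# Option 3: sniff the dialect from the data.
sniffed_rows = read_rows(SEMICOLON_CSV, sniff=True)
```

Note that `csv.Sniffer().sniff` raises `csv.Error` when it cannot determine a delimiter, so with option 3 the calling code (or flatten-tool) would still need to decide on a fallback.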

For CoVEs, flatten-tool's unflatten is called within lib-cove's convert_spreadsheet, which is called by a CoVE's view. The flattentool_options are derived from the arguments to convert_spreadsheet, except for the paths, encoding (utf-8-sig, cp1252, latin_1), metatab_vertical_orientation (True) and convert_titles (True). So any new argument added to unflatten will also need to be added to convert_spreadsheet.
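
As a rough illustration of that pass-through, here is a sketch with stand-in functions. `unflatten_stub` and `convert_spreadsheet_stub` are hypothetical and do not reproduce the real signatures in flatten-tool or lib-cove, and `dialect` is an assumed name for the new argument.

```python
def unflatten_stub(input_name, encoding="utf8", dialect=None, **kwargs):
    """Stand-in for flatten-tool's unflatten: just reports what it received."""
    return {"input_name": input_name, "encoding": encoding,
            "dialect": dialect, **kwargs}


def convert_spreadsheet_stub(path, encoding="utf-8-sig", dialect=None):
    """Stand-in for lib-cove's convert_spreadsheet.

    The new (hypothetical) `dialect` argument has to be accepted here and
    copied into the options, otherwise the calling view cannot set it.
    """
    flattentool_options = {
        "encoding": encoding,
        "metatab_vertical_orientation": True,
        "convert_titles": True,
        "dialect": dialect,  # the new argument, threaded through
    }
    return unflatten_stub(path, **flattentool_options)


# A view would then only need to pass the extra argument.
received = convert_spreadsheet_stub("releases.csv", dialect="excel-tab")
```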

I think (2) is best, as it gives the most flexibility to the calling code.