Robsteranium / csvwr

Read and write CSV on the Web (csvw) tables and metadata in R
https://robsteranium.github.io/csvwr

Support for csvw spec 'separator' #8

Open SimonGreenhill opened 1 year ago

SimonGreenhill commented 1 year ago

csvw column definitions can specify a 'separator', e.g.

                    {
                        "datatype": "string",
                        "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
                        "required": false,
                        "separator": ";",
                        "name": "Source"
                    }

...which means that the field should be parsed from "a;b" to something like c("a", "b"). It would be nice to support this.

Robsteranium commented 1 year ago

I agree this would be nice to have.

The complication is that we would need to handle all types, not just strings. We'd probably want a list column whose values are all vectors of the relevant type.

It'd be nice to handle this within the call to readr::read_csv, which is responsible for parsing. That parsing is done in C++ and I'm not sure how easy it'd be to extend.

An alternative would be to read in all separated cells as strings then post-process them in R. This would be a lot slower of course.
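
A minimal sketch of that post-processing approach, assuming base R only. The helper name `split_separated` and its `cast` argument are hypothetical and not part of csvwr; the idea is to read the cell as a string, split it on the column's separator, then cast each piece to the column's datatype:

```r
# Hypothetical helper (not part of csvwr): split each cell of a character
# vector on a literal separator and cast the pieces to the target type.
# Returns a list with one vector per cell, suitable for a list column.
split_separated <- function(x, separator, cast = as.character) {
  # fixed = TRUE treats the separator literally rather than as a regex
  pieces <- strsplit(x, separator, fixed = TRUE)
  lapply(pieces, cast)
}

# String-valued column, as in the "Source" example above
sources <- split_separated(c("a;b", "c"), ";")
sources[[1]]  # c("a", "b")

# The same helper with a non-string datatype
counts <- split_separated(c("1;2", "3"), ";", cast = as.integer)
counts[[1]]   # c(1L, 2L)
```

Passing the cast as a function keeps the splitting step independent of the datatype, which is one way to cover "all types, not just strings" without changing the parser itself.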

This isn't something I expect to have time to work on but would gladly review a PR.

xrotwang commented 1 year ago

In our csvw Python package I found that the various requirements of dialect specs (treating lines as comments, etc.) already precluded using Python's csv standard library out of the box. So for me, the (slow) "post-process separated strings in Python" solution seemed unavoidable.