frictionlessdata / frictionless-r

R package to read and write Frictionless Data Packages
https://docs.ropensci.org/frictionless/

Support `field.missingValues` #174

Open. peterdesmet opened this issue 4 months ago.

peterdesmet commented 4 months ago

CHANGELOG: https://datapackage.org/overview/changelog/#fieldmissingvalues-new

Values defined in `field.missingValues` override anything defined in `schema.missingValues` (https://github.com/frictionlessdata/datapackage/pull/24). This is not straightforward to support in frictionless-r, because readr only accepts a single set of missing values (its `na` argument) for all columns. But @khusmann has helpfully provided an implementation using wrappers for readr, see this comment and this implementation.
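To make the override rule concrete, here is a minimal Python sketch (not frictionless-r, readr or vroom code). It assumes that a field without `missingValues` inherits `schema.missingValues`, which itself defaults to `[""]`, and that a field-level list replaces the schema-level list entirely rather than merging with it. The names `effective_missing_values` and `read_with_missing` are hypothetical, only plain string missing values are handled (the spec also allows richer forms, which this sketch ignores), and only the `integer` and `string` types are converted.

```python
import csv
import io

# Assumed default for schema.missingValues when the schema does not set it.
DEFAULT_MISSING = [""]


def effective_missing_values(schema, field):
    """Return the missing values that apply to one field (hypothetical helper).

    A field-level list replaces the schema-level list; it is not merged with it.
    """
    if "missingValues" in field:
        return set(field["missingValues"])
    return set(schema.get("missingValues", DEFAULT_MISSING))


def read_with_missing(text, schema):
    """Parse CSV text, applying per-field missing values (hypothetical helper).

    All rows are first read as strings and only then converted, so the whole
    file is held in memory, i.e. a greedy rather than lazy approach.
    """
    fields = {f["name"]: f for f in schema["fields"]}
    rows = list(csv.DictReader(io.StringIO(text)))
    records = []
    for row in rows:
        record = {}
        for name, raw in row.items():
            field = fields[name]
            if raw in effective_missing_values(schema, field):
                record[name] = None
            elif field.get("type") == "integer":
                record[name] = int(raw)
            else:
                # Only integer and string are handled in this sketch.
                record[name] = raw
        records.append(record)
    return records


schema = {
    "missingValues": ["", "NA"],
    "fields": [
        {"name": "id", "type": "integer"},
        # Field-level list: only "-" is missing here; "" and "NA" are ordinary values.
        {"name": "comment", "type": "string", "missingValues": ["-"]},
    ],
}
result = read_with_missing("id,comment\n1,NA\nNA,-\n2,\n", schema)
# → [{'id': 1, 'comment': 'NA'}, {'id': None, 'comment': None}, {'id': 2, 'comment': ''}]
```

Note how `"NA"` and `""` remain ordinary strings in the `comment` column: because the field defines its own list, the schema-level values no longer apply to it, which is exactly what a single all-columns `na` setting cannot express.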

Overall, I think this feature is best implemented together with categorical types.

khusmann commented 3 months ago

A couple of notes on this:

I now have a more mature readr wrapper hosted here: https://kylehusmann.com/interlacer/

My implementation uses `type_convert()`, whose behavior is starting to diverge from that of vroom, which is now readr's default parsing engine (https://github.com/tidyverse/readr/issues/1526). So that's unfortunate...

The other downside is that it greedily loads all the data instead of taking advantage of vroom's lazy-loading capabilities. There might be ways around this, but at the end of the day I think field-level missingness is something we want to see implemented in vroom proper. I've opened a feature request in vroom to this effect: https://github.com/tidyverse/vroom/issues/532

It pains me to wait on vroom for this feature because I doubt it will be high on their priority list, but I think that may well be our limiting factor for implementing this in the most stable and predictable way... :'(