Open peterdesmet opened 4 months ago
A couple notes on this --
I now have a more mature readr wrapper hosted here: https://kylehusmann.com/interlacer/
My implementation uses `type_convert()`, which is beginning to stray in its behavior from `vroom` (now the default importer for readr): https://github.com/tidyverse/readr/issues/1526 . So that's unfortunate...
The other downside is that it greedily loads all the data, instead of taking advantage of vroom's lazy load capabilities. There might be ways around this, but at the end of the day I think field-level missingness is something we want to see implemented in vroom proper. I've put a feature request in vroom to this effect: https://github.com/tidyverse/vroom/issues/532
It pains me to wait on vroom for this feature because I doubt it'll be high on their priority list, but I think that might actually be our limiting factor for implementing this in the most stable / predictable way... :'(
CHANGELOG: https://datapackage.org/overview/changelog/#fieldmissingvalues-new
Values defined in `field.missingValues` overwrite anything that is defined in `schema.missingValues` (https://github.com/frictionlessdata/datapackage/pull/24). This is not straightforward to support in frictionless-r, since readr doesn't support it. But @khusmann has helpfully provided an implementation using wrappers for readr, see this comment and this implementation.

Overall, I think this feature is best implemented together with categorical types.
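
For anyone skimming, here is a rough sketch of the override rule itself, written in plain Python rather than R so it stays independent of readr/vroom. It is not how frictionless-r or interlacer do it. The schema, the column names and the helpers `missing_values_for()` and `read_with_field_missingness()` are all made up for illustration, and the sketch assumes a field-level list fully replaces the schema-level one for that field.

```python
import csv
import io

# Hypothetical Table Schema: a schema-level missingValues list plus optional
# field-level lists. Names and values are invented for this example.
SCHEMA = {
    "missingValues": ["", "NA"],
    "fields": [
        {"name": "id", "type": "integer"},
        # "NA" is a real value here (Namibia), so only "" counts as missing.
        {"name": "country", "type": "string", "missingValues": [""]},
        {"name": "temp", "type": "number", "missingValues": ["-999"]},
    ],
}

CASTS = {"integer": int, "number": float, "string": str}


def missing_values_for(field, schema):
    """Return the field's own list if it has one, else the schema-level list."""
    if "missingValues" in field:
        return set(field["missingValues"])
    # Assumption: fall back to [""] when the schema defines nothing.
    return set(schema.get("missingValues", [""]))


def read_with_field_missingness(text, schema):
    """Read everything as strings, then blank out and cast column by column."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for field in schema["fields"]:
            value = raw[field["name"]]
            if value in missing_values_for(field, schema):
                row[field["name"]] = None
            else:
                row[field["name"]] = CASTS[field["type"]](value)
        rows.append(row)
    return rows


CSV_TEXT = "id,country,temp\n1,NA,20.5\n2,,-999\nNA,BE,3.0\n"
rows = read_with_field_missingness(CSV_TEXT, SCHEMA)
for row in rows:
    print(row)
```

Note that this reads every value as a string before converting, so it is just as greedy as the `type_convert()` approach discussed above. It only illustrates the semantics, not a lazy implementation.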