Open jdangerx opened 5 months ago
I'm new to this issue but why not have our Resources be subclasses of the frictionless framework classes? I think we'll have 3 types of extensions we'd like to make:
I think it's probably better to have our Resource contain references to frictionless classes - roughly corresponding to the idea of preferring "composition over inheritance."
The benefit, to me, is that this reduces the coupling between our `PudlResource` interface and the implementation of the frictionless classes it depends on. You'll only need to know the public frictionless APIs instead of the implementation details. But we would still be able to use the frictionless code to reduce duplication as necessary.
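
To make the composition idea concrete, here is a minimal sketch. It assumes the only thing we need from the wrapped object is a public `to_descriptor()` method; `StandInResource` and the `etl_group` field are hypothetical placeholders for illustration, not real frictionless or PUDL definitions.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class DescribesItself(Protocol):
    """The only part of the wrapped object's public API that we rely on."""

    def to_descriptor(self) -> dict[str, Any]: ...


@dataclass
class StandInResource:
    """Hypothetical stand-in for a frictionless resource class."""

    name: str
    path: str

    def to_descriptor(self) -> dict[str, Any]:
        return {"name": self.name, "path": self.path}


@dataclass
class PudlResource:
    """Holds a reference to the wrapped resource instead of subclassing it."""

    wrapped: DescribesItself
    etl_group: str  # hypothetical example of custom, PUDL-only metadata

    def descriptor(self) -> dict[str, Any]:
        # Delegate through the public method only; no reliance on internals.
        return self.wrapped.to_descriptor()


resource = PudlResource(
    wrapped=StandInResource(name="plants", path="plants.csv"),
    etl_group="example_group",
)
print(resource.descriptor())
```

Because `PudlResource` only touches `to_descriptor()`, a change to the wrapped class's internals would not ripple into our interface, which is the decoupling argued for above.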
Once #1420 is closed, we will be using one version of `frictionless` across the board.

Unfortunately, that version will be 4.40.8, which is missing critical SQL-describing functionality.
We should upgrade to version 5! Migration guide is here.
Fortunately, our existing usage of the `frictionless` libraries is very limited - just checking if it can parse a datapackage.json and then checking to see if the metadata is valid:

- ferc-xbrl-extractor/src/ferc-xbrl-extractor/xbrl.py
- pudl/src/pudl/workspace/datastore.py

That is pretty easy to change to the frictionless 5 `Package.validate_descriptor()` method.

Unfortunately, this is partly an artifact of us making mirror classes that look sort of like the frictionless classes, in `pudl-archiver`, `ferc-xbrl-extractor`, and `pudl`. These classes sort of replicate the frictionless classes, but also include lots of custom logic for our own purposes.

My proposal is:

- Rename our classes to `PudlResource`, `XbrlResource`, and `ArchiverResource`, respectively (plus the Package/Schema/etc. equivalents)
- Add `to_frictionless` methods that allow us to convert these to their `frictionless` counterparts
- Use `frictionless` classes to serialize `datapackage.json`s that are valid, in `ferc-xbrl-extractor`/`pudl-archiver`
- Use `frictionless` classes to validate the datapackages in `pudl/workspace/datastore.py`

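
As a rough sketch of the proposal above: the `XbrlResource` here is a heavily simplified, hypothetical stand-in for the real class, with made-up fields. It assumes frictionless 5 provides `Resource.from_descriptor()` and that `Package.validate_descriptor()` returns a report with a `valid` attribute. The frictionless imports are deferred so that the plain-descriptor part works without the library installed.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class XbrlResource:
    """Simplified, hypothetical stand-in for the ferc-xbrl-extractor class."""

    name: str
    path: str
    title: str = ""

    def to_descriptor(self) -> dict[str, Any]:
        """Build a plain resource descriptor using only the standard library."""
        descriptor: dict[str, Any] = {"name": self.name, "path": self.path}
        if self.title:
            descriptor["title"] = self.title
        return descriptor

    def to_frictionless(self):
        """One-way conversion to the frictionless counterpart."""
        # Deferred import: only needed when a frictionless object is requested.
        from frictionless import Resource

        return Resource.from_descriptor(self.to_descriptor())


def package_is_valid(descriptor: dict[str, Any]) -> bool:
    """Check a datapackage descriptor with frictionless 5 (assumed API)."""
    from frictionless import Package

    report = Package.validate_descriptor(descriptor)
    return report.valid


resource = XbrlResource(name="example_table", path="example.csv")
package_descriptor = {"name": "example", "resources": [resource.to_descriptor()]}
print(package_descriptor)
```

Keeping `to_descriptor()` free of frictionless means our classes stay usable on their own, and only the serialization and validation steps depend on the library.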
My guess for this work is ~20h.
Note - since our classes have required functionality that `frictionless` classes don't have, we should not implement `from_frictionless` methods. For example, every `XbrlResource` has to respond to `get_fact_tables()`, but `frictionless` has no such notion. So then `from_frictionless` can only ever make some half-functional `XbrlResource` - so we shouldn't do that at all.