catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
MIT License

Icebox: Move to new Zenodo API #183

Open jdangerx opened 8 months ago

jdangerx commented 8 months ago

See also https://github.com/catalyst-cooperative/pudl/issues/2939

Zenodo has migrated to their new API, and we need to use it if we want to keep creating new archives.

Some details on what the "correct endpoints" are: it seems like the basic "make a draft, edit it, publish it" flow is the same as before, which is good.
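To make the "make a draft, edit it, publish it" flow concrete, here is a minimal sketch of the three requests involved. It assumes the endpoint paths from Zenodo's documented legacy deposit API; the paths in the new API may well differ. `Step` and `draft_edit_publish_plan` are hypothetical names for illustration, not part of our `ZenodoDepositor`, and the function only builds a plan without sending anything.

```python
from dataclasses import dataclass

# Assumption: Zenodo sandbox base URL with the legacy deposit API layout.
SANDBOX = "https://sandbox.zenodo.org/api"


@dataclass(frozen=True)
class Step:
    """One HTTP request in the flow (hypothetical helper type)."""

    method: str
    url: str


def draft_edit_publish_plan(base_url: str, deposition_id: int) -> list[Step]:
    """List the requests for a draft -> edit -> publish cycle.

    In practice the deposition ID comes back from the first request;
    it is passed in here only so the later URLs can be shown.
    """
    deposits = f"{base_url}/deposit/depositions"
    return [
        Step("POST", deposits),  # create a new draft deposition
        Step("PUT", f"{deposits}/{deposition_id}"),  # edit the draft's metadata
        Step("POST", f"{deposits}/{deposition_id}/actions/publish"),  # publish it
    ]


if __name__ == "__main__":
    for step in draft_edit_publish_plan(SANDBOX, 123):
        print(step.method, step.url)
```

Keeping the plan separate from the HTTP client would make it easy to compare old and new endpoint paths side by side once the new ones are confirmed.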

We have the existing operations defined on our ZenodoDepositor:

Sandbox changes:

zaneselvans commented 8 months ago

Automatically adding the catalyst-cooperative community would be great -- or maybe, add it if the previous version of the record was in the community. Right now, a record's community membership doesn't seem to be inherited by subsequent versions, which is very annoying and leads to the most recent records often not showing up.
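A rough sketch of the "add it if the previous version was in the community" idea, assuming deposition metadata carries communities as a list of `{"identifier": ...}` entries as in the legacy deposit API. The new API may handle community membership differently, and `inherit_communities` is a hypothetical helper, not something that exists in the archiver today.

```python
import copy


def inherit_communities(previous_metadata: dict, new_metadata: dict) -> dict:
    """Return a copy of new_metadata that also lists the previous version's communities.

    Assumes the legacy metadata shape: {"communities": [{"identifier": "..."}]}.
    """
    merged = copy.deepcopy(new_metadata)
    communities = list(merged.get("communities", []))
    seen = {community["identifier"] for community in communities}

    for community in previous_metadata.get("communities", []):
        identifier = community["identifier"]
        if identifier not in seen:  # avoid listing a community twice
            communities.append({"identifier": identifier})
            seen.add(identifier)

    if communities:
        merged["communities"] = communities
    return merged


if __name__ == "__main__":
    previous = {"communities": [{"identifier": "catalyst-cooperative"}]}
    print(inherit_communities(previous, {"title": "New version"}))
```

The function returns a new dictionary rather than mutating its input, so the draft metadata can be compared before and after the merge.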

zaneselvans commented 8 months ago

I notice that we do not list frictionless as one of the dependencies in this module, but it does get installed, because we depend on catalystcoop.pudl which depends on catalystcoop.ferc-xbrl-extractor which depends on frictionless.

We also don't use frictionless in this package, even though we redefine (Pydantic) classes which mirror the Resource and DataPackage concepts. This seems like a recipe for ending up with outputs that look like data packages, but may or may not actually conform to the specification.

jdangerx commented 8 months ago

Ooooh yeah. I noticed that dependency funniness but did not connect the dots with "we don't actually use frictionless for data packages." We definitely should update that - could be a separate PR, though.

e-belfer commented 8 months ago

According to email communications with Zenodo, this API will not be launched for another year and its behavior is expected to change. So let's pause on this.