catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
MIT License

Ensure each XBRL filing in FERC RSS feed has a stable ID #73

Closed zschira closed 10 months ago

zschira commented 1 year ago

The FERC RSS feed changes its GUIDs every time the feed is downloaded. According to the RSS standard, this field is supposed to be a unique identifier for each entry, which is why I chose to use it for naming the XBRL filings, but FERC doesn't seem to be conforming to the standard. I emailed FERC to ask whether they know this is happening and whether they plan to fix it, but I haven't heard back. While this is still the case, every time we run one of the FERC archivers, all of the XBRL file names will change and we will end up creating a new version even if none of the underlying data has changed.
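One way to confirm the behavior described above is to download the feed twice and compare the GUIDs of matching entries. This is only a rough sketch: it assumes each item's `link` element stays the same between downloads and can serve as a key, which has not been verified against the actual FERC feed, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def guids_by_link(rss_xml: str) -> dict[str, str]:
    """Map each item's <link> to its <guid> for one download of the feed."""
    root = ET.fromstring(rss_xml)
    mapping = {}
    for item in root.iter("item"):
        link = item.findtext("link")
        guid = item.findtext("guid")
        # Skip items missing either field rather than guessing a value.
        if link is not None and guid is not None:
            mapping[link] = guid
    return mapping


def changed_guids(first_xml: str, second_xml: str) -> list[str]:
    """Return links (assumed stable) whose GUID differs between two downloads."""
    first = guids_by_link(first_xml)
    second = guids_by_link(second_xml)
    shared = first.keys() & second.keys()
    return sorted(link for link in shared if first[link] != second[link])
```

If `changed_guids` returns most or all of the shared links for two downloads taken a few minutes apart, the GUIDs are not usable as stable file names.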

We can handle this in one of two ways:

  1. Do nothing and wait for FERC to fix this.
  2. Change how we name XBRL filings.

Option 1 would be nice, I think, since it doesn't require any work, and using UUIDs is robust and won't lead to any accidental collisions. However, option 2 is probably the better choice, since we don't know if or when FERC will actually get to this.

zaneselvans commented 1 year ago

When a respondent submits a revision to an old filing, they submit an entirely new filing, with both the old and the revised filing continuing to exist in the feed, right?

If there's a filing submission timestamp available in the RSS feed (or the XBRL) and a respondent ID, could we use those to uniquely identify a filing? For example, something like 20230201123456-C000123.xbrl. This wouldn't preclude FERC from fixing the GUID. Have they been responsive in the past? Do they even have control over the system, or is it managed by XBRL-US, I wonder?
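A minimal sketch of that naming scheme, assuming the submission timestamp has already been parsed into a timezone-aware `datetime` and the respondent ID is available as a string. The function name and the choice to normalize to UTC are illustrative assumptions, not the archiver's actual implementation.

```python
from datetime import datetime, timezone


def stable_filing_name(submitted: datetime, respondent_id: str) -> str:
    """Build a file name from the submission timestamp and respondent ID.

    `submitted` is assumed to be timezone-aware; it is converted to UTC so
    the same filing always yields the same name regardless of local settings.
    """
    stamp = submitted.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{stamp}-{respondent_id}.xbrl"
```

For a filing submitted at 2023-02-01 12:34:56 UTC by respondent C000123, this yields 20230201123456-C000123.xbrl. Because the name depends only on fields of the filing itself, it stays the same across repeated downloads of the feed.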

zschira commented 1 year ago

Yeah, there's an entirely new entry in the feed when a filer resubmits a filing, and each entry contains a timestamp, so I think your suggestion seems good. They've been somewhat responsive in the past, but I still haven't heard back from them and I don't know who manages the feed. Also, the naming convention you've suggested here will be unique, and it is more descriptive than the GUID would be even if the GUID were working correctly, so I'm in favor of just going ahead and changing it.

zaneselvans commented 1 year ago

Each form has its own RSS feed, right? So we don't need to worry about ID collisions between different forms?

jdangerx commented 10 months ago

We changed how we name things, so I think we can close this, right, @zschira?