catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

do we need to be able to use skip_checks on specific tables? #394

Closed cmgosnell closed 3 years ago

cmgosnell commented 4 years ago

We were discussing this in Issue #364. Right now we have one table (fuel_receipts_costs_eia923) where we have to use the index as an auto-incrementing id column, because the table contains duplicate records. If we could use skip_checks on a specific table, we could disable the duplicate-row check for that table and get rid of this special case.

As Zane discussed a bit, I think the setting would need to be embedded in the metadata, and the check would need to be skipped automatically, without any extra processing, when the package is validated. That way anyone using the datapackage could validate it without needing to know about or implement any special cases.
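To make the idea concrete, here is a rough sketch of what per-table skipping could look like. It is plain Python, not the goodtables API: the per-resource `skip_checks` key, the `validate_package` helper, the `example_table` resource, and the sample rows are all made up for illustration. Only the table name and the duplicate-row check name come from this issue.

```python
from collections import Counter

# Hypothetical descriptor. A per-resource "skip_checks" key is an assumption
# for this sketch; it is not something the metadata is known to support.
descriptor = {
    "resources": [
        {
            "name": "fuel_receipts_costs_eia923",
            "skip_checks": ["duplicate-row"],
            # Made-up rows; the real table legitimately contains duplicates.
            "data": [
                ("plant_1", "2018-01", 100.0),
                ("plant_1", "2018-01", 100.0),
            ],
        },
        {
            "name": "example_table",  # hypothetical table with no skips
            "data": [("a", 1), ("a", 1)],
        },
    ]
}


def duplicate_rows(rows):
    """Return each row that appears more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def validate_package(descriptor):
    """Run the duplicate-row check on every resource unless it opts out."""
    report = {}
    for resource in descriptor["resources"]:
        skipped = set(resource.get("skip_checks", []))
        errors = []
        if "duplicate-row" not in skipped:
            errors += [("duplicate-row", row) for row in duplicate_rows(resource["data"])]
        report[resource["name"]] = errors
    return report


report = validate_package(descriptor)
for name, errors in report.items():
    print(name, errors)
```

The point of reading the setting from the descriptor is that the caller passes no special arguments: whoever validates the package gets the same result without knowing which table is the exception.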

Right now we have a workaround for this, so it is not a high priority, but it would be useful in the future. @roll @zaneselvans

roll commented 4 years ago

@cmgosnell I've created a card on our side for it - https://github.com/frictionlessdata/pilot-catalyst/projects/1?fullscreen=true (16)