Closed zaneselvans closed 4 years ago
Hey there,
I ran into this with a few tables with the Foreign Keys. I can provide further details if needed. Anything you need from my attempts at this just let me know.
I have also starting building code around the SQLite database as well. I've not finished but made some headway that way too.
Best, Bri
@briannacote Hi, yes, it would be really great if you provide some details
Cool. I had to update the datapackage.json file and remove the following foreign keys for the EIA data I was wanting to use (860 and 923). For give the basic-ness of this diff. The left hand is the original datapackage.json file created from the pudl process. The right hand side I had to remove certain foreign keys. Not all of them. I had to do a handful for the ferc .json file as well. I can send the actually files if you like?
}
Here's the files. I just realized these were made with my setup in particular. I hope this helps show what relationships were issues.
This is just from the EIA side (923 and 860)
https://drive.google.com/drive/folders/1DSZ8WWfQXFlXwyB9UIUORwZ1jNXVQmoZ?usp=sharing
I can send the ferc difference I saw as well. I didn't look at the CEMS data or the EPA IPM data.
Let me know if there's other items I can provide.
Thanks @briannacote I'll use it
@cmgosnell
@zaneselvans
The problems that these two fields from the generators_entity_eia
resource miss unique
constraint. It has a composite PK but it's not enough for PostgreSQL. Added this constraint and it works for me now with Postgres:
{
"name": "plant_id_eia",
"type": "integer",
"description": "EIA Plant Identification number. One to five digit numeric.",
"format": "default",
"constraints": {
"unique": true
}
},
{
"name": "generator_id",
"type": "string",
"description": "Generator identification number",
"format": "default",
"constraints": {
"unique": true
}
},
I have added a better error message to our code but I think we shouldn't alter incoming data packages adding additional unique/primary keys because it's tricky not to mess with initial metadata.
Should be an easy fix on the PUDL level.
Update
"foreignKeys": [
{
"fields": "plant_id_eia",
"reference": {
"resource": "generators_entity_eia",
"fields": "plant_id_eia"
}
},
{
"fields": "generator_id",
"reference": {
"resource": "generators_entity_eia",
"fields": "generator_id"
}
}
],
probably should be a composite FK
Thanks for catching this @roll! These should have definitely been composite FKs - and so should have several of our other FKs so I went through and cleaned them up. I've just checked in changes to the stored resource metadata that should fix this problem for you @briannacote. It worked fine for me, but please let me know if you run into problems. Sorry it took so long for us to get to the bottom of this.
It seems to work for me! I generated a data package with 2015-2017 data for EIA 860/923 + FERC Form 1, and was able to load it into both SQLite and PostgreSQL using datapackage.save()
without any mishaps (once I remembered to drop my old PUDL database...)
I think I'll go ahead and close this issue, but @briannacote please do let us know if something still isn't working on this...
This is awesome! I'll give it a go soon. I've also connected to the SQLite database as well. So we have options now over here. I'll let you know if I run into any issues again trying it with Postgres.
Also, forgive my delay in getting back. I was out of the office the last two weeks.
As @briannacote noted in #413, there's some issue which prevents the loading of even an already flattened / simple PUDL data package into a postgres database. @roll thinks this is something which should actually work, but needs a test case to debug it and fix it in the data package libraries.
We need to come up with a simple test case that demonstrates the "non-unique foreign key" issue.
The error which gets thrown:
Resulting from this SQL: