Closed e-belfer closed 1 year ago
@zschira @jdangerx Flagging this bug as you're working on your refactoring PR.
Thanks for flagging this! I'll take a look. The first two dry runs needing production keys make sense. I think that we should not need the production key to update the dataset settings, so I'll look at how that is working...
I have this issue as well but I'm assuming that @e-belfer and I don't have access to the secrets for this repo. So maybe one of @zschira or @jdangerx should just send us the ZENODO_TOKEN_UPLOAD for now and we can set it as an environment variable.
Update to note that running `pudl_archiver --datasets ferceqr --initialize --sandbox` with the sandbox tokens stored in `.env` returns the known field length error but otherwise does not prompt any token issues.
@e-belfer do you have a branch that you've been working off of? Running this on `main` tells me that it doesn't know what `ferceqr` is.
Sweet, thanks! When I run `pudl_archiver --datasets ferceqr --initialize --sandbox` I get the "no creators" issue that should be fixed with #42. Is that what you're getting?
raise ZenodoClientException(
pudl_archiver.depositors.zenodo.ZenodoClientException: ZenodoClientException({'status': 400, 'message': 'Validation error.', 'errors': [{'field': 'metadata.creators', 'message': 'Shorter than minimum length 1.'}]})
If I rebase onto the `small_fixes` branch that is in #42 and run, I get:
> pudl_archiver --datasets ferceqr --initialize --sandbox
2023-01-26 16:00:37 [ INFO] catalystcoop.pudl_archiver.depositors.zenodo:90 POST https://sandbox.zenodo.org/api/deposit/depositions - Create new deposition
2023-01-26 16:00:39 [ INFO] catalystcoop.pudl_archiver.archivers.classes:75 Archiving ferceqr
2023-01-26 16:00:41 [ WARNING] catalystcoop.pudl_archiver.archivers.classes:155 The archiver couldn't find any hyperlinks that match re.compile('CSV_(\\d{4})_Q([1-4]).zip').Make sure your filter_pattern is correct or if the structure of the https://eqrreportviewer.ferc.gov/ page changed.
Encountered exceptions, showing traceback for last one: ["('ferceqr', AssertionError())"]
Traceback (most recent call last):
File "/Users/dazhong-catalyst/mambaforge/envs/pudl-archiver-toml/bin/pudl_archiver", line 8, in <module>
sys.exit(main())
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/cli.py", line 58, in main
asyncio.run(archive_datasets(**vars(args)))
File "/Users/dazhong-catalyst/mambaforge/envs/pudl-archiver-toml/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/dazhong-catalyst/mambaforge/envs/pudl-archiver-toml/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/__init__.py", line 99, in archive_datasets
raise exceptions[-1][1]
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/__init__.py", line 47, in archive_dataset
await archiver.create_archive()
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/classes.py", line 175, in create_archive
resource_info = await resource_coroutine
File "/Users/dazhong-catalyst/mambaforge/envs/pudl-archiver-toml/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-8' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-9' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-10' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-11' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-12' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-14' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-15' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-16' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
Task exception was never retrieved
future: <Task finished name='Task-17' coro=<FercEQRArchiver.get_year_dbf() done, defined at /Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py:45> exception=AssertionError()>
Traceback (most recent call last):
File "/Users/dazhong-catalyst/work/pudl-archiver/src/pudl_archiver/archivers/ferc/ferceqr.py", line 49, in get_year_dbf
assert year >= 2012 and year <= 2002
AssertionError
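For what it's worth, the condition shown in the traceback, `year >= 2012 and year <= 2002`, cannot hold for any year, which would explain why every `get_year_dbf` task fails. Below is a minimal sketch of a range check with the bounds in a satisfiable order. The bounds and the helper names are assumptions for illustration only, not the actual values or functions in `ferceqr.py`.

```python
# Hypothetical bounds, chosen only for illustration; the real archiver may differ.
FIRST_YEAR = 2002
LAST_YEAR = 2012


def original_check(year: int) -> bool:
    """The condition from the traceback: no integer is both >= 2012 and <= 2002."""
    return year >= 2012 and year <= 2002


def year_in_range(year: int, first: int = FIRST_YEAR, last: int = LAST_YEAR) -> bool:
    """Return True if `year` lies within [first, last], inclusive."""
    return first <= year <= last


if __name__ == "__main__":
    # The original condition rejects every year in a wide window.
    print(any(original_check(y) for y in range(1990, 2031)))
    # The reordered check accepts only years inside the assumed bounds.
    print([y for y in (2001, 2002, 2012, 2013) if year_in_range(y)])
```

Using a chained comparison (`first <= year <= last`) makes it harder to swap the bounds by accident.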
Great, let me rebase in the same way and keep fiddling around with it. Thanks!
FYI - #42 got merged so you're good to keep working off `main` @e-belfer
@jdangerx @zschira This was working great before the big merge, but now I'm seeing that line 54 in `entities.py`, `return cls(name=contributor.title, affiliation=contributor.organization)`, is raising `AttributeError: 'dict' object has no attribute 'title'` when I run `--initialize --sandbox` on the FERC EQR data.
I can take a look at it this afternoon - let me know if you want to pair!
@e-belfer and @jdangerx it looks like what's on `ferceqr` came from an earlier commit from the `small-fixes` branch. On `ferceqr` in `entities.py` there's the line:
creators = [
    DepositionCreator.from_contributor(CONTRIBUTORS["catalyst-cooperative"])
]
It should be:
creators = [
    DepositionCreator.from_contributor(
        Contributor.from_id("catalyst-cooperative")
    )
]
This change should fix the issue. I think this is just a git issue, and if you get the latest from `main` you should be good to go.
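To illustrate why the stale line raises the `AttributeError`: indexing a plain mapping returns a dict, while `from_contributor` reads attributes from its argument. The classes below are simplified stand-ins, not the real `pudl_archiver` definitions. The attribute names `title` and `organization` come from the error line above; the contents of `CONTRIBUTORS` and the body of `from_id` are assumptions.

```python
from dataclasses import dataclass

# Assumed shape of the raw metadata; the values are placeholders.
CONTRIBUTORS = {
    "catalyst-cooperative": {
        "title": "Catalyst Cooperative",
        "organization": "Catalyst Cooperative",
    }
}


@dataclass
class Contributor:
    """Stand-in for the typed contributor object."""

    title: str
    organization: str

    @classmethod
    def from_id(cls, contributor_id: str) -> "Contributor":
        # Wrap the raw dict so that callers can use attribute access.
        return cls(**CONTRIBUTORS[contributor_id])


@dataclass
class DepositionCreator:
    """Stand-in for the Zenodo creator entry."""

    name: str
    affiliation: str

    @classmethod
    def from_contributor(cls, contributor: Contributor) -> "DepositionCreator":
        return cls(name=contributor.title, affiliation=contributor.organization)


# Works: from_id returns an object with .title and .organization.
creator = DepositionCreator.from_contributor(Contributor.from_id("catalyst-cooperative"))

# Fails: the raw dict has keys, not attributes.
try:
    DepositionCreator.from_contributor(CONTRIBUTORS["catalyst-cooperative"])
except AttributeError as err:
    stale_error = str(err)
```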
Absolutely, my error in rebasing. Fixed, thanks!
@jdangerx @zschira Are there any other remaining fundamental permissions issues that are not behaving as expected? If not, I'll go ahead and close out the issue.
I think this is behaving as expected!
Ran into some permissions issues trying to test a new source of data (FERC EQR) in the Zenodo sandbox.

- `pudl_archiver --datasets ferceqr --dry-run` requires the production token at present, returning: `KeyError: 'ZENODO_TOKEN_UPLOAD'`
- `pudl_archiver --datasets ferceqr --initialize` returns the same error, also requiring the production token.
- `pudl_archiver --datasets ferceqr --sandbox` returns a `KeyError` for the dataset name in the Zenodo API client, breaking on the line `settings = self.dataset_settings[data_source_id]` in `/zenodo/api_client.py`.

In other words, it seems to be impossible to test a new dataset (either locally or in the sandbox) without updating the dataset settings entry, which requires access to the production keys. This seems to be an undesirable outcome of some of the more recent refactoring changes.
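One possible shape for the token lookup, sketched under assumptions: choose the environment variable based on the `--sandbox` flag, and fail with a message that names the missing variable rather than a bare `KeyError`. `ZENODO_TOKEN_UPLOAD` comes from the error above. The sandbox variable name and the function itself are hypothetical, not the archiver's actual API.

```python
import os
from typing import Mapping


def get_upload_token(sandbox: bool, env: Mapping[str, str] = os.environ) -> str:
    """Return the upload token for the selected Zenodo server.

    Only the variable for the requested server is read, so a sandbox run
    does not require the production token to be set.
    """
    # ZENODO_SANDBOX_TOKEN_UPLOAD is an assumed name for the sandbox variable.
    var = "ZENODO_SANDBOX_TOKEN_UPLOAD" if sandbox else "ZENODO_TOKEN_UPLOAD"
    try:
        return env[var]
    except KeyError:
        target = "the Zenodo sandbox" if sandbox else "production Zenodo"
        raise RuntimeError(f"Set {var} to upload to {target}.") from None


if __name__ == "__main__":
    # A sandbox run with only the sandbox token available.
    fake_env = {"ZENODO_SANDBOX_TOKEN_UPLOAD": "sandbox-token"}
    print(get_upload_token(sandbox=True, env=fake_env))
```

Passing the environment in as a mapping keeps the lookup easy to exercise in tests without touching real secrets.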