catalyst-cooperative / pudl-archiver

A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
MIT License

"ValueError: I/O operation on closed file" uploading to Zenodo #81

Open zaneselvans opened 1 year ago

zaneselvans commented 1 year ago

Running the MSHA Mines archiver locally, I'm frequently getting I/O errors during upload, and they don't seem to trigger a retry. They seem to happen on the larger files (a couple hundred MB each, judging by process of elimination), but I'm not sure which upload is actually failing.

I'm not sure whether there's any issue with the files themselves, since the temporary download directory is cleaned up at the end of the archiver run. They're zipfiles, though, so they should have been verified as valid zipfiles upon download.
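The zipfile check mentioned above can be done with Python's standard `zipfile` module. A minimal sketch follows; `is_valid_zip` is a hypothetical helper name, and this isn't necessarily how the archiver validates its downloads. Since the traceback points at a closed file handle rather than at the file's contents, a check like this mainly helps rule out corruption.

```python
import tempfile
import zipfile
from pathlib import Path


def is_valid_zip(path: Path) -> bool:
    """Hypothetical helper: True if ``path`` opens as a zip and every member's CRC checks out."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the name of the first
            # corrupt one, or None if all CRC checks pass.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        # Raised when the file isn't a zip archive at all.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.zip"
        with zipfile.ZipFile(good, "w") as zf:
            zf.writestr("data.csv", "a,b\n1,2\n")
        bad = Path(tmp) / "bad.zip"
        bad.write_bytes(b"not a zip file")
        print(is_valid_zip(good), is_valid_zip(bad))  # True False
```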

2023-02-27 20:36:34 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 PUT https://sandbox.zenodo.org/api/files/a2ac85a6-1aa3-4339-9c07-51163bedffe9/mshamines-assessed_violations.zip - Uploading mshamines-assessed_violations.zip to bucket
2023-02-27 20:41:35 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x28dfa9f50> (try #1, retry in 10s):
Encountered exceptions, showing traceback for last one: ["('mshamines', ValueError('I/O operation on closed file'))"]
Traceback (most recent call last):
  File "/Users/zane/mambaforge/envs/pudl-cataloger/bin/pudl_archiver", line 8, in <module>
    sys.exit(main())
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/cli.py", line 58, in main
    asyncio.run(archive_datasets(**vars(args)))
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/__init__.py", line 81, in archive_datasets
    raise exceptions[-1][1]
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/orchestrator.py", line 195, in run
    await self._apply_changes()
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/orchestrator.py", line 258, in _apply_changes
    await self.depositor.create_file(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 346, in create_file
    return await self.request(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 104, in requester
    response = await retry_async(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/utils.py", line 41, in retry_async
    return await coro
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 95, in run_request
    response = await session.request(method, url, **kwargs)
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/client.py", line 508, in _request
    req = self._request_class(
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 313, in __init__
    self.update_body_from_data(data)
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 517, in update_body_from_data
    size = body.size
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/payload.py", line 379, in size
    return os.fstat(self._value.fileno()).st_size - self._value.tell()
ValueError: I/O operation on closed file
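One plausible reading of this traceback, not confirmed in this thread: the first attempt failed about five minutes after the PUT started (20:36:34 to 20:41:35), which is consistent with a timeout such as aiohttp's default 300-second total `ClientTimeout`, if the session doesn't override it. aiohttp's file payloads close the underlying file once they're done writing it, so if the same open file object is passed to every attempt, the retry fails while aiohttp computes the payload size (`os.fstat(self._value.fileno())`) and raises `ValueError`, which apparently isn't treated as retryable. The sketch below opens the file inside each attempt instead. `retry_with_fresh_file` and `make_flaky_sender` are hypothetical names, the fake sender only imitates aiohttp's behavior (size check, read, close), and none of this is the archiver's actual code.

```python
import asyncio
import os
import tempfile
from pathlib import Path
from typing import Awaitable, BinaryIO, Callable, TypeVar

T = TypeVar("T")


async def retry_with_fresh_file(
    path: Path,
    send: Callable[[BinaryIO], Awaitable[T]],
    retries: int = 3,
    retry_on: tuple[type[BaseException], ...] = (OSError, asyncio.TimeoutError),
    delay: float = 0.0,
) -> T:
    """Hypothetical helper: call ``send`` with a newly opened file on every attempt."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        # Opening the file inside the loop means a handle that a failed
        # attempt already closed is never handed to the next attempt.
        with path.open("rb") as f:
            try:
                return await send(f)
            except retry_on:
                if attempt == retries:
                    raise
        await asyncio.sleep(delay)
    raise RuntimeError("unreachable")


def make_flaky_sender(failures: int) -> Callable[[BinaryIO], Awaitable[int]]:
    """Hypothetical fake upload that fails ``failures`` times before succeeding."""
    calls = 0

    async def send(f: BinaryIO) -> int:
        nonlocal calls
        calls += 1
        # Same size computation as aiohttp's payload; raises if f is closed.
        size = os.fstat(f.fileno()).st_size - f.tell()
        f.read()
        f.close()  # imitate the payload closing the file after writing it
        if calls <= failures:
            raise asyncio.TimeoutError("simulated upload timeout")
        return size

    return send


async def main() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.zip"
        path.write_bytes(b"x" * 1024)

        # Reusing one handle across attempts reproduces the error.
        send = make_flaky_sender(failures=1)
        with path.open("rb") as f:
            try:
                await send(f)
            except asyncio.TimeoutError:
                pass
            try:
                await send(f)
            except ValueError as err:
                print("reused handle:", err)  # I/O operation on closed file

        # A fresh handle per attempt lets the retry succeed.
        size = await retry_with_fresh_file(path, make_flaky_sender(failures=1))
        print("fresh handle, bytes sent:", size)  # 1024


if __name__ == "__main__":
    asyncio.run(main())
```

Applied to the archiver, the idea would be to open the file inside the coroutine that is re-run on each try (presumably `run_request` in the traceback) rather than once before retrying, so every attempt gets its own open handle.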
jdangerx commented 1 year ago

Scope:

Next steps: