ESGF / esgf-download

ESGF data transfer and replication tool
https://esgf.github.io/esgf-download/
BSD 3-Clause "New" or "Revised" License

Invalid cross-device link? #16

Closed: Zeitsperre closed this issue 1 year ago

Zeitsperre commented 1 year ago

Not sure what's happening here, but the download seems to fail for particular files when I fetch them as part of a query.

[2023-07-24 13:32:49]  ERROR     asyncio
an error occurred during closing of asynchronous generator <async_generator object AsyncClient.stream at 0x7f7a3a8a1620>
asyncgen: <async_generator object AsyncClient.stream at 0x7f7a3a8a1620>
Traceback (most recent call last):
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/esgpull.py", line 392, in download
    async for result in self.iter_results(
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/esgpull.py", line 309, in iter_results
    async for result in processor.process():
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/processor.py", line 144, in process
    async for result in stream:
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/aiostream/aiter_utils.py", line 175, in __aexit__
    await self._aiterator.athrow(typ, value, traceback)
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/processor.py", line 99, in stream
    yield Ok(ctx)
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
    result = task.result()
             ^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/processor.py", line 76, in stream
    async with (
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/fs.py", line 106, in __aexit__
    self.tmp_path.rename(self.final_path)
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/pathlib.py", line 1175, in rename
    os.rename(self, target)
OSError: [Errno 18] Invalid cross-device link: '/home/me/.esgpull/tmp/31d76d208039a080ecddf8eefbcd861c42946170.part' -> '/elsewhere/me/synda/data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/day/tasmax/gr/v20190614/tasmax_day_IPSL-CM6A-LR_historical_r1i1p1f1_gr_18500101-20141231.nc'

During handling of the above exception, another exception occurred:

RuntimeError: aclose(): asynchronous generator is already running
[2023-07-24 13:32:49]  DEBUG     root
Locals:
{
    'self': PosixPath('/home/me/.esgpull/tmp/31d76d208039a080ecddf8eefbcd861c42946170.part'),
    'target': PosixPath('/elsewhere/me/synda/data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/day/tasmax/gr/v20190614/tasmax_day_IPSL-CM6A-LR_historical_r1i1p1f1_gr_18500101-20141231.nc')
}

[2023-07-24 13:32:49]  ERROR     root

Traceback (most recent call last):
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/tui.py", line 154, in logging
    yield
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/cli/download.py", line 60, in download
    files, errors = asyncio.run(coro)
                    ^^^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/esgpull.py", line 392, in download
    async for result in self.iter_results(
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/esgpull.py", line 309, in iter_results
    async for result in processor.process():
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/processor.py", line 144, in process
    async for result in stream:
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/aiostream/aiter_utils.py", line 175, in __aexit__
    await self._aiterator.athrow(typ, value, traceback)
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/processor.py", line 99, in stream
    yield Ok(ctx)
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
    result = task.result()
             ^^^^^^^^^^^^^
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/processor.py", line 76, in stream
    async with (
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/site-packages/esgpull/fs.py", line 106, in __aexit__
    self.tmp_path.rename(self.final_path)
  File "/home/me/mambaforge/envs/esgf/lib/python3.11/pathlib.py", line 1175, in rename
    os.rename(self, target)
OSError: [Errno 18] Invalid cross-device link: '/home/me/.esgpull/tmp/31d76d208039a080ecddf8eefbcd861c42946170.part' -> '/elsewhere/me/synda/data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/day/tasmax/gr/v20190614/tasmax_day_IPSL-CM6A-LR_historical_r1i1p1f1_gr_18500101-20141231.nc'

svenrdz commented 1 year ago

The traceback suggests the error happens after a file has been downloaded, when esgpull tries to move it from a temporary directory (here ~/.esgpull/tmp) to its final location, which follows the data reference syntax (DRS). As far as I understand, the error comes from using os.rename to move the file: os.rename only works when the source and target are on the same filesystem, and otherwise fails with errno 18 (EXDEV, "Invalid cross-device link").

Unfortunately, there is no workaround in the current version of esgpull, but it should be relatively easy to fix in the next patch release.

Zeitsperre commented 1 year ago

That is exactly the case: I forgot to mention that /home and /elsewhere are on different filesystems (ext4 and ZFS, I believe).

svenrdz commented 1 year ago

This is fixed with https://github.com/ESGF/esgf-download/pull/24

Note that the fix only adds a fallback: when the rename fails, another (slower) method is used to move the file from the temporary directory to its final location.
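
For illustration, a rename with a fallback can look like the Python sketch below. `move_with_fallback` is a hypothetical name, and using `shutil.move` as the slower path is an assumption; the actual change in PR #24 may use a different method. The fast path is the same `Path.rename` call seen in the traceback, and only an `OSError` whose `errno` is `EXDEV` triggers the fallback, so other failures (permissions, a missing source) still surface.

```python
import errno
import shutil
from pathlib import Path


def move_with_fallback(src: Path, dst: Path) -> Path:
    """Move src to dst, copying across filesystems when rename can't.

    Hypothetical helper sketching the fallback idea; not necessarily how
    esgpull implements it.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Fast path: a metadata-only rename, valid within one filesystem.
        return src.rename(dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # a different problem; don't mask it
        # Slow path: shutil.move copies the data (with metadata via
        # shutil.copy2) and then deletes the source file.
        return Path(shutil.move(str(src), str(dst)))
```

Design note: `shutil.move` itself tries `os.rename` first and, when that fails, copies the file with `shutil.copy2` and deletes the source. Unlike a rename, this is not atomic: an interruption mid-copy can leave a partial file at the destination.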

Even though I said there was no workaround, there actually was one, and it is still valid: keep the temporary directory on your /elsewhere filesystem, if that is possible. This command does exactly that:

$ esgpull config paths.tmp /elsewhere/me/synda/tmp
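
The workaround avoids the problem because the final rename then stays within one filesystem. The same idea applies to any "download to a temporary file, then rename" scheme. Below is a minimal Python sketch of that general pattern, not esgpull's code: `write_atomically` is a hypothetical helper, and placing the `.part` file in the destination's own directory is an illustrative choice (any directory on the same filesystem would do).

```python
import os
import tempfile
from pathlib import Path


def write_atomically(final_path: Path, chunks) -> Path:
    """Stream byte chunks into final_path via a temp file beside it.

    Hypothetical helper. The temporary file is created in final_path's own
    directory, so the closing os.replace stays on one filesystem and should
    not fail with EXDEV.
    """
    final_path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the file with mode 0600; adjust afterwards if needed.
    fd, tmp_name = tempfile.mkstemp(dir=final_path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        os.replace(tmp_name, final_path)  # atomic within one filesystem
    except BaseException:
        # Don't leave a half-written .part file behind on failure.
        os.unlink(tmp_name)
        raise
    return final_path
```

Here `chunks` can be any iterable of bytes, such as the body of an HTTP response read in pieces.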

Zeitsperre commented 1 year ago

Fantastic! I'll be sure to set that!