LM-SAL / aiapy

Python library for AIA data analysis
https://aiapy.readthedocs.io/en/stable/

Error table download failures #115

Closed nabobalis closed 8 months ago

nabobalis commented 1 year ago

In GitLab by @wtbarnes on Dec 9, 2022, 16:05

Since switching over to the alternate SSW mirror in !167, I've noticed that downloads of the V3 error table file are failing. I don't understand why, since the URL resolves just fine: https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt

However, when trying to download an error table using aiapy.calibrate.util.get_error_table, I get the following exception:

WARNING: SunpyUserWarning: [Errno 2] No such file or directory: '/Users/wtbarnes/sunpy/data_manager/aiapy.aia_V3_error_table.txt' [sunpy.data.data_manager.cache]

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 error_table = aiapy.calibrate.util.get_error_table()

File ~/mambaforge/envs/mocksipipeline/lib/python3.9/site-packages/aiapy/calibrate/util.py:216, in get_error_table(error_table)
    214 def get_error_table(error_table=None):
    215     if error_table is None:
--> 216         error_table = fetch_error_table()
    217     if isinstance(error_table, (str, pathlib.Path)):
    218         table = astropy.io.ascii.read(error_table)

File ~/mambaforge/envs/mocksipipeline/lib/python3.9/site-packages/sunpy/data/data_manager/manager.py:85, in DataManager.require.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     80 if self._cache_has_file(urls):
     81     # If we can't find a file matching sha_hash, but the url is already
     82     # in the database
     83     raise ValueError(f"{urls} has already been downloaded, but no file "
     84                      f"matching the hash {sha_hash} can be found.")
---> 85 file_path = self._cache.download(urls, self._namespace)
     86 file_hash = hash_file(file_path)
     87 if file_hash != sha_hash:
     88     # the hash of the file downloaded does not match provided hash
     89     # this means the file has changed on the server.
     90     # the function should be updated to use the new
     91     # hash. Raise an error to notify.

File ~/mambaforge/envs/mocksipipeline/lib/python3.9/site-packages/sunpy/data/data_manager/cache.py:84, in Cache.download(self, urls, namespace, redownload)
     81     else:
     82         return Path(details['file_path'])
---> 84 file_path, file_hash, url = self._download_and_hash(urls, namespace)
     86 self._storage.store({
     87     'file_hash': file_hash,
     88     'file_path': str(file_path),
     89     'url': url,
     90     'time': datetime.now().isoformat(),
     91 })
     92 return file_path

File ~/mambaforge/envs/mocksipipeline/lib/python3.9/site-packages/sunpy/data/data_manager/cache.py:164, in Cache._download_and_hash(self, urls, namespace)
    162         errors.append(f"{e}")
    163 else:
--> 164     raise RuntimeError(errors)

RuntimeError: ["[Errno 2] No such file or directory: '/Users/wtbarnes/sunpy/data_manager/aiapy.aia_V3_error_table.txt'"]

The download fails, and the data manager then cannot find the file it expects to hash. The confusing error message is probably more of an upstream sunpy issue, since it reports the missing file rather than the reason the download failed.
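The masking described above can be sketched with the standard library alone. This is a simplified illustration, not sunpy's actual code: `SilentDownloader`, `download_and_hash_naive`, `download_and_hash_checked`, and the `example.invalid` URL are all hypothetical names made up for the example. It assumes a downloader that records failures instead of raising them, which is how the failure appears to behave in the traceback.

```python
import hashlib
import tempfile
from pathlib import Path


class SilentDownloader:
    """Hypothetical downloader that records failures instead of raising them."""

    def __init__(self):
        self.errors = []

    def download(self, url, path):
        # Simulate a failed transfer: nothing is ever written to `path`.
        self.errors.append(f"{url}: Can not decode content-encoding: gzip")


def hash_file(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def download_and_hash_naive(downloader, url, path):
    """Hash straight after downloading, without checking for failures."""
    downloader.download(url, path)
    return hash_file(path)  # FileNotFoundError hides the real cause


def download_and_hash_checked(downloader, url, path):
    """Check the downloader's recorded errors before touching the file."""
    downloader.download(url, path)
    if downloader.errors:
        raise RuntimeError(downloader.errors)
    return hash_file(path)


def surfaced_error(func):
    """Run `func` against a failing download and return the exception raised."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "aia_V3_error_table.txt"
        try:
            func(SilentDownloader(), "https://example.invalid/table.txt", target)
        except Exception as exc:
            return exc
    return None


naive = surfaced_error(download_and_hash_naive)
checked = surfaced_error(download_and_hash_checked)
print(type(naive).__name__)
print(type(checked).__name__, checked)
```

In the naive variant the user only sees the missing-file error, while the checked variant reports the message the downloader actually recorded.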

The error table test is also failing locally for me:

__________________________________________________________________________ test_error_table[None] ___________________________________________________________________________

self = <sunpy.data.data_manager.cache.Cache object at 0x110dfd300>, urls = ['https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt']
namespace = 'aiapy.'

    def _download_and_hash(self, urls, namespace=''):
        """
        Downloads the file and returns the path, hash and url it used to download.

        Parameters
        ----------
        urls : `list`
            List of urls.

        Returns
        -------
        `str`, `str`, `str`
            Path, hash and URL of the file.
        """
        def download(url):
            path = self._cache_dir / (namespace + get_filename(urlopen(url), url))
            self._downloader.download(url, path)
            shahash = hash_file(path)
            return path, shahash, url

        errors = []
        for url in urls:
            try:
>               return download(url)

../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/data/data_manager/cache.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

url = 'https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt'

    def download(url):
        path = self._cache_dir / (namespace + get_filename(urlopen(url), url))
        self._downloader.download(url, path)
>       shahash = hash_file(path)

../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/data/data_manager/cache.py:153:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = PosixPath('/Users/wtbarnes/sunpy/data_manager/aiapy.aia_V3_error_table.txt')

    def hash_file(path):
        """
        Returns the SHA-256 hash of a file.

        Parameters
        ----------
        path : `str`
            The path of the file to be hashed.

        Returns
        -------
        `str`
            SHA-256 hash of the file.

        References
        ----------
        * https://stackoverflow.com/a/22058673
        """
        BUF_SIZE = 65536
        sha256 = hashlib.sha256()

>       with open(path, 'rb') as f:
E       FileNotFoundError: [Errno 2] No such file or directory: '/Users/wtbarnes/sunpy/data_manager/aiapy.aia_V3_error_table.txt'

../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/util/util.py:195: FileNotFoundError

During handling of the above exception, another exception occurred:

error_table = None

    @pytest.mark.parametrize(
        "error_table",
        [
            pytest.param(None, marks=pytest.mark.remote_data),
            get_test_filepath("aia_V3_error_table.txt"),
            error_table_local,
        ],
    )
    def test_error_table(error_table):
>       table = get_error_table(error_table)

aiapy/calibrate/tests/test_util.py:132:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
aiapy/calibrate/util.py:216: in get_error_table
    error_table = fetch_error_table()
../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/data/data_manager/manager.py:85: in wrapper
    file_path = self._cache.download(urls, self._namespace)
../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/data/data_manager/cache.py:84: in download
    file_path, file_hash, url = self._download_and_hash(urls, namespace)
../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/data/data_manager/cache.py:161: in _download_and_hash
    warn_user(f"{e}")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

msg = "[Errno 2] No such file or directory: '/Users/wtbarnes/sunpy/data_manager/aiapy.aia_V3_error_table.txt'", stacklevel = 1

    def warn_user(msg, stacklevel=1):
        """
        Raise a `SunpyUserWarning`.

        Parameters
        ----------
        msg : str
            Warning message.
        stacklevel : int
            This is interpreted relative to the call to this function,
            e.g. ``stacklevel=1`` (the default) sets the stack level in the
            code that calls this function.
        """
>       warnings.warn(msg, SunpyUserWarning, stacklevel + 1)
E       sunpy.util.exceptions.SunpyUserWarning: [Errno 2] No such file or directory: '/Users/wtbarnes/sunpy/data_manager/aiapy.aia_V3_error_table.txt'

../../../mambaforge/envs/aiapy-dev/lib/python3.10/site-packages/sunpy/util/exceptions.py:89: SunpyUserWarning

This seems to be the same failure that is causing the conda-forge builds to fail: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=619803&view=logs&j=656edd35-690f-5c53-9ba3-09c10d0bea97&t=e5c8ab1d-8ff9-5cae-b332-e15ae582ed2d&l=631

nabobalis commented 1 year ago

In GitLab by @wtbarnes on Jan 10, 2023, 11:47

This is failing because the file on this SSW mirror is served with Content-Encoding: gzip, which cannot be decoded when downloading with parfive. The error is not propagated through the sunpy data manager, but it can be seen when using parfive directly:

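import parfive

# Download the error table directly with parfive to expose the underlying error
dl = parfive.Downloader()
dl.enqueue_file('https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt', path='.')
results = dl.download()
# Failed downloads are collected in `.errors` rather than raised
print(results.errors)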

yields

1/0 files failed to download. Please check `.errors` for details
Files Downloaded:   0%|                                                                                                             | 0/1 [00:00<?, ?file/s]
[<parfive.results.Error object at 0x1072eb540>
https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt,
400, message='Can not decode content-encoding: gzip']

It is not clear whether this is a problem with this file or a bug in parfive, but this was not an issue with the HESPERIA SSW mirror we were previously using.
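One way to narrow down whether the file or the client is at fault would be to compare the declared Content-Encoding header with the bytes the server actually sends. The sketch below is only an illustration under that assumption: `classify_body` and `decode_body` are hypothetical helpers, not part of parfive, sunpy, or aiapy, and they work on a header value and raw body that have already been captured by other means.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def classify_body(content_encoding, body):
    """Compare the declared Content-Encoding with what the body looks like."""
    declared = (content_encoding or "").strip().lower() == "gzip"
    looks_gzipped = body[:2] == GZIP_MAGIC
    if declared and looks_gzipped:
        return "consistent: gzip"
    if declared and not looks_gzipped:
        return "mismatch: gzip declared, body is not gzip"
    if looks_gzipped:
        return "mismatch: gzip body, no encoding declared"
    return "consistent: identity"


def decode_body(content_encoding, body):
    """Decompress only when the header and the bytes agree on gzip."""
    if classify_body(content_encoding, body) == "consistent: gzip":
        return gzip.decompress(body)
    # Tolerate a wrong or missing header by returning the bytes untouched.
    return body


text = b"sample error table contents\n"
print(classify_body("gzip", gzip.compress(text)))
print(classify_body("gzip", text))
print(decode_body("gzip", gzip.compress(text)) == text)
```

A "mismatch" result would point at the server or the file, while a "consistent: gzip" result that a client still cannot decode would point at the client. Capturing the raw, undecoded bytes requires a client that does not decompress automatically, which is outside the scope of this sketch.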

See also this upstream issue on parfive: https://github.com/Cadair/parfive/issues/121

nabobalis commented 1 year ago

We have patched this with a workaround and opened another issue to revert it once parfive is fixed.