ASFHyP3 / hyp3-isce2

HyP3 plugin for ISCE2 processing
Apache License 2.0

hyp3lib.exceptions.OrbitDownloadError #208

Open scottyhq opened 4 months ago

scottyhq commented 4 months ago

The bug

Occasionally some sort of network error seems to lead to this traceback, which is a bit misleading: the orbit files do exist, the download just failed for some reason.

hyp3lib.exceptions.OrbitDownloadError: Unable to find a valid orbit file from providers: ('ESA', 'ASF')

To Reproduce

It's intermittent, so hard to reproduce. (hyp3-isce2 v1.0.1)

python -m hyp3_isce2 ++process insar_tops_burst \
      S1_292377_IW2_20190504T020036_VV_6F58-BURST \
      S1_292377_IW2_20190516T020037_VV_9045-BURST \
      --looks 20x4 \
      --apply-water-mask false 

# S1B_OPER_AUX_POEORB_OPOD_20210302T092716_V20190503T225942_20190505T005942.EOF
# S1B_OPER_AUX_POEORB_OPOD_20210302T141618_V20190515T225942_20190517T005942.EOF

Additional context

2024-05-08 17:59:48,223 - root - WARNING - Error encountered fetching AUX_POEORB orbit file from ESA; looking for another
2024-05-08 18:04:17,124 - root - WARNING - Error encountered fetching AUX_POEORB orbit file from ASF; looking for another
2024-05-08 18:04:18,505 - root - INFO - Downloading None
2024-05-08 18:04:19,177 - root - WARNING - Error encountered fetching AUX_RESORB orbit file from ESA; looking for another
2024-05-08 18:06:13,669 - root - INFO - Downloading None
2024-05-08 18:06:13,669 - root - WARNING - Error encountered fetching AUX_RESORB orbit file from ASF; looking for another
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/runner/micromamba/envs/hyp3-isce2/lib/python3.11/site-packages/hyp3_isce2/__main__.py", line 51, in <module>
    main()
  File "/home/runner/micromamba/envs/hyp3-isce2/lib/python3.11/site-packages/hyp3_isce2/__main__.py", line 47, in main
    sys.exit(process_entry_point.load()())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/micromamba/envs/hyp3-isce2/lib/python3.11/site-packages/hyp3_isce2/insar_tops_burst.py", line 530, in main
    insar_tops_burst(
  File "/home/runner/micromamba/envs/hyp3-isce2/lib/python3.11/site-packages/hyp3_isce2/insar_tops_burst.py", line 128, in insar_tops_burst
    downloadSentinelOrbitFile(granule, str(orbit_dir), esa_credentials=(esa_username, esa_password))
  File "/home/runner/micromamba/envs/hyp3-isce2/lib/python3.11/site-packages/hyp3lib/get_orb.py", line 189, in downloadSentinelOrbitFile
    raise OrbitDownloadError(f'Unable to find a valid orbit file from providers: {providers}')
hyp3lib.exceptions.OrbitDownloadError: Unable to find a valid orbit file from providers: ('ESA', 'ASF')

asjohnston-asf commented 4 months ago

Thanks for the report! We attempt to fetch S1 orbit files from the Copernicus Dataspace Ecosystem and the ASF archive at s1qc.asf.alaska.edu. Unfortunately, both of those systems are outage-prone, so we see intermittent OrbitDownloadErrors when fetches from both fail.
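As a rough illustration of the failure mode described above, here is a minimal sketch of a provider fallback loop that retries transient network errors with exponential backoff and reports "every download failed" separately from "no provider has the file". This is not hyp3lib's implementation: `fetch_with_fallback`, `OrbitNotFound`, and `OrbitFetchFailed` are hypothetical names, and each fetcher is assumed to be a zero-argument callable that returns a file path, returns `None` when the provider has no matching file, or raises `ConnectionError`/`TimeoutError` on a network problem.

```python
import time
from typing import Callable, Optional, Sequence


class OrbitNotFound(Exception):
    """Hypothetical: providers responded, but none has a matching orbit file."""


class OrbitFetchFailed(Exception):
    """Hypothetical: at least one provider failed with a transient network error."""


def fetch_with_fallback(
    fetchers: Sequence[tuple[str, Callable[[], Optional[str]]]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each (name, fetch) pair in order, retrying transient errors."""
    transient_errors = []
    for name, fetch in fetchers:
        for attempt in range(retries):
            try:
                result = fetch()
            except (ConnectionError, TimeoutError) as exc:
                # Network-level failure: record it and back off before retrying.
                transient_errors.append(f'{name} (attempt {attempt + 1}): {exc!r}')
                if attempt < retries - 1:
                    sleep(base_delay * 2 ** attempt)
                continue
            if result is not None:
                return result
            # The provider answered but has no file; move on to the next one.
            break
    if transient_errors:
        # The file may well exist; the downloads are what failed.
        raise OrbitFetchFailed('; '.join(transient_errors))
    raise OrbitNotFound('no provider has a matching orbit file')
```

With something along these lines, an outage at both providers would surface as a download failure rather than as "unable to find a valid orbit file", which is the misleading part of the original traceback.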

ASF has been monitoring the performance of s1qc.asf.alaska.edu for several months. We've observed that generating the index pages for resorb/poeorb files is particularly expensive. This can degrade server performance when many indexing requests come in at the same time (often driven by parallel processing systems like HyP3), resulting in longer response times and increased error rates for all requests to the server (both index requests and .EOF download requests).

@bbuechler et al. implemented caching for the two index pages yesterday. We're hopeful this will eliminate these performance bottlenecks and greatly improve the reliability of orbit fetching from s1qc.asf.alaska.edu. Please let us know if you continue to see these errors at a higher-than-acceptable rate going forward.
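For readers unfamiliar with the idea, the general technique of caching an expensive-to-generate page can be sketched as a small time-to-live (TTL) cache. This is only an illustration of the concept, not how the s1qc.asf.alaska.edu caching was actually implemented; `TTLCache`, `build`, and `ttl_seconds` are hypothetical names.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache one expensive result and rebuild it only after it expires."""

    def __init__(
        self,
        build: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._build = build          # expensive call, e.g. rendering an index page
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = float('-inf')

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or stale entry: rebuild once and reuse until expiry.
            self._value = self._build()
            self._expires_at = now + self._ttl
        return self._value
```

Under this scheme, a burst of index requests arriving within one TTL window triggers a single rebuild instead of one per request, at the cost of the index being up to `ttl_seconds` out of date.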

asjohnston-asf commented 4 months ago

FYI for @cmarshak, ASF made a change to s1qc.asf.alaska.edu that we expect will reduce the rate of OrbitDownloadErrors for https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp and https://github.com/dbekaert/RAiDER as well.

cmarshak commented 4 months ago

@mgovorcin has noticed that our jobs are succeeding at a rate of >85% and was able to complete the processing of the CONUS West Coast.