SEE-GEO / ccic

Chalmers Cloud Ice Climatology
MIT License

Handle download errors #12

Closed adriaat closed 1 year ago

adriaat commented 1 year ago

Downloads from both the GridSat and GPM merged IR sources can fail with errors. Here are some examples:

ccic.bin.extract_training_data (ERROR     ) :: The following error was encountered while processing CloudSat granule '1599':
 503 Server Error: Service Unavailable for url: https://www.ncei.noaa.gov/data/geostationary-ir-channel-brightness-temperature-gridsat-b1/access/2006/GRIDSAT-B1.2006.08.16.03.v02r01.nc
ccic.bin.extract_training_data (ERROR     ) :: The following error was encountered while processing CloudSat granule '2042':
 HTTPSConnectionPool(host='www.ncei.noaa.gov', port=443): Max retries exceeded with url: /data/geostationary-ir-channel-brightness-temperature-gridsat-b1/access/2006/09/GRIDSAT-B1.2006.09.15.12.v02r01.nc (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f34525648b0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
ccic.bin.extract_training_data (ERROR     ) :: The following error was encountered while processing CloudSat granule '12574':
 HTTPSConnectionPool(host='disc2.gesdisc.eosdis.nasa.gov', port=443): Max retries exceeded with url: /data/MERGED_IR/GPM_MERGIR.1/2008/251/merg_2008090721_4km-pixel.nc4 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7facb7a4b340>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
ccic.bin.extract_training_data (ERROR     ) :: The following error was encountered while processing CloudSat granule '12568':
 HTTPSConnectionPool(host='disc2.gesdisc.eosdis.nasa.gov', port=443): Max retries exceeded with url: /data/MERGED_IR/GPM_MERGIR.1/2008/251/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7facb7b4a940>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

I believe download errors can also occur with the CloudSat data, but so far the only ones I have recorded were caused (and corrected) by me, e.g. by changing the IP address from which I download or by losing internet access.

For an automatic bulk downloading+processing pipeline, these download errors should be handled so that as many files as possible are processed. Handling them should also be a priority if the training data is ever re-generated.
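One possible way to handle transient failures such as the 503 and "Network is unreachable" errors above is to retry the download with exponential backoff. The following is only a minimal sketch of that idea, not necessarily what the fix in this repository does. The name `retry_download` and its parameters are hypothetical, and the helper is deliberately independent of any download library: it wraps an arbitrary callable. It retries on `OSError` by default, on the assumption that the download code raises `requests` exceptions, which derive from `OSError`.

```python
import time


def retry_download(download, retries=3, base_delay=1.0,
                   retriable=(OSError,), sleep=time.sleep):
    """Call ``download()`` and retry on transient errors.

    Hypothetical helper, for illustration only.

    Args:
        download: Zero-argument callable performing the download.
        retries: Maximum number of retries after the first attempt.
        base_delay: Delay in seconds before the first retry; it doubles
            after every failed attempt.
        retriable: Exception types that trigger a retry.
        sleep: Function used to wait between attempts (replaceable in tests).

    Returns:
        Whatever ``download()`` returns on the first successful attempt.
    """
    for attempt in range(retries + 1):
        try:
            return download()
        except retriable:
            # Out of retries: re-raise so the caller can log the failure
            # and skip the granule.
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call would look like `retry_download(lambda: fetch(url), retries=5)`, where `fetch` stands in for whatever function actually downloads the file. Errors that persist after the last retry still propagate, so the existing per-granule error logging keeps working.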

adriaat commented 1 year ago

Addressed with #15