Zeitsperre closed this issue 3 years ago
My knowledge of CORDEX data is limited, so I am not yet able to answer your question; I need to investigate. First, I want to reproduce your use case, but at the moment I am running into a certificate failure. I will share the results of my analysis as soon as possible. Best regards, Patrice
Here is the test I have just run:
USE CASE with synda version 3.35
synda install http://cordexesg.dmi.dk/thredds/fileServer/cordex_general/CORDEX/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/OURANOS-CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc
synda daemon start
RESULTS
1 / LOG INFORMATION (transfer.log)
2021-10-26 13:52:41,097 INFO SDDMDEFA-101 Transfer done (file_id=1,status=done,local_path=/synda/data/cordex/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc,url=http://cordexesg.dmi.dk/thredds/fileServer/cordex_general/CORDEX/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/OURANOS-CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc)
2 / DB EXTRACTED INFORMATION AFTER DOWNLOAD
{
  'file_id': 1,
  'url': 'http://cordexesg.dmi.dk/thredds/fileServer/cordex_general/CORDEX/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/OURANOS-CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc',
  'file_functional_id': 'cordex.output.NAM-22.OURANOS.MPI-M-MPI-ESM-LR.rcp85.r1i1p1.CRCM5.v1.day.tas.v20181107.tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc',
  'filename': 'tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc',
  'local_path': 'cordex/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc',
  'data_node': 'cordexesg.dmi.dk',
  'checksum': '190fe842ff88999c2f0ceeb8fd5ed6d25e74ce5cc7d2c08a71d721252d451beb',
  'checksum_type': 'sha256',
  'duration': 11.576509,
  'size': 358575377,
  'rate': 30974396.253654707,
  'start_date': '2021-10-26 13:52:26.753712',
  'end_date': '2021-10-26 13:52:38.330221',
  'crea_date': '2021-10-26 13:52:23.120995',
  'status': 'done',
  'error_msg': '',
  'sdget_status': '0',
  'sdget_error_msg': '',
  'priority': 1000,
  'tracking_id': None,
  'model': None,
  'project': 'CORDEX',
  'variable': 'tas',
  'last_access_date': None,
  'dataset_id': 1,
  'insertion_group_id': 1,
  'timestamp': '2018-10-23T19:32:05Z',
}
The test result is OK
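As a quick sanity check on the record above, the 'rate' field appears to be simply 'size' divided by 'duration', i.e. bytes per second. The sketch below reproduces that arithmetic; the function name transfer_rate is hypothetical and is not part of synda.

```python
def transfer_rate(size_bytes: int, duration_s: float) -> float:
    """Return the average transfer rate in bytes per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_bytes / duration_s


# Values copied from the DB record of the first test.
size_bytes = 358575377
duration_s = 11.576509

rate = transfer_rate(size_bytes, duration_s)
print(f"{rate:.1f} bytes/s (about {rate / 1e6:.1f} MB/s)")
```

The result agrees with the stored 'rate' value of roughly 31 MB/s, which suggests that the recorded duration of this first test contained little or no waiting time.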
Can you confirm that your use case gives you the same result today?
Can you also specify your synda version and the selection file you use?
Best regards, Patrice.
I've made 11 more tests.
async_http_timeout = 600 seconds (set in the sdt.conf file)
RESULTS
First test (detailed in my previous comment above): start_date = 2021-10-26 13:52:26.753712, duration = 11.576509
Other 11 tests
2021-10-26 15:08:22.471051 <= start_date <= 2021-10-26 15:37:06.183191
duration = 104.191526, 68.204497, 106.795432, 139.868149, 132.941996, 108.818363, 147.907556, 62.711311, 16.955667, 13.103844, 12.051813
ANALYSIS
We can see that the server response time is not stable (min = 12.051813 seconds, max = 147.907556 seconds). The waiting time before the download starts can be significant (we assume that the effective download itself takes around 11 seconds).
The server behavior may explain your results...
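The figures in this analysis can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name summarize is hypothetical, and treating the first test's duration (11.576509 s) as the pure download time is the same assumption made above, not a measured value.

```python
import statistics

# Duration of the first test, assumed to approximate the pure download time.
BASELINE_S = 11.576509

# Durations (in seconds) of the 11 follow-up tests, copied from the results.
DURATIONS_S = [
    104.191526, 68.204497, 106.795432, 139.868149, 132.941996, 108.818363,
    147.907556, 62.711311, 16.955667, 13.103844, 12.051813,
]


def summarize(durations, baseline):
    """Return min, max, median and the estimated waiting time per test."""
    return {
        "min": min(durations),
        "max": max(durations),
        "median": statistics.median(durations),
        # Time spent beyond the baseline, read here as server-side waiting.
        "estimated_wait": [round(d - baseline, 3) for d in durations],
    }


summary = summarize(DURATIONS_S, BASELINE_S)
print(summary["min"], summary["max"], summary["median"])
print(max(summary["estimated_wait"]))
```

Every duration stays well below the async_http_timeout of 600 seconds, so in these tests the slow responses delayed the transfers without causing a timeout.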
I am going to look into the synda error message to see whether it can point more clearly to the actual problem in cases where the checksum itself is not at fault.
I hope this analysis may help you.
Patrice
On the synda side, the checksum is calculated only when the size of the downloaded file matches the expected size.
The team would therefore like to add one more item to the analysis, regarding your sentence: "The data is being downloaded into a ZFS-formatted disk with on-the-fly compression, but I don't imagine that would have an impact on the SHA sums."
The team suspects that the ZFS-formatted disk may explain the errors encountered during the checksum verification step.
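To help narrow this down, a failed file can be checked by hand with the same order of operations: compare the size first, then compute the SHA-256 checksum. The sketch below is not synda's implementation; the names sha256_of and verify_download are hypothetical, and the expected size and checksum are assumed to come from the ESGF metadata for the file.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_checksum):
    """Check size first, then checksum; return a short status string."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        return "size mismatch"
    if sha256_of(path) != expected_checksum.lower():
        return "checksum mismatch"
    return "ok"
```

For the file from the first test, the call would be verify_download(path, 358575377, '190fe842ff88999c2f0ceeb8fd5ed6d25e74ce5cc7d2c08a71d721252d451beb'), with path pointing at the local copy. Running the same check on a copy stored on a non-ZFS disk would indicate whether the file system plays a role.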
I've been trying to download a fairly extensive selection of CORDEX data, and I am noticing that on average more than 90% of downloads are failing due to a mismatched checksum. The data is being downloaded into a ZFS-formatted disk with on-the-fly compression, but I don't imagine that would have an impact on the SHA sums. The internet connection speed is more than adequate.
These numbers seem oddly high, and it has me wondering if there are any options I should consider looking into. I'm not comfortable relaxing the checksum verification as this data is being used in production. Is there an issue with reported file checksums for CORDEX data on ESGF ?
An example output:
2021-10-08 16:40:05,042 INFO SDDMDEFA-102 Transfer failed (sdget_status=0,sdget_error_msg=,error_msg='File corruption detected: local checksum doesn't match remote checksum',file_id=22398,status=error,local_path=/{me}/{folders}/synda/downloads/cordex/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc,url=http://cordexesg.dmi.dk/thredds/fileServer/cordex_general/CORDEX/output/NAM-22/OURANOS/MPI-M-MPI-ESM-LR/rcp85/r1i1p1/OURANOS-CRCM5/v1/day/tas/v20181107/tas_NAM-22_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_OURANOS-CRCM5_v1_day_20860101-20901231.nc)
And a queue readout:
Thanks again!