Closed ggalibert closed 2 years ago
Thanks Guillaume - will check if the files can be reprocessed and if so will upload them. Out of curiosity - what is the reported problem?
The problem is that these files are not sane NetCDFs. As you can see IMOS_ACORN_RV_20181213T100500Z_CSP_FV00_radial.nc is empty, some are truncated like IMOS_ACORN_RV_20190205T124500Z_NNB_FV00_radial.nc and some are not valid NetCDF at all.
The question is: did the upload operation fail and leave invalid files on our server, or was the original file already corrupted (i.e. a problem happened during the generation of the file)?
interestingly it seems to only happen with the WERA radials for which the conversion is managed through the python scripts. we'll check what the problem is. I can add a sanity check before launching the transfer process but need to find where the problem is
@scosoli 5 more files were uploaded corrupted on the 18 April from CSP and NNB. They have been added to the list above.
thanks for reporting that. I'm implementing a sanity check right now that will possibly get rid of this issue. will test this week
the sanity check is implemented - right before the rsync upload a script checks for I/O errors, missing variables and so on, and if any are found the corrupt file is moved to an 'error' directory on our server for later checks. as it is now the script should be fairly robust but please let me know if corrupt files are still uploaded and I'll fix it properly. as it is now it only acts on the python-generated radials
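For illustration only, a pre-transfer check along these lines could be sketched as below. This is not the script described above: the function names and the `queue_dir`/`error_dir` arguments are hypothetical, and it only tests for empty files and for the leading signature bytes of the NetCDF classic and HDF5 (NetCDF-4) formats. It would catch empty and non-NetCDF files, but a truncated file with an intact header would still pass, so a fuller check would also open the file with a NetCDF library and look for the expected variables.

```python
import shutil
from pathlib import Path

# Leading bytes of the on-disk formats a .nc file normally starts with.
NETCDF_MAGICS = (
    b"CDF\x01",            # NetCDF classic
    b"CDF\x02",            # NetCDF 64-bit offset
    b"CDF\x05",            # NetCDF 64-bit data (CDF-5)
    b"\x89HDF\r\n\x1a\n",  # HDF5 container used by NetCDF-4
)


def looks_like_netcdf(path):
    """Cheap check: the file is non-empty and starts with a known signature."""
    path = Path(path)
    try:
        if path.stat().st_size == 0:
            return False
        with path.open("rb") as handle:
            header = handle.read(8)
    except OSError:
        # Unreadable files are treated as failing the check.
        return False
    return header.startswith(NETCDF_MAGICS)


def quarantine_bad_files(queue_dir, error_dir):
    """Move files that fail the check out of the queue; return their names."""
    queue_dir, error_dir = Path(queue_dir), Path(error_dir)
    error_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for nc_file in sorted(queue_dir.glob("*.nc")):
        if not looks_like_netcdf(nc_file):
            shutil.move(str(nc_file), str(error_dir / nc_file.name))
            moved.append(nc_file.name)
    return moved
```

Running `quarantine_bad_files` on the queue directory just before the transfer step would leave only files that pass the check for rsync to pick up.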
will reprocess and upload the missing files later on when I'm back from leave
Thank you, please let us know when you do re-upload the files.
1 more file was uploaded corrupted on the 27 May from GUI. It has been added to the list above.
@scosoli more files landed in the error directory today. They are either corrupted, or lacking pretty much everything.
will see what I can do. but it will have to wait as I have other priorities right now
we have started investigating what is going on - I have asked @badema to have a look and there's a couple of ongoing issues that need to be fixed
@scosoli, following the communication outage that started last night, a few more empty files landed in our incoming directory.
I am reprocessing and uploading the corrupt files. So far I have reprocessed the files from CWI with the MATLAB version of the RT scripts, and as far as I can see they seem to have landed successfully on the portal. please correct me if I am wrong and I'll investigate that in more detail
as far as I can see, all files that have been reported above as corrupt have been reprocessed and uploaded to the portal. please report any file I may have missed
@scosoli thank you for that. I can confirm that all but one of the files have been re-uploaded successfully. Unfortunately some new corrupted files landed 2 days ago. Please see above for the most recent list of corrupted files to be re-uploaded.
I can't see why the file would fail. can you provide more details?
this is what I use to transfer data from the RT queue to the incoming directory on your end:
rsync --password-file $path_to_password -ruv --remove-source-files ~/queued/*.nc acorn@incoming.aodn.org.au::acorn_staging >> ~/transfer_log
this is embedded in a very basic shell script that sets env. variables, the path to the password file and similar. it is run via cronjob every 15 minutes to keep up with the data flow. the only thing I can think of is that a separate run calls rsync on a file that is then removed by a previous, still-running call -- @scosoli
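If overlapping cron runs are indeed the cause (that is only the hypothesis raised above, not a confirmed diagnosis), one common guard is to take a non-blocking exclusive lock before starting the transfer and skip the run when the lock is already held. A minimal sketch, assuming a Unix host; the function names and the lock file path are hypothetical:

```python
import fcntl
import os


def try_lock(lock_path):
    """Return an open fd holding an exclusive lock, or None if already held."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Release the lock taken by try_lock and close its descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A wrapper would call `try_lock` first, exit without touching the queue when it returns `None`, and call `release_lock` once the transfer has finished. The lock is also dropped automatically if the process dies, since it is tied to the open file descriptor.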
Taking one of the files as an example, it was first uploaded with a length of 291620 bytes and then exactly 15 minutes later a zero length file was uploaded.
$ grep IMOS_ACORN_RV_20190606T042000Z_CWI_FV00_radial.nc rsync_acorn_staging.log
2019/08/12 18:07:09 [17180] recv UNKNOWN [130.95.29.7] acorn_staging (acorn) IMOS_ACORN_RV_20190606T042000Z_CWI_FV00_radial.nc 291620
2019/08/12 18:22:09 [7244] recv UNKNOWN [130.95.29.7] acorn_staging (acorn) IMOS_ACORN_RV_20190606T042000Z_CWI_FV00_radial.nc 0
The first file was successfully published, with the expected file length:
$ aws s3 ls --no-sign s3://imos-data/IMOS/ACORN/radial/CWI/2019/06/06/IMOS_ACORN_RV_20190606T042000Z_CWI_FV00_radial.nc
2019-08-12 18:07:32 291620 IMOS_ACORN_RV_20190606T042000Z_CWI_FV00_radial.nc
The smoking gun is really that second upload of a zero length file, suggesting a script bug on your side.
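The same pattern can be searched for across the whole log rather than one file at a time. The sketch below is only an illustration: the function name is hypothetical, and it assumes every `recv` line has the layout shown in the excerpt above (file name and byte count as the last two fields) and that lines are in chronological order.

```python
from collections import defaultdict


def find_zero_length_reuploads(log_lines):
    """Map file name -> received sizes, for files where a 0-byte
    transfer arrived after a non-empty one."""
    sizes = defaultdict(list)
    for line in log_lines:
        fields = line.split()
        if "recv" not in fields:
            continue
        name, size = fields[-2], fields[-1]
        if not size.isdigit():
            continue
        sizes[name].append(int(size))

    suspicious = {}
    for name, seen in sizes.items():
        for index, size in enumerate(seen):
            # A zero-length transfer preceded by at least one non-empty one.
            if size == 0 and any(prev > 0 for prev in seen[:index]):
                suspicious[name] = seen
                break
    return suspicious
```

Fed the two log lines above, it reports that file with the size sequence 291620 followed by 0.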
A quick note to let you know that we have started investigating one of the possible sources of error / corruption in the netcdf creation stage. Badema has developed all the python scripts for the purpose and came across an HDF problem which we’ll try to solve. She will be providing further details on this. It seems to occur at random and appears to be a known issue at the netcdf creation stage. -- @scosoli
Another empty file came up yesterday. See updated list above.
Another corrupted file came up on Sun 20/10/2019. See updated list above.
Yes I was notified by email too. we're having some annoying issues with CWI and we don't seem to be able to find the source. looks like an incompatibility with the cpci or some faulty cables
@scosoli, FYI: I'm trying to clean up the backlog of errored files on the ACORN stream.
We still have the files mentioned above in the error directory. All of them are empty or invalid netcdf files.
The CWI files below are already published (from 2018 onwards) and failed because they are empty (0 bytes) or invalid netcdf files.
Date | Time | File name |
---|---|---|
2020-01-29 | 11:51:39.351823494 | IMOS_ACORN_RV_20181031T023000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:51:46.711575334 | IMOS_ACORN_RV_20190605T160000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:51:50.123460291 | IMOS_ACORN_RV_20190605T162000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:51:53.555344574 | IMOS_ACORN_RV_20190605T172000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:51:57.123224273 | IMOS_ACORN_RV_20190606T000000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:52:00.791100600 | IMOS_ACORN_RV_20190606T042000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:52:04.190985964 | IMOS_ACORN_RV_20190723T093000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:52:07.614870520 | IMOS_ACORN_RV_20190723T113000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:52:10.998756425 | IMOS_ACORN_RV_20190723T134000Z_CWI_FV00_radial.nc |
2020-01-29 | 11:52:18.278510974 | IMOS_ACORN_RV_20191019T220000Z_CWI_FV00_radial.nc |
2020-04-01 | 21:21:50.392496494 | IMOS_ACORN_RV_20200327T044000Z_CWI_FV00_radial.nc |
2020-04-01 | 20:21:54.177863730 | IMOS_ACORN_RV_20200327T121000Z_CWI_FV00_radial.nc |
The GUI/RRK files below aren't published and, although they failed a bit further down the line (compliance-checker), they are not valid netcdf files (I think the compliance checker cannot open them, so they fail miserably).
Date | Time | File name |
---|---|---|
2020-01-29 | 11:52:14.614634509 | IMOS_ACORN_RV_20190903T061000Z_GUI_FV00_radial.nc |
2020-01-29 | 11:51:43.115696581 | IMOS_ACORN_RV_20181031T043000Z_RRK_FV00_radial.nc |
The following files have been sent to us in a corrupted state:
@scosoli they need to be manually re-uploaded if a sane version exists somewhere; otherwise let us know and we'll move on.