c-scale-community / use-case-aquamonitor

Corrupt NetCDF files resulting from loading Landsat-8 data #18

Closed. Jaapel closed 1 year ago

Jaapel commented 3 years ago

Intermittently, we run into RuntimeError: NetCDF: HDF error when loading NetCDF files downloaded from the VITO backend. Running the same code again sometimes makes the error go away without any code changes. For an example stack trace, see https://github.com/c-scale-community/use-case-aquamonitor/commit/fc68cf38568db9b12e12be4f663fa76593b9dbe0 and check the cell outputs of one of the visualization steps.

I need help determining the root cause, as this impedes progress. Workflows that run for multiple hours or days cannot tolerate intermittent failures.

Jaapel commented 3 years ago

I know my local internet provider sometimes loses some packets. I know I have had these issues both with the jobs API and with the DataCube.download() method. Can these methods handle packet loss?

jdries commented 3 years ago

Can you double-check that this is not a client-side issue? I see something very similar, where loading works again after a kernel restart. NetCDF libraries can be a bit buggy in this respect. In xarray, the 'engine' used is also an important parameter that affects this.

Data transfer in openEO happens over regular HTTP, using the Python requests library. There's nothing special in place, but you could try downloading the same file again to determine if this is the cause.
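
One way to run that re-download check, as a minimal sketch: hash two independent downloads of the same result and compare the digests. The helper names and file names below are hypothetical, and the sketch assumes the backend serves byte-identical content for repeated downloads of the same finished result, which may not hold if the file is regenerated per request.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have the same size and SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes: at least one download is truncated or altered
    return sha256_of(a) == sha256_of(b)


# Hypothetical usage: "first.nc" and "second.nc" are two downloads of the same result.
# if not same_content("first.nc", "second.nc"):
#     print("Downloads differ: the transfer is a likely cause.")
```

If both downloads are identical and the file still fails to open, the transfer itself is an unlikely cause, which points back at the client-side libraries or at the file as written by the backend.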

Jaapel commented 3 years ago

It may be a client issue. I am not sure exactly how to test or troubleshoot these intermittent issues. A restart does not always help. Is the issue you experience yourself specific to notebooks? Do you have any recommendations for file formats and drivers/engines to use? This makes the experimentation phase difficult.

jdries commented 3 years ago

For the netCDF issue, if it is consistent when starting from a fresh kernel, and maybe after switching the engine, then the file is probably corrupted indeed. The engine I use is 'h5netcdf'. I have been using netCDF myself a lot, so it's fine to try that, and it has some advantages over GeoTIFF. I do fall back to GeoTIFF where its limitations (no time dimension) are not really a problem, because the GeoTIFF writer is probably somewhat faster.
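
The engine switch described above could be sketched roughly as follows. This is an illustration under assumptions, not an established recipe: open_with_fallback is a hypothetical helper, the engine order is just a guess at a sensible default, and the opener argument exists only so the helper can be exercised without xarray installed. By default it calls xarray.open_dataset with its engine parameter.

```python
def open_with_fallback(path, engines=("h5netcdf", "netcdf4"), opener=None):
    """Try each engine in turn; return (dataset, engine) for the first that works."""
    if opener is None:
        import xarray as xr  # imported lazily; only needed for the default opener
        opener = xr.open_dataset
    errors = {}
    for engine in engines:
        try:
            return opener(path, engine=engine), engine
        except Exception as exc:  # broad on purpose: engines raise different error types
            errors[engine] = exc
    # Every engine failed: report all errors so they can be compared.
    raise RuntimeError(f"All engines failed for {path!r}: {errors}")


# Hypothetical usage, run from a fresh kernel:
# ds, engine = open_with_fallback("result.nc")
# print("opened with", engine)
```

If every engine fails from a fresh kernel, that matches the case above where the file itself is probably corrupted.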

Jaapel commented 3 years ago

Thanks for the feedback. I will try to restart the kernel more often. Will report back if I get these issues more often.