EOxServer / eoxserver

EOxServer is a Python application and framework for presenting Earth Observation (EO) data and metadata.
https://eoxserver.org

GDAL Failed to parse NetCDF #569

Open tjellicoe-tpzuk opened 6 months ago

tjellicoe-tpzuk commented 6 months ago

Description

I am attempting to register a NetCDF file containing climate data using the registrar. The data registers correctly into the database. However, when the registered data is selected for rendering on the map, I receive a 400 response with the error "Failed to parse NETCDF: prefix string into expected 2, 3 or 4 fields." and the data is not displayed. The issue seems to stem from the inclusion of the band index in the subdataset locator string when calling gdal.Open() to read the file.

Method to Reproduce

Steps to reproduce the behavior:

  1. Deploy EOxViewServer via Helm chart, with collections and product types set up for the selected climate data.
  2. Use NetCDF data available in a STAC catalog here.
  3. Register the NetCDF file, either once copied into the cluster or from an S3 bucket, using the following CLI command: kubectl exec -it -n eoxviewserver deployment/eoxviewserver-registrar -- python3 /var/www/pvs/dev/pvs_instance/manage.py timeseries register --collection UKCP --storage data_s3 --path "clt_rcp85_land-cpm_uk_5km_01_day_20601201-20701130.nc" --product-type-name UKCPCLT --x-dim-name "projection_x_coordinate" --y-dim-name "projection_y_coordinate" --time-dim-name "time" --coverage-type-mapping "clt:UKCP_2060" --product-template "{collection_identifier}_{file_identifier}_{index}"
  4. The data registers correctly and can be viewed in the Django Admin pages.
  5. On the Client page, select the appropriate layer and navigate to the spatial and temporal location of the data. The outline of the selected data is visible.
  6. Select the correct browse type to request display of the ingested NetCDF data. The data is not displayed on the map.
  7. Using the browser's 'inspect' functionality, identify the WMS request sent by EOxViewServer to retrieve the climate data. Executing this request with curl outputs the "Failed to parse" error above.
  8. In the Django Admin page, deleting the index value from the 'Subdataset Locator' field causes the first time slice (band 1) of the data to be displayed correctly on the map, e.g. deleting the trailing :3599 from :clt:3599.

Expected Behaviour

I would expect each individual time slice from the NetCDF to be rendered on the map when the timeline selection so requires. I would expect the subdataset locator to be used to identify which array from the NetCDF should be rendered at any selected time.

My Understanding

From looking at the source code, it seems the error is caused by the call to gdal.Open with the following string: NETCDF:"/vsis3/<bucket_name>/<netcdf_file_name>":<variable>:<index>. GDAL does not appear to handle the additional index value appended to the string. If this functionality is not supported by GDAL, perhaps it makes sense to open the whole variable and then extract individual bands as required, for example:

ds = gdal.Open('NETCDF:"/vsis3/<bucket_name>/<netcdf_file_name>":<variable>')
band = ds.GetRasterBand(<index>)
render(band)
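A slightly fuller sketch of this idea, in case it helps the discussion. The helper names `split_locator` and `open_band` are hypothetical and not part of EOxServer. It also assumes that the stored index is 0-based (GDAL band numbers are 1-based) and that the netCDF driver exposes each time slice of the variable as a separate band, neither of which I have confirmed against the EOxServer code.

```python
import re

# Hypothetical helper, not part of EOxServer. Matches locators such as
#   NETCDF:"/vsis3/bucket/file.nc":clt:3599
# and separates the trailing numeric index from the part GDAL can open.
_INDEXED = re.compile(r'^(?P<name>NETCDF:".+":[^:"]+):(?P<index>\d+)$')


def split_locator(locator):
    """Return (gdal_name, index); index is None if no index is appended."""
    match = _INDEXED.match(locator)
    if match is None:
        return locator, None
    return match.group("name"), int(match.group("index"))


def open_band(locator, index_base=0):
    """Open the variable without the index, then select the band.

    index_base=0 is an assumption about how the index is stored.
    """
    from osgeo import gdal  # imported lazily so split_locator works without GDAL

    gdal.UseExceptions()
    name, index = split_locator(locator)
    ds = gdal.Open(name)
    # GDAL band numbers start at 1; fall back to the first band if no index.
    band_number = 1 if index is None else index - index_base + 1
    # Return the dataset too, so it stays alive while the band is in use.
    return ds, ds.GetRasterBand(band_number)
```

With this, `split_locator('NETCDF:"/vsis3/bucket/file.nc":clt:3599')` yields the openable name `NETCDF:"/vsis3/bucket/file.nc":clt` and the index `3599`, while a locator without an index is passed through unchanged.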

Additional Information

I also updated the contrib/gdal.py script at line 186 so that the index value is not included when identifying the shape of the data. Previously this raised the same error (Failed to parse NETCDF: prefix string into expected 2, 3 or 4 fields.) and prevented the data from being registered at all. I added the following to the code: path = path[:-2] if path[-2:] == ":0" else path
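The one-liner above only handles an index of exactly 0. A possible generalisation that strips any trailing numeric index is sketched below. The function name `strip_trailing_index` is hypothetical, and it assumes that a path ending in `:<digits>` always carries an appended index, which may not hold for every path that reaches this code.

```python
import re

# Matches a trailing ":<digits>" such as ":0" or ":3599".
_TRAILING_INDEX = re.compile(r":\d+$")


def strip_trailing_index(path):
    """Remove one trailing ':<digits>' suffix, if present (hypothetical helper)."""
    return _TRAILING_INDEX.sub("", path, count=1)


path = strip_trailing_index('NETCDF:"/data/file.nc":clt:0')
print(path)
```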