IQSS / dataverse

Open source research data repository software
http://dataverse.org

Extracting lat/long and insert into geospatial bounding box fields #9331

Closed pdurbin closed 1 year ago

pdurbin commented 1 year ago

@atrisovic and I exchanged emails with @plesubc and he has inspired us to use GDAL, ogrinfo or similar to try extracting latitude and longitude from a NetCDF file.

Since this is just a spike (some discovery), we're sizing it as a 10, or 1 day.

Here's part of the email from Paul (this file happens to span the entire globe):

"Metadata extraction is a relatively simple process assuming you’re using GDAL. The GDAL suite exports file metadata to stdout, so all you really need to do is capture and process the text. Of course, differing formats have differing outputs, because life is never that simple.

So, for example, imagine you downloaded a netcdf from here:

https://data.ceda.ac.uk/badc/ukmo-hadobs/data/derived/MOHC/HadOBS/HadEX3/v3-0-2 (HadEX3-0-2_cwd_ann_1901-2018.nc). This isn’t some special data set, it’s the result of a google for spatial netcdf files.

Basically, filtering this file through ogrinfo (one of the utilities in GDAL), you get something like this as output:

ogrinfo *nc
INFO: Open of `HadEX3-0-2_cwd_ann_1901-2018.nc'
      using driver `netCDF' successful.
Metadata:
  NC_GLOBAL#acknowledgement=RJHD was supported by Met Office Hadley Centre Climate Programme funded by BEIS and Defra
  NC_GLOBAL#CDI=Climate Data Interface version 1.9.9rc1 (https://mpimet.mpg.de/cdi)
  NC_GLOBAL#cdm_data_type=grid
  NC_GLOBAL#CDO=Climate Data Operators version 1.9.9rc1 (https://mpimet.mpg.de/cdo)
  NC_GLOBAL#creator_email=robert.dunn@metoffice.gov.uk
  NC_GLOBAL#creator_name=Robert Dunn
  NC_GLOBAL#creator_url=www.metoffice.gov.uk
  NC_GLOBAL#dataset_version=3.0.2
  NC_GLOBAL#date_created=Mon Oct 26, 12:10 2020
  NC_GLOBAL#DOI=https://doi.org/10.1029/2019JD032263
  NC_GLOBAL#geospatial_lat_max=90
  NC_GLOBAL#geospatial_lat_min=-90
  NC_GLOBAL#geospatial_lat_resolution=1.25
  NC_GLOBAL#geospatial_lat_units=degrees
  NC_GLOBAL#geospatial_lon_max=360
  NC_GLOBAL#geospatial_lon_min=0
  NC_GLOBAL#geospatial_lon_resolution=1.875
  NC_GLOBAL#geospatial_lon_units=degrees
  NC_GLOBAL#institution=Met Office Hadley Centre, Exeter, UK
  NC_GLOBAL#keywords=extremes indices, gridded, temperature, precipitation, ETCCDI
  NC_GLOBAL#licence=HadEX3 is distributed under the Open Government Licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. The data are available for use with attribution to the data providers. Please cite Dunn et al (2020) and state the version used. This product may contain data which are governed by WMO Policy following WMO Resolution 40 Annex 1 alongside additional data that may have restrictions placed on their commercial use by the data owners. Any redistribution of this product should be accompanied by a similar statement of usage policy.
  NC_GLOBAL#Metadata_Conventions=Unidata Dataset Discovery v1.0,CF Discrete Sampling Geometries Conventions
  NC_GLOBAL#NCO=netCDF Operators version 4.7.5 (Homepage = http://nco.sf.net/, Code = http://github.com/nco/nco)
  NC_GLOBAL#processing_level=Daily TX, TN and P observations, converted to ETCCDI indices, and then gridded
  NC_GLOBAL#references=Dunn, Alexander et al. 2020, Journal of Geophysical Research - Atmospheres, https://doi.org/10.1029/2019JD032263
  NC_GLOBAL#source=HadEX3 data product
  NC_GLOBAL#summary=Gridded dataset of extremes indices
  NC_GLOBAL#time_coverage_end=2019-01-01T00:00Z
  NC_GLOBAL#time_coverage_resolution=Monthly
  NC_GLOBAL#time_coverage_start=1901-01-01T00:00Z
  NC_GLOBAL#title=CWD
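
To illustrate the "capture and process the text" idea, here is a minimal Java sketch that picks the NC_GLOBAL#geospatial_* lines out of ogrinfo output like the above. The class and method names (OgrinfoGeoParser, parse) are made up for illustration, and it assumes the output has already been captured as a string (for example with ProcessBuilder). It keeps every geospatial_* attribute it finds, including resolution and units, and it is only a sketch, not what Dataverse does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of GDAL or Dataverse.
class OgrinfoGeoParser {
    private static final String PREFIX = "NC_GLOBAL#geospatial_";

    /** Collects NC_GLOBAL#geospatial_* key/value pairs from ogrinfo's text output. */
    static Map<String, String> parse(String ogrinfoOutput) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : ogrinfoOutput.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith(PREFIX)) {
                continue; // not a geospatial global attribute
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue; // no value on this line
            }
            // Key is the part after the prefix, e.g. "lat_max".
            String key = trimmed.substring(PREFIX.length(), eq);
            fields.put(key, trimmed.substring(eq + 1).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "Metadata:",
                "  NC_GLOBAL#geospatial_lat_max=90",
                "  NC_GLOBAL#geospatial_lat_min=-90",
                "  NC_GLOBAL#geospatial_lon_max=360",
                "  NC_GLOBAL#geospatial_lon_min=0",
                "  NC_GLOBAL#title=CWD");
        System.out.println(parse(sample));
    }
}
```

Other formats will print their metadata differently, so the prefix here only fits NetCDF files that carry these global attributes.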
pdurbin commented 1 year ago

@atrisovic and I just found a nice example that shows a specific bounding box:

  <attribute name="geospatial_lat_min" value="25.066666666666666" />
  <attribute name="geospatial_lat_max" value="49.40000000000000" />
  <attribute name="geospatial_lon_min" value="-124.7666666333333" />
  <attribute name="geospatial_lon_max" value="-67.058333300000015" />

This is from https://www.northwestknowledge.net/metdata/data/bi_2023.nc and is currently published at https://dev1.dataverse.org/file.xhtml?fileId=30&version=1.0
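
If the attributes are available as XML like the snippet above, a rough Java sketch for reading the four values with the JDK's built-in DOM parser could look like this. The class name NcmlBoundingBox and the wrapping <netcdf> root element are assumptions made for the example; the real document structure may differ.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class NcmlBoundingBox {
    /** Returns the geospatial_{lat,lon}_{min,max} attributes as doubles. */
    static Map<String, Double> read(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("attribute");
            Map<String, Double> box = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element attribute = (Element) nodes.item(i);
                String name = attribute.getAttribute("name");
                // Keep only the four bounding box attributes.
                if (name.matches("geospatial_(lat|lon)_(min|max)")) {
                    box.put(name, Double.parseDouble(attribute.getAttribute("value")));
                }
            }
            return box;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse attribute XML", e);
        }
    }

    public static void main(String[] args) {
        // Single-quoted XML attributes keep the Java string readable.
        String xml = "<netcdf>"
                + "<attribute name='geospatial_lat_min' value='25.066666666666666' />"
                + "<attribute name='geospatial_lat_max' value='49.40000000000000' />"
                + "<attribute name='geospatial_lon_min' value='-124.7666666333333' />"
                + "<attribute name='geospatial_lon_max' value='-67.058333300000015' />"
                + "</netcdf>";
        System.out.println(read(xml));
    }
}
```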

Next steps:

pdurbin commented 1 year ago

From our design doc, we'll look for geospatial files here too:

Use cases evident by using a variety of NetCDF/HDF5 data from these examples:

Surface PM2.5: https://sites.wustl.edu/acag/datasets/surface-pm2-5/#V5.GL.03
GridMET data: https://www.northwestknowledge.net/metdata/data/
Global Workshop on Earth Observation: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OYBLGK
NetCDF data from Harvard Dataverse: https://dataverse.harvard.edu/dataverse/harvard?q=*.nc

JR-1991 commented 1 year ago

Here is an example repository and Google Colab notebook to map NetCDF files to the current implementation of the geospatial metadata block using EasyDataverse. In order to use it, you need to supply your API Token and target collection name.

mreekie commented 1 year ago

Next sprint:

pdurbin commented 1 year ago

@JR-1991 thanks! I pointed your notebook at my test server and it populated the bounding box:

[Screenshot: Screen Shot 2023-04-03 at 4.13.38 PM]

It's pretty straightforward to pull attributes out using the library we added in #9152.

// Reads the four bounding box global attributes from a NetCDF file and
// returns them keyed by the matching DatasetFieldConstant names.
public static Map<String, String> parseGeospatial(NetcdfFile netcdfFile) {
    Map<String, String> geoFields = new HashMap<>();

    Attribute westLongitude = netcdfFile.findGlobalAttribute(WEST_LONGITUDE_KEY);
    Attribute eastLongitude = netcdfFile.findGlobalAttribute(EAST_LONGITUDE_KEY);
    Attribute northLatitude = netcdfFile.findGlobalAttribute(NORTH_LATITUDE_KEY);
    Attribute southLatitude = netcdfFile.findGlobalAttribute(SOUTH_LATITUDE_KEY);

    geoFields.put(DatasetFieldConstant.westLongitude, getValue(westLongitude));
    geoFields.put(DatasetFieldConstant.eastLongitude, getValue(eastLongitude));
    geoFields.put(DatasetFieldConstant.northLatitude, getValue(northLatitude));
    geoFields.put(DatasetFieldConstant.southLatitude, getValue(southLatitude));

    // Debugging aid: print a preview link in west,south,east,north order.
    System.out.println("https://linestrings.com/bbox/#"
            + geoFields.get(DatasetFieldConstant.westLongitude) + ","
            + geoFields.get(DatasetFieldConstant.southLatitude) + ","
            + geoFields.get(DatasetFieldConstant.eastLongitude) + ","
            + geoFields.get(DatasetFieldConstant.northLatitude)
    );

    return geoFields;
}

I think I got the order right to see the bounding box at https://linestrings.com/bbox/#-124.7666666333333,25.066666666666666,-67.058333300000015,49.40000000000000

[Screenshot 2023-04-03 at 16-54-05: https://linestrings.com]
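
To double-check the coordinate order outside of Dataverse, here is a small standalone Java sketch that builds the same kind of preview link. It assumes linestrings.com expects west,south,east,north after the #, and the class name BboxPreviewUrl and the plain string keys are hypothetical stand-ins for the DatasetFieldConstant values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Dataverse.
class BboxPreviewUrl {
    /** Builds a preview link, assuming west,south,east,north order. */
    static String build(Map<String, String> geoFields) {
        return "https://linestrings.com/bbox/#" + String.join(",",
                require(geoFields, "westLongitude"),
                require(geoFields, "southLatitude"),
                require(geoFields, "eastLongitude"),
                require(geoFields, "northLatitude"));
    }

    // Fails fast if a coordinate is missing, rather than emitting "null" in the URL.
    private static String require(Map<String, String> fields, String key) {
        String value = fields.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> geoFields = new HashMap<>();
        geoFields.put("westLongitude", "-124.7666666333333");
        geoFields.put("eastLongitude", "-67.058333300000015");
        geoFields.put("northLatitude", "49.40000000000000");
        geoFields.put("southLatitude", "25.066666666666666");
        System.out.println(build(geoFields));
    }
}
```

Each coordinate should appear exactly once in the link, with the north latitude last.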