GSA / data.gov

Main repository for the data.gov service
https://data.gov

mimetype 'Geotiff' missing from the ckan resource_formats #4252

Closed: Jin-Sun-tts closed this issue 1 year ago

Jin-Sun-tts commented 1 year ago

There is an error when downloading data.json for the National Renewable Energy Laboratory (NREL).

In errorlog.txt:

Validation failed, best guess of error:

 'Geotiff' does not match '^[-\\w]+/[-\\w]+(\\.[-\\w]+)*([+][-\\w]+)?$'

Failed validating 'pattern' in schema[0]:
    {'pattern': '^[-\\w]+/[-\\w]+(\\.[-\\w]+)*([+][-\\w]+)?$',
     'type': 'string'}

On instance:
    'Geotiff'
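
The rejection can be reproduced with Python's re module, using the pattern from the log. This is a minimal sketch that only mirrors the pattern check. It assumes the validator treats the pattern as a standard Python regular expression, and is_valid_media_type is a hypothetical helper, not part of ckanext-datajson.

```python
import re

# Pattern copied from the errorlog.txt message above. The log prints it as a
# Python string repr, which is why the backslashes appear doubled there.
MEDIA_TYPE_PATTERN = re.compile(r"^[-\w]+/[-\w]+(\.[-\w]+)*([+][-\w]+)?$")


def is_valid_media_type(value: str) -> bool:
    """Return True if value has the type/subtype shape the schema requires."""
    return MEDIA_TYPE_PATTERN.match(value) is not None


for candidate in ["Geotiff", "image/tiff", "application/vnd.geo+json"]:
    print(candidate, is_valid_media_type(candidate))
# Geotiff False
# image/tiff True
# application/vnd.geo+json True
```

A bare format name such as Geotiff has no slash, so it can never match, regardless of capitalization.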

For this dataset:

2023-03-23 14:18:43,168 - Dataset id=[0d41df0e-c33a-4291-ad14-dedb196c29fa], title=[Physical Solar Model version 3 Direct Normal Irradiance Multi-year Monthly Average], organization=[National Renewable Energy Laboratory] omitted, reason above.

How to reproduce

Click the Unredacted Inventory button.

Expected behavior

data.json is downloaded.

Actual behavior

The dataset with the format 'Geotiff' is missing from the data.json file, and errors appear in errorlog.txt.

Sketch

Because Geotiff is not in the resource_formats, the lookup just returns the original string, 'Geotiff'. That string does not pass the regex match, which expects a MIME type of the form type/subtype (e.g. image/tiff), so the errors show up.

Create a new file in ckanext-datajson that includes an additional MIME type mapping, and add it to the list of resource_formats from CKAN.
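
The fallback behavior and the proposed extra mapping can be sketched as follows. BASE_FORMATS, EXTRA_MIMETYPES, build_format_lookup and get_mimetype are hypothetical names: BASE_FORMATS is a tiny stand-in for the lookup derived from CKAN's resource_formats, and this is not the code from the eventual fix.

```python
import re

MEDIA_TYPE_PATTERN = re.compile(r"^[-\w]+/[-\w]+(\.[-\w]+)*([+][-\w]+)?$")

# Hypothetical stand-in for the format -> MIME type lookup built from CKAN's
# resource_formats. Keys are lower-cased format names.
BASE_FORMATS = {
    "csv": "text/csv",
    "json": "application/json",
    "tiff": "image/tiff",
}

# Hypothetical extra mapping, kept in a separate file in ckanext-datajson,
# for formats that resource_formats does not know about.
EXTRA_MIMETYPES = {
    "geotiff": "image/tiff",
}


def build_format_lookup() -> dict:
    """Merge the base formats with the extra mapping (extras win on conflict)."""
    lookup = dict(BASE_FORMATS)
    lookup.update(EXTRA_MIMETYPES)
    return lookup


def get_mimetype(fmt: str, lookup: dict) -> str:
    """Return the MIME type for fmt, or fmt unchanged if it is unknown.

    The unchanged fallback is what lets 'Geotiff' reach the validator as-is
    and fail the pattern check.
    """
    return lookup.get(fmt.strip().lower(), fmt)


without_extras = get_mimetype("Geotiff", BASE_FORMATS)
with_extras = get_mimetype("Geotiff", build_format_lookup())
print(without_extras, bool(MEDIA_TYPE_PATTERN.match(without_extras)))  # Geotiff False
print(with_extras, bool(MEDIA_TYPE_PATTERN.match(with_extras)))        # image/tiff True
```

Keeping the extra mapping in its own file separates local additions from the list shipped with CKAN, so the two can be updated independently.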

reference: https://www.digipres.org/formats/mime-types/

robert-bryson commented 1 year ago

This doesn't sound like the cause here, but as a friendly reminder for awareness: the proxy does have a MIME types list, and geotiff is not on it.

Jin-Sun-tts commented 1 year ago

Thank you @robert-bryson. Because Geotiff is not in the resource_formats, the lookup just returns the original string, 'Geotiff', which does not pass the regex match because it expects something like type/subtype. So the errors show up.

I am planning to add a list to take care of the missing formats, for example returning image/tiff if the format is Geotiff. image/tiff is a valid MIME type in the proxy list.

jbrown-xentity commented 1 year ago

The data issues in inventory require very different fixes than those in catalog. In this instance, I think we shouldn't "change" or "fix" the data as it exists. Ideally we want to tighten up our data entry form so that users can't input invalid entries (or add help text to minimize these errors). For much of catalog, it's not worth the overhead for someone to fix the metadata, because we are 5-10 people removed from whoever actually entered the data. In this case we can actually find out who entered the metadata, and/or tighten up our processes to prevent this in the future (instead of trying to handle this specific case).
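
A minimal sketch of the data-entry alternative, assuming the form can run a Python check before the record is saved: the stored value is never rewritten, and the user gets help text instead. format_field_error and SUGGESTIONS are hypothetical; the pattern is the one from the error log.

```python
import re
from typing import Optional

MEDIA_TYPE_PATTERN = re.compile(r"^[-\w]+/[-\w]+(\.[-\w]+)*([+][-\w]+)?$")

# Hypothetical hints shown to the person entering metadata. They are only
# suggested in the error message, never applied automatically.
SUGGESTIONS = {
    "geotiff": "image/tiff",
    "csv": "text/csv",
}


def format_field_error(value: str) -> Optional[str]:
    """Return a help message if value is not a type/subtype MIME string, else None."""
    cleaned = value.strip()
    if MEDIA_TYPE_PATTERN.match(cleaned):
        return None
    message = f"'{value}' is not a MIME type; enter it as type/subtype"
    hint = SUGGESTIONS.get(cleaned.lower())
    if hint:
        message += f" (did you mean '{hint}'?)"
    return message


print(format_field_error("Geotiff"))
print(format_field_error("image/tiff"))
```

Rejecting at entry keeps the published metadata identical to what the publisher submitted, which is the point of not "fixing" the data downstream.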

FuhuXia commented 1 year ago

Quick fix:

  1. Add Geotiff to the allowed list (as @Jin-Sun-tts proposed), or
  2. Instruct user to input image/tiff instead of Geotiff (as @jbrown-xentity proposed)

Long term fix:

  1. Improve the UI: make the format a combo field that autocompletes the user's input as they type, while displaying a filtered list of acceptable values, based on what they have typed so far, for them to select from.

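The filtering behind such a combo field might look like the sketch below. It is written in Python only to illustrate the matching logic; a real combo field would most likely run in the browser. ACCEPTABLE_FORMATS and suggest are hypothetical, and the list is a small illustrative subset, not the proxy's actual MIME types list.

```python
from typing import List

# Hypothetical subset of acceptable values the combo field would offer.
ACCEPTABLE_FORMATS = [
    "application/json",
    "application/pdf",
    "image/jpeg",
    "image/tiff",
    "text/csv",
]


def suggest(typed: str, options: List[str], limit: int = 5) -> List[str]:
    """Return options containing the typed text, case-insensitively.

    Options that start with the typed text come first, then other options
    that contain it; each group keeps the original order.
    """
    needle = typed.strip().lower()
    if not needle:
        return options[:limit]
    starts = [o for o in options if o.lower().startswith(needle)]
    contains = [o for o in options if needle in o.lower() and o not in starts]
    return (starts + contains)[:limit]


print(suggest("tif", ACCEPTABLE_FORMATS))    # ['image/tiff']
print(suggest("image", ACCEPTABLE_FORMATS))  # ['image/jpeg', 'image/tiff']
```

Typing "geotiff" would match nothing in this list; pairing each MIME type with a friendlier label that is also searched would be one way to cover that case.
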
Jin-Sun-tts commented 1 year ago

The issue is fixed here: https://github.com/GSA/ckanext-datajson/pull/140