gbif / gbif-api

GBIF API
Apache License 2.0

[feature-request]: Implement habitat as a field in class Event #131

Open mpitblado opened 3 weeks ago

mpitblado commented 3 weeks ago

This request pertains to http://rs.tdwg.org/dwc/terms/habitat

Looking at src/main/java/org/gbif/api/model/event/Event.java and https://techdocs.gbif.org/en/openapi/v1/occurrence#/Searching%20occurrences/searchOccurrence, habitat does not appear to be available as a field through the API, although it is supported by the IPT. I am submitting this request as we have users that have expressed a strong interest in being able to use this to query for records within their datasets.

Habitat is not available as an option in either the main gbif.org UI or the hosted portal UI. I am not sure whether those UIs would also need changes to complete the overall objective if this request were accepted, or whether they automatically pull all fields exposed through the API.

CecSve commented 2 weeks ago

One of the challenges with implementing search on the dwc:habitat field is that the values are not standardized (no controlled vocabulary exists). Implementing one would be difficult, although we do have an open issue to do so.

Currently there are more than 13 million verbatim values in the habitat field and values tend to only be present in one dataset. The verbatim values are usually long-ish text strings, for example:

' Along dry stream running more or less South out of range. Sandstone.'
' Insufficient water/flow available to collect a field water quality measurement.'
' Insufficient water/flow available to collect a field water quality measurement.'
'A medium straight tree with unbuttressed bole, broad crown. Height 80 ft [24.5 m]. Bole 35 ft [10.5 m]. d.b.h. 20 ins'
' A small malformed, many stemmed tree, 20 feet [6 m] total height. Bark 1/8 thick, outer bark smooth, few large'
'Bastard' softwood scrub.'

> I am submitting this request as we have users that have expressed a strong interest in being able to use this to query for records within their datasets.

Just so I am sure I understand you correctly - your user community would like to be able to filter their data based on habitat prior to downloading the data?

It is possible to get habitat data from the verbatim file of a full DwC-A download: https://techdocs.gbif.org/en/data-use/download-formats#dwca-verbatim, but that of course does not allow users to search for specific habitats.
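As a stopgap, the verbatim file can be filtered locally after download. The following is a minimal sketch, not an official GBIF client: it assumes the archive contains a tab-delimited `verbatim.txt` with a header row that includes a `habitat` column, and the function name `habitat_matches` is hypothetical.

```python
import csv
import io
import zipfile
from fnmatch import fnmatchcase


def habitat_matches(archive, pattern, member="verbatim.txt", column="habitat"):
    """Yield rows of the verbatim file whose habitat matches a wildcard pattern.

    `archive` is a path or file-like object for the downloaded DwC-A zip.
    The member name and column name are assumptions about the archive layout.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Tab-delimited, no quoting: free-text values may contain quote characters.
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                value = row.get(column) or ""  # treat a missing habitat as empty
                # Lower-case both sides so the match is case-insensitive (an assumption).
                if fnmatchcase(value.lower(), pattern.lower()):
                    yield row
```

For example, `list(habitat_matches("download.zip", "*sandstone*"))` would return the matching rows as dictionaries, where `download.zip` stands in for the path of a downloaded archive. This only works after the full download, so it does not replace a search filter in the API.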

Unless the field is standardized by a controlled vocabulary, I doubt it would make much sense to enable a search filter.

mpitblado commented 2 weeks ago

Hi CecSve,

Correct, our user community would like to be able to search records within a dataset based on a pattern for the field habitat. For example,

Habitat
Long Beach
Test Rock
Null Island

Could be queried on *Island* to return

Habitat
Null Island
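The behavior in this example can be sketched with Python's standard `fnmatch` module. This only illustrates the requested matching semantics on the sample values above; it is not how the GBIF API implements search. The function name `filter_habitat` is hypothetical, and case-insensitive matching is an assumption.

```python
from fnmatch import fnmatchcase


def filter_habitat(values, pattern):
    """Return the habitat values that match a shell-style wildcard pattern."""
    # Lower-case both sides so that '*island*' also matches 'Null Island'.
    return [
        v for v in values
        if v is not None and fnmatchcase(v.lower(), pattern.lower())
    ]


habitats = ["Long Beach", "Test Rock", "Null Island"]
print(filter_habitat(habitats, "*Island*"))  # prints ['Null Island']
```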

Ultimately, my objective is to get the term included in the hosted portal interface, but my understanding is that before a term gets on the list of filters, it must first be available within the API, hence this request.


On gbif.org, a user also cannot currently search by habitat. Again, my understanding is that to become available in that UI, it would first need to be implemented through the API.


The intended search functionality would be identical to that of other non-standardized text fields, such as recordNumber, locality, waterBody, etc. However, if habitat is not commonly queried and the compute required is judged to be excessive, I understand.

CecSve commented 1 week ago

> On gbif.org, a user also cannot currently search by habitat. Again, my understanding is that to become available in that UI, it would first need to be implemented through the API.

Yes, the field would have to be implemented in the back-end (API) so it can be used by our front-end and a hosted portal.

@fmendezh @MortenHofft could we do what is required to enable searches on habitat? @mpitblado correct me if I am wrong, but it has relevance for the Beaty Biodiversity Museum hosted portal, and it is mentioned several times in the upcoming freshwater publishing guide.

MortenHofft commented 6 days ago

Once it is in the API, it is a small task to add it to the UI. As I understand it, it will behave similarly to Verbatim scientific name.

mpitblado commented 4 days ago

Yes @CecSve, that is correct. It definitely has relevance to some of the staff and users at the Beaty Biodiversity Museum, and perhaps to others who will appreciate the option to search on it once it is available. As Morten mentions, a standard text search like the one for Verbatim scientific name would be great and sufficient; it does not need anything special beyond that.