gbif / literature-ws

Apache License 2.0

searching for websites in literature #33

Closed dnoesgaard closed 2 months ago

dnoesgaard commented 2 months ago

How can I search for content in the websites field?

None of these work:
https://api.gbif.org/v1/literature/search?q=https://doi.org/10.1111/jbi.14969
https://api.gbif.org/v1/literature/search?q=%22https://doi.org/10.1111/jbi.14969%22
https://api.gbif.org/v1/literature/search?q=https%3A%2F%2Fdoi.org%2F10.1111%2Fjbi.14969

Fwiw, I get a hit when querying the ES endpoint directly using double-quotes, e.g.

/literature/_search?q="https://doi.org/10.1111/jbi.14969"

ahakanzn commented 2 months ago

Hi Daniel, this one works: https://api.gbif.org/v1/literature/search?DOI=10.1111/jbi.14969
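
For reference, the two request forms discussed so far (free-text `q` and the `DOI` parameter) can be built programmatically. This is only a minimal sketch, assuming Java 10+; the class and method names are hypothetical, and the parameter names are taken from the URLs in this thread rather than from any API reference.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of literature-ws.
public class LiteratureQueryUrls {
    // Base URL as it appears in this thread.
    static final String BASE = "https://api.gbif.org/v1/literature/search";

    // Free-text search: percent-encode the term so ':' and '/' survive transport.
    static String freeText(String term) {
        return BASE + "?q=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // DOI lookup, using the parameter name shown in the comment above.
    static String byDoi(String doi) {
        return BASE + "?DOI=" + URLEncoder.encode(doi, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(freeText("https://doi.org/10.1111/jbi.14969"));
        System.out.println(byDoi("10.1111/jbi.14969"));
    }
}
```

The first line printed is the same percent-encoded form as the third URL in the original report; the thread itself passes the DOI with an unencoded slash, so encoding it here is an assumption.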

dnoesgaard commented 2 months ago

Yup, but some papers don't have DOIs :( My example was a bad one...

ahakanzn commented 2 months ago

Oh okay I see, I'll take a look

ahakanzn commented 2 months ago

Fixed and deployed to prod

ahakanzn commented 2 months ago

The change:
public static final String REGEX_PUNCT_CHARS = "(\\p{Punct})";
changed to
public static final String REGEX_PUNCT_CHARS = "([+\\-!(){}\\[\\]^\"~*?:\\\\/])";
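
To make the difference between the two patterns concrete, here is a small sketch that lists which characters of the example URL each one matches. The thread does not show how literature-ws applies `REGEX_PUNCT_CHARS`, so this only compares the patterns themselves; the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regex literals come from the thread.
public class PunctRegexDemo {
    static final String OLD_REGEX = "(\\p{Punct})";
    static final String NEW_REGEX = "([+\\-!(){}\\[\\]^\"~*?:\\\\/])";

    // Concatenates every character of the input that the pattern matches.
    static String matchedChars(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "https://doi.org/10.1111/jbi.14969";
        // Old pattern matches all ASCII punctuation, including '.'
        System.out.println(matchedChars(OLD_REGEX, url)); // ://././.
        // New pattern matches only the listed characters; '.' is left alone
        System.out.println(matchedChars(NEW_REGEX, url)); // :////
    }
}
```

The only difference for this URL is the dot: the old pattern matched it, the new one does not.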

Also, we should write integration tests for literature-ws.

dnoesgaard commented 2 months ago

Just for my clarity: does this mean that any query starting with https:// searches only in websites?

ahakanzn commented 2 months ago

No, it should work for other fields as well. For example:
https://api.gbif.org/v1/literature/search?q=https://api.gbif.org/v1/occurrence/download/0182006-200613084148143
This returns 1 result, which contains the URL in the abstract field.

dnoesgaard commented 2 months ago

Ah ok. But only queries starting with http(s) search in websites?