AtlasOfLivingAustralia / avh-hub

Australian Virtual Herbarium
https://avh.ala.org.au
Mozilla Public License 2.0

More problems with Advanced query form #69

Closed nielsklazenga closed 7 years ago

nielsklazenga commented 7 years ago

The reported issue was that this query, http://avh-test.ala.org.au/occurrences/search?q=text%3ASynaphea+petiolaris+collection_uid%3Aco75 , which is produced by the Advanced query form, does not work.

I can see two problems with this query:

  1. Individual search terms are not enclosed, so the system can't see where one term ends and the other begins. That will break any query.
  2. There are no field names in the Text field; they should always be there, as we don't know which fields are searched automatically.
  3. http://avh.ala.org.au/occurrences/search?q=matched_name_children%3A%22Synaphea+petiolaris%22 , which is what the Taxon name field produces (though without the necessary double quotes), is not the query you expect to get when you type something into the Taxon name field, and it gives 0 results.

1 and 3 are issues in the query form, 2 is an issue with what has been filled in in the query form.

This works: http://avh-test.ala.org.au/occurrences/search?q=taxon_name:%22Synaphea%20petiolaris%22+AND+collection_uid:%22co65%22 and is a proper SOLR query, so that is what should come out of the query form.

So, three (four) simple things need to change on the query form:

  1. If it says it is going to search on Taxon name, it should search on taxon name and not something else ('matched_name_children').
  2. Values should be enclosed with double quotes (").
  3. We'll have to find a notation that makes the full text search work with the individual terms; if it is on the query form, it should work. I can actually get the query to work in the Advanced query form: http://avh-test.ala.org.au/occurrences/search?q=text%3A%28taxon_name%3A%22Synaphea+petiolaris%22+AND+collection_uid%3Aco65%29 (resulting from putting '(taxon_name:"Synaphea petiolaris" AND collection_uid:"co65")' in the Text field) works, and http://avh-test.ala.org.au/occurrences/search?q=text%3A%28taxon_name%3A%22Synaphea+petiolaris%22%29+collection_uid%3Aco65 (from typing '(taxon_name:"Synaphea petiolaris")' in the Text field and selecting 'N.C.W. Beadle Herbarium' from the Herbarium dropdown) works too, so this is just a matter of wrapping the contents of the Text field in parentheses.
  4. I would also very much prefer if individual fields could be glued together using ' AND ' rather than ' '; if nothing else, it makes the query string easier to read.
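
The four changes above can be sketched in a few lines of Python. This is only an illustration of the proposal, not the form's actual code: the function names `quote_value`, `build_query` and `build_search_url` are hypothetical, and wrapping the full text in `text:(...)` simply follows item 3.

```python
from urllib.parse import urlencode


def quote_value(value: str) -> str:
    """Enclose a value in double quotes, escaping embedded backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(fields: dict, full_text: str = "") -> str:
    """Build a query string from form fields (hypothetical helper).

    - every field value is quoted (item 2)
    - the full text is wrapped in parentheses (item 3)
    - clauses are joined with an explicit ' AND ' (item 4)
    """
    clauses = []
    if full_text.strip():
        clauses.append(f"text:({full_text.strip()})")
    for name, value in fields.items():
        if value:  # skip form fields that were left empty
            clauses.append(f"{name}:{quote_value(value)}")
    return " AND ".join(clauses)


def build_search_url(base_url: str, fields: dict, full_text: str = "") -> str:
    """URL-encode the query and append it as the q parameter."""
    return f"{base_url}?{urlencode({'q': build_query(fields, full_text)})}"


if __name__ == "__main__":
    print(build_search_url(
        "http://avh-test.ala.org.au/occurrences/search",
        {"taxon_name": "Synaphea petiolaris", "collection_uid": "co65"},
    ))
```

With the Taxon name field mapped to `taxon_name` (item 1), the two fields above give `taxon_name:"Synaphea petiolaris" AND collection_uid:"co65"` before encoding, which is the same query as the working URL.
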
nickdos commented 7 years ago

The first example says the URL was produced from the advanced search page but doesn't say which fields were entered by the user, so it's hard to reproduce.

The example provided appears to result from the user typing "Synaphea petiolaris" into the full text field and then selecting "WA Herbarium" from the herbarium drop-down, so the URL looks OK to me (see below)

  1. the text: field is the "default" field if no field is provided, therefore the terms don't need to be enclosed. E.g. http://avh-test.ala.org.au/occurrences/search?q=Synaphea+petiolaris+collection_uid%3Aco75 provides the same number of results (as the first URL in issue), as does http://avh-test.ala.org.au/occurrences/search?q=Synaphea+petiolaris&qc=data_hub_uid:dh9&fq=collection_uid%3A%22co75%22
  2. text is the field name, as explained above.
  3. This is a different search: the user has now pasted "Synaphea petiolaris" into the taxon name input (not full text). This is a bug, in that the matched_name_children field has not been indexed correctly in biocache-service. The fix is to use the taxon_name field, as suggested. There was a reason we implemented matched_name_children just for AVH many years ago (I can't remember why though), so it would be good to check that it is OK to use taxon_name.

  1. Covered above.
  2. Only the text: field omits quotes (which has no effect on results, as explained above). The taxon name field already encloses pasted text in quotes, so changing the field should fix the problem.
  3. This doesn't make sense to me. I think you are confused by the fact that text: is a field, so it doesn't make sense to embed taxon_name: inside it; you can't embed fields inside fields.
  4. Yes, I agree with this. SOLR allows you to set a default Boolean operator, which we have as AND, but if we later changed it to OR, searches would work differently. Explicitly setting AND would prevent that and make the query easier to read. The only minor issue is that URLs get bigger and there is a potential to hit the max length in GET.
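
For illustration, the second URL in point 1 (free text in `q`, the herbarium as a quoted `fq` filter) could be assembled roughly like this. It is a sketch only: `search_url` and `MAX_URL_LENGTH` are made-up names, and the 2000-character limit is an assumed conservative value, since real GET limits depend on the server and browser.

```python
from urllib.parse import urlencode

# Assumed conservative limit for illustration; actual limits vary by server/browser.
MAX_URL_LENGTH = 2000


def search_url(base_url, text, filters, hub=None):
    """Put the free text in q and each field restriction in its own quoted fq."""
    params = [("q", text)]  # no field name: the default field is searched
    if hub:
        params.append(("qc", f"data_hub_uid:{hub}"))
    for name, value in filters.items():
        params.append(("fq", f'{name}:"{value}"'))
    url = f"{base_url}?{urlencode(params)}"
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"URL is {len(url)} characters; too long for a GET request")
    return url


if __name__ == "__main__":
    print(search_url(
        "http://avh-test.ala.org.au/occurrences/search",
        "Synaphea petiolaris",
        {"collection_uid": "co75"},
        hub="dh9",
    ))
```

Note that `urlencode` also percent-encodes the colon in `qc`, so the output differs cosmetically from the hand-written URL above but decodes to the same parameters.
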

nickdos commented 7 years ago

I think the difference between taxon_name and matched_name_children might have been related to which child taxa are included in the results. The former will return all lower ranks, including things like varieties and forms, whereas the latter did not (just a guess).

nielsklazenga commented 7 years ago

If that is the case, that's what we want, but the field name would be misleading. matched_name_children suggests to me that it will also include the children if, say, the search string is a family name (which is not part of the name of any of the children).

Basically what people expect is all taxa of which the name starts with the search string.

For everything else above, as long as it works I'm good (and please never change the default boolean operator to 'OR').

nickdos commented 7 years ago

taxon_name matches the input to a name in our taxonomy and then just searches on that taxon. So should the form keep using it?

E.g. http://avh-test.ala.org.au/occurrences/search?q=taxon_name:Acacia

nielsklazenga commented 7 years ago

We want it to do this: http://avh-test.ala.org.au/occurrences/search?q=taxon_name:Acacia*
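
A rough sketch of how the form could turn the typed name into that kind of prefix query. The helper `prefix_query` is hypothetical; it assumes that backslash-escaping spaces and other special characters is acceptable for this field, because a trailing wildcard does not work inside a quoted phrase in the standard query syntax.

```python
import re

# Characters with special meaning in Lucene/Solr query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def prefix_query(field: str, term: str) -> str:
    """Return 'field:term*' with special characters in term backslash-escaped."""
    escaped = _SPECIAL.sub(r"\\\1", term.strip())
    return f"{field}:{escaped}*"


if __name__ == "__main__":
    print(prefix_query("taxon_name", "Acacia"))
    print(prefix_query("taxon_name", "Synaphea petiolaris"))
```
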

nickdos commented 7 years ago

I've got an open issue to fix matched_name_children in biocache-service but in the meantime, I've hooked in the taxa param to the taxon name fields, which seems to do the right thing...

nielsklazenga commented 7 years ago

Yes, works beautifully now. I like that the taxa bit of the query is displayed in the search box now as well. Thanks for hanging in there.