kyule closed this issue 10 months ago.
@sunray1 I also confirmed that this is only happening for TOOK. This also needs to be fixed ahead of the TOS Palooza in mid-January.
Likely because we're using an indexed fulltext search to search locality strings (faster, but misses some results due to search rules) and the word "took" is a stop word for MyISAM. See: https://dev.mysql.com/doc/refman/8.0/en/fulltext-stopwords.html
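This helps explain why a stop word produces zero hits rather than just fewer hits. Below is a minimal sketch of stop-word filtering in a full-text index. It assumes a simplified tokenizer and uses only a tiny excerpt of the MyISAM stop-word list; the real list is far longer. The locality strings are hypothetical. Because "took" is dropped both when the text is indexed and when the query is parsed, the query has no searchable terms left.

```python
import re

# Tiny excerpt of the MyISAM default stop-word list, for illustration only.
STOPWORDS = {"the", "to", "took"}

def tokenize(text: str) -> list[str]:
    """Split text into lowercase words and drop stop words, as a full-text indexer would."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def build_index(records: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index mapping each indexed word to the record ids containing it."""
    index: dict[str, set[int]] = {}
    for rec_id, text in records.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(rec_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids matching any non-stop-word term in the query."""
    hits: set[int] = set()
    for word in tokenize(query):
        hits |= index.get(word, set())
    return hits

# Hypothetical locality strings keyed by record id.
records = {1: "TOOK, Toolik Lake, Alaska", 2: "BARR, Utqiagvik, Alaska"}
index = build_index(records)
print(search(index, "TOOK"))    # empty: "took" was never indexed
print(search(index, "Toolik"))
```

Searching for "Toolik" still finds record 1, so only queries consisting entirely of stop words come back empty.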
@egbot - "took" will have to be removed from the stop-word list and the index rebuilt. I'm unsure how the index was built originally, and I likely do not have the permissions to do this.
Related @kyule - Could be fine, but FYI:
Occurrences like https://biorepo.neonscience.org/portal/collections/editor/occurrenceeditor.php?occid=236992 will not show up in https://biorepo.neonscience.org/portal/collections/list.php?local=TOOL since full-text searches only match whole words.
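The whole-word limitation can be sketched in a few lines. This assumes the full-text lookup behaves like exact whole-word token matching, and compares it with a substring match similar to SQL `LIKE '%TOOL%'`. The locality string is hypothetical, standing in for an older record whose locality lacks the siteID.

```python
import re

def whole_word_match(text: str, term: str) -> bool:
    """True if term equals one of the words in text (case-insensitive), like a full-text lookup."""
    return term.lower() in re.findall(r"[a-z0-9]+", text.lower())

def substring_match(text: str, term: str) -> bool:
    """True if term appears anywhere in text (case-insensitive), like LIKE '%term%'."""
    return term.lower() in text.lower()

# Hypothetical locality string that mentions Toolik but not the siteID.
locality = "Toolik Field Station, Alaska"
print(whole_word_match(locality, "TOOL"))  # "tool" is not a whole word here
print(substring_match(locality, "TOOL"))   # "tool" is a prefix of "toolik"
```

A substring match would catch such records but cannot use the full-text index, which is the speed trade-off mentioned earlier in the thread.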
Thanks for pointing that out; I definitely didn't remember that! It is an issue right now for older specimens. We have since improved the way we harvest locality information from NEON, though, so any time one of these records is reharvested, it should in theory reformat the locality field to include the siteID. I just reharvested this one, and TOOK is now in there.
That's exactly it! As you state, locality uses a MyISAM full-text index. MyISAM full-text stop words are configured through a file on the server's file system (the ft_stopword_file setting; otherwise a built-in list is used), whereas InnoDB stores its stop words in a table. Modifying this text file, restarting MySQL/MariaDB, and then rebuilding the full-text indexes is one solution. As an alternative that only affects the NEON portal, I modified the code to skip the full-text index when certain words are searched against the locality field. I already have a similar solution applied to the collector field. See the pull request below.
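The portal's actual change lives in its PHP code and is not reproduced here. As a rough illustration of the idea only, here is a Python sketch that builds a query using the full-text index normally but falling back to a `LIKE` scan when the term is a known stop word. The `occurrences` table name, the `STOPWORDS` set, and the function name are hypothetical. The `%s` placeholders follow the DB-API "format" parameter style used by MySQL drivers such as PyMySQL.

```python
# Hypothetical set of words the MyISAM full-text index is known to ignore.
STOPWORDS = {"took"}

def build_locality_query(term: str) -> tuple[str, tuple[str, ...]]:
    """Return (sql, params) for a locality search.

    Uses the FULLTEXT index via MATCH ... AGAINST unless the term is a
    stop word, in which case it falls back to a slower LIKE scan that
    does not consult the stop-word list.
    """
    if term.lower() in STOPWORDS:
        sql = "SELECT occid FROM occurrences WHERE locality LIKE %s"
        return sql, (f"%{term}%",)
    sql = ("SELECT occid FROM occurrences "
           "WHERE MATCH(locality) AGAINST(%s IN BOOLEAN MODE)")
    return sql, (term,)

print(build_locality_query("TOOK"))
print(build_locality_query("Toolik"))
```

A `LIKE` pattern with a leading wildcard cannot use an index. Restricting the fallback to the few affected words therefore keeps most searches on the fast full-text path.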
https://github.com/BioKIC/NEON-Biorepository/pull/378
When 3.1 rolls out, we will probably switch over to an InnoDB full-text lookup, which uses a smaller list of stop words.
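To illustrate the smaller list, here is a minimal check against InnoDB's default full-text stop words as given in the MySQL 8.0 reference manual (the contents of `INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD`). This assumes the server keeps the default list. A custom list configured through `innodb_ft_server_stopword_table` would behave differently, and the helper name is hypothetical.

```python
# InnoDB's default full-text stop words, per the MySQL 8.0 reference manual.
# Assumes no custom stop-word table has been configured on the server.
INNODB_DEFAULT_STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}

def is_ignored(term: str) -> bool:
    """True if InnoDB's default stop-word list would drop this search term."""
    return term.lower() in INNODB_DEFAULT_STOPWORDS

print(is_ignored("TOOK"))
print(is_ignored("the"))
```

Under these assumptions, "TOOK" is not on the InnoDB default list, so site-code searches like this one should no longer be silently discarded after the switch.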
No samples show up when either the siteID "TOOK" is typed into the locality search or the TOOK - Toolik Lake site is selected as a checkbox under D18. I have not checked many other site codes, but I haven't noticed the issue with other sites. If I type in the URL https://biorepo.neonscience.org/portal/collections/list.php?datasetid=128, I get the correct results, because dataset 128 should contain the TOOK samples; https://biorepo.neonscience.org/portal/collections/list.php?local=TOOK, however, brings up zero results.