fathomnet / worms-server

Fast WoRMS name server
http://fathomnet.org:8888/docs/
MIT License

Filter for Marine/Brackish Taxa Only #5

Closed lauravchrobak closed 1 month ago

lauravchrobak commented 1 month ago

I am trying to use the worms module to find marine/brackish species only, and I have found that my queries are not restricted to marine taxa. For example, the call worms.find_names_containing('Lumbricus terrestris') returns ['Lumbricus terrestris', 'Lumbricus terrestris lacteus', 'Lumbricus terrestris minor', 'Lumbricus terrestris platyurus', 'Lumbricus terrestris rubidus']. The WoRMS website has filter functionality for this: if you search for Lumbricus terrestris with only 'Marine Brackish' checked, earthworms are not found. Can we add this kind of filtering to the fathomnet worms module?

lauravchrobak commented 1 month ago

I see that the AphiaRecords schema of the REST API includes the following fields: [image]

lauravchrobak commented 1 month ago

Perhaps just this line needs to be updated?

hohonuuli commented 1 month ago

I had originally tried filtering out non-marine species, but it broke things and I had to disable that filter. We could revisit it, since our service doesn't care about non-marine species. See MutableWormsNode.scala#73

lauravchrobak commented 1 month ago

Ahh got it, yes I think this would be really useful functionality - would be worth checking with @kakanikatija to see if we want other subgroups like freshwater or brackish as well.

kakanikatija commented 1 month ago

For future-proofing, brackish and freshwater would be good to include.

hohonuuli commented 1 month ago

Tech note: the various flags in speciesprofile.txt appear to be applied inconsistently in the WoRMS data, with values of 1, 0, and no value at all. I'll have to run some tests on the no-value fields and will probably default a missing value to true to be safe.


taxonID                                    isMarine  isFreshwater  isTerrestrial  isExtinct  isBrackish
urn:lsid:marinespecies.org:taxname:1       1         1             1                         1
urn:lsid:marinespecies.org:taxname:2       1         1             1                         1
[...]
urn:lsid:marinespecies.org:taxname:101472  1
urn:lsid:marinespecies.org:taxname:101473  1
urn:lsid:marinespecies.org:taxname:101474  1
urn:lsid:marinespecies.org:taxname:101475  1
urn:lsid:marinespecies.org:taxname:101476  1         0             0              0          0
urn:lsid:marinespecies.org:taxname:101477  1         0             0              0          0
urn:lsid:marinespecies.org:taxname:101478  1         0             0              0          0
urn:lsid:marinespecies.org:taxname:101479  1         0             0                         0
urn:lsid:marinespecies.org:taxname:101480  1         1             0                         1
urn:lsid:marinespecies.org:taxname:101481  1         0             0                         0
urn:lsid:marinespecies.org:taxname:101482  1         0             0              0          0
urn:lsid:marinespecies.org:taxname:101483  1         0             0              0          0
urn:lsid:marinespecies.org:taxname:101484  1
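
A minimal sketch of how the three cases (1, 0, no value) could be kept apart while testing, written in Python rather than the server's Scala. It assumes speciesprofile.txt is tab-separated with the header row shown above; parse_flag, read_profiles, and resolve are hypothetical names, not part of worms-server. Empty cells are read as None, and the policy for missing values is applied only in the last step.

```python
import csv
import io
from typing import Dict, Optional

FLAG_COLUMNS = ["isMarine", "isFreshwater", "isTerrestrial", "isExtinct", "isBrackish"]


def parse_flag(raw: Optional[str]) -> Optional[bool]:
    """Map "1" to True, "0" to False, and anything else (empty, missing) to None."""
    value = (raw or "").strip()
    if value == "1":
        return True
    if value == "0":
        return False
    return None


def read_profiles(text: str) -> Dict[str, Dict[str, Optional[bool]]]:
    """Parse tab-separated profile rows into {taxonID: {flag name: True/False/None}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["taxonID"]: {col: parse_flag(row.get(col)) for col in FLAG_COLUMNS}
        for row in reader
    }


def resolve(flag: Optional[bool], default: bool) -> bool:
    """Apply a policy for missing values: use `default` when the flag is unknown."""
    return default if flag is None else flag


# Two rows copied from the sample above, with tabs assumed as the separator.
sample = (
    "taxonID\tisMarine\tisFreshwater\tisTerrestrial\tisExtinct\tisBrackish\n"
    "urn:lsid:marinespecies.org:taxname:101472\t1\t\t\t\t\n"
    "urn:lsid:marinespecies.org:taxname:101479\t1\t0\t0\t\t0\n"
)
profiles = read_profiles(sample)
```

Keeping None until the last step makes it easy to compare policies, for example resolve(flag, default=True) against resolve(flag, default=False), on the same parsed data.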

lauravchrobak commented 1 month ago

Hmm, I'm a little concerned about having the no-value fields default to true. In the example you provided there are a fair number of no-value entries in isExtinct; I looked a few up here and they were all accepted (non-extinct) taxa. Is it possible to populate them empty?

hohonuuli commented 1 month ago

@lauravchrobak Thanks for getting back to me. In the implementation I threw together I have empty values defaulting to false, not true. I'll look into returning a null value (e.g. "isBrackish": null) for empty values.

hohonuuli commented 1 month ago

> Is it possible to populate them empty?

@lauravchrobak I've made the change you suggested. As an example, https://fathomnet.org/worms/details/Loligo%20opalescens now returns the JSON below. Let me know if things look OK to you and I'll roll out the changes ASAP.

{
  "name": "Doryteuthis opalescens",
  "rank": "Species",
  "aphiaId": 574540,
  "parentAphiaId": 410349,
  "alternateNames": [
      "California market squid",
      "Californische pijlinktvis",
      "Doryteuthis (Amerigo) opalescens",
      "Loligo opalescens",
      "Loligo stearnsii",
      "Opalescent inshore squid",
      "Opalisierender Kalmar",
      "Pazifischer Opalkalmar",
      "Schließaugenkalmar",
      "calmar opale",
      "common Pacific squid",
      "opalescent inshore squid",
      "opalescent squid",
      "Калифорнийский кальмар"
  ],
  "isMarine": true,
  "isFreshwater": false,
  "isTerrestrial": false,
  "isExtinct": null,
  "isBrackish": false
}
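
On the client side, the new flags could be used to filter names along these lines. This is a sketch only: keep_marine_or_brackish is a hypothetical helper, not part of the fathomnet worms module, and the first record is an abbreviated copy of the JSON above. Because a flag may be null, the caller has to decide whether "unknown" counts as a match.

```python
import json
from typing import Any, Dict


def keep_marine_or_brackish(details: Dict[str, Any], include_unknown: bool = False) -> bool:
    """Return True if the record is flagged marine or brackish.

    Each flag may be True, False, or None (unknown). Unknown flags count
    as a match only when include_unknown is set.
    """
    flags = [details.get("isMarine"), details.get("isBrackish")]
    if any(flag is True for flag in flags):
        return True
    return include_unknown and any(flag is None for flag in flags)


# Abbreviated copy of the response shown above.
response_text = """
{
  "name": "Doryteuthis opalescens",
  "isMarine": true,
  "isFreshwater": false,
  "isTerrestrial": false,
  "isExtinct": null,
  "isBrackish": false
}
"""
details = json.loads(response_text)

# Hypothetical record for illustration only, not real WoRMS data.
land_only = {"name": "hypothetical land taxon", "isMarine": False, "isBrackish": False}

print(keep_marine_or_brackish(details))    # True: isMarine is true
print(keep_marine_or_brackish(land_only))  # False: neither flag is true
print(details["isExtinct"] is None)        # True: JSON null is parsed as None
```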

lauravchrobak commented 1 month ago

LGTM! thanks Brian :)

hohonuuli commented 1 month ago

OK. Change is deployed to production (in release 0.6.0). I'll merge the pull request into main tomorrow.