OSC / phylogatr-web

The web app for the Phylogatr Project - https://phylogatr.org/
MIT License

gbif expansion bug? #28

Closed johrstrom closed 2 years ago

johrstrom commented 2 years ago

This is the record in question. I happened to stumble upon it and wonder whether it's being extracted correctly.

http://www.boldsystems.org/index.php/Public_RecordView?processid=SSKUA4857-15 | http://v4.boldsystems.org/index.php/Public_BarcodeCluster?clusteruri=BOLD:AAG2506| https://www.ncbi.nlm.nih.gov/nuccore/MF760650        1842699733      60.71   -137.43 Animalia        Arthropoda      Insecta Diptera Anthomyiidae    Pegomya                         PRESERVED_SPECIMEN      GEODETIC_DATUM_ASSUMED_WGS84;COORDINATE_PRECISION_INVALID       L#14BIOBUS-0437 SSKUA4857-15    BIOUG27067-G07  2014-07-15T00:00:00

The first field is being reduced by a regex to "MF760650" (taken from the last URL, which is a 404, by the way) by this bit of code. It seems to be trying to extract an accession ID, and I'm wondering whether that's right.

https://github.com/OSC/phylogatr-web/blob/40504e83491626da3a2279304c7464f6ce21df58/app/models/gbif_genbank_linker.rb#L136
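
For anyone reading along without opening the link, here is a minimal sketch of what that kind of extraction might look like. It is not the code from `gbif_genbank_linker.rb`. The constant `NUCCORE_ACCESSION` and the method `extract_accession` are hypothetical names, and the pattern simply assumes the accession is the run of uppercase letters followed by digits at the end of an NCBI nuccore URL.

```ruby
# Hypothetical sketch, NOT the regex used in gbif_genbank_linker.rb.
# Assumption: the accession is the letters-then-digits token that follows
# "/nuccore/" in an NCBI URL somewhere inside the pipe-separated field.
NUCCORE_ACCESSION = %r{ncbi\.nlm\.nih\.gov/nuccore/([A-Z]+\d+)}

# Returns the accession string, or nil when the field has no nuccore URL.
def extract_accession(field)
  match = NUCCORE_ACCESSION.match(field)
  match && match[1]
end

# The first field of the record quoted in this issue.
field = 'http://www.boldsystems.org/index.php/Public_RecordView?processid=SSKUA4857-15 | ' \
        'http://v4.boldsystems.org/index.php/Public_BarcodeCluster?clusteruri=BOLD:AAG2506| ' \
        'https://www.ncbi.nlm.nih.gov/nuccore/MF760650'

puts extract_accession(field)
```

Under these assumptions the BOLD URLs are ignored and only the nuccore URL contributes, which would explain why the whole field collapses to a single accession.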

johrstrom commented 2 years ago

I think that it likely is being extracted correctly. Though why I thought it was an issue at the time I cannot say.