aodn / nrmn-application

A web application for collation, validation, and storage of all data obtained during surveys conducted by the NRMN
GNU General Public License v3.0

species_list not displaying some species #1280

Closed bpasquer closed 1 year ago

bpasquer commented 1 year ago

From email 2023-06-03: Just quickly, I have noticed an issue with the ep_species_list and subsequently the species_list_data views. For example, Apogon doederleini (observable_item_id 810) which has been superseded with Ostorhinchus doederleini (observable_item_id 8114) is not displaying in either endpoint. For some reason 810 is present in nrmn.observation, but 8114 isn’t, and because 810 has been superseded by 8114 – neither are being displayed in ep_species_list (or species_list_data on the geoserver). This is not a recent superseding, but we have noticed this change recently in the website species pages (i.e. there is now no record of this species). Has something changed in the way the species_list is being generated?

This issue doesn’t seem to be consistent, and I can’t really see why? The only thing I noticed was that Ostorhinchus doederleini has been imported into the Observable Items table from WoRMS and WoRMS has done another taxonomic reshuffle whereby all Phylum/Class/Orders seem to have changed for some species. I have manually changed the Phylum/Order/Class of Ostorhinchus doederleini back to be consistent with Apogon doederleini, and will check tomorrow if the species appears in the species_list layer. If it does (and the higher taxonomy is the cause of the species omission from the species_list endpoints) then I think we need to look at working out a periodical WoRMS scrape/update for these changes and apply them to relevant species consistently.

bpasquer commented 1 year ago

The reason why Apogon doederleini and Ostorhinchus doederleini are missing from ep_species_list is two-fold and explained by the condition in the code to create ep_species_list:

where oi.obs_item_type_name in ('Species', 'Undescribed Species')
and exists (select 1 from nrmn.observation obs where obs.observable_item_id = oi.observable_item_id)
and oi.superseded_by is null)

So:

The condition should check whether a species has ever been observed under its current name, not by observable_item_id.

atcooper1 commented 1 year ago

Yes, @bpasquer the condition should be using current name not observable_item_id. It looks like all recent superseding corrections are not being reflected in the Reef Species of the World pages because of this.

Supersedings are displaying correctly/as expected in data endpoints.

bpasquer commented 1 year ago

Actually, the condition

and oi.superseded_by is null

in

where oi.obs_item_type_name in ('Species', 'Undescribed Species')
and exists (select 1 from nrmn.observation obs where obs.observable_item_id = oi.observable_item_id)
and oi.superseded_by is null)

is excluding all superseded species, although we do want to list superseded species under their current species name. Removing this condition allows Ostorhinchus doederleini to be listed in ep_species_list through the superseded Apogon doederleini record.
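To illustrate the effect of that condition, here is a minimal sketch using Python's built-in sqlite3 module rather than the production PostgreSQL database. The two-table schema is a simplified assumption, `superseded_by` is assumed to store the current species name, and the revised query is only one possible way to list by current name, not necessarily what the PR does.

```python
import sqlite3

# Simplified, assumed schema: the real NRMN tables have many more columns,
# and superseded_by is assumed here to hold the current species name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observable_item (
    observable_item_id   INTEGER PRIMARY KEY,
    observable_item_name TEXT NOT NULL,
    obs_item_type_name   TEXT NOT NULL,
    superseded_by        TEXT
);
CREATE TABLE observation (
    observation_id     INTEGER PRIMARY KEY,
    observable_item_id INTEGER NOT NULL
);
INSERT INTO observable_item VALUES
    (810,  'Apogon doederleini',       'Species', 'Ostorhinchus doederleini'),
    (8114, 'Ostorhinchus doederleini', 'Species', NULL);
-- Only the old ID (810) has ever been recorded in an observation.
INSERT INTO observation VALUES (1, 810);
""")

# The filter quoted in this thread: 810 is dropped because it is superseded,
# and 8114 is dropped because it has no observation rows.
ORIGINAL_FILTER = """
SELECT oi.observable_item_name
FROM observable_item oi
WHERE oi.obs_item_type_name IN ('Species', 'Undescribed Species')
  AND EXISTS (SELECT 1 FROM observation obs
              WHERE obs.observable_item_id = oi.observable_item_id)
  AND oi.superseded_by IS NULL
"""

# Same filter without the superseded_by check, reporting the current name.
WITHOUT_SUPERSEDED_CHECK = """
SELECT DISTINCT COALESCE(oi.superseded_by, oi.observable_item_name) AS species_name
FROM observable_item oi
WHERE oi.obs_item_type_name IN ('Species', 'Undescribed Species')
  AND EXISTS (SELECT 1 FROM observation obs
              WHERE obs.observable_item_id = oi.observable_item_id)
ORDER BY species_name
"""

original_names = [row[0] for row in conn.execute(ORIGINAL_FILTER)]
revised_names = [row[0] for row in conn.execute(WITHOUT_SUPERSEDED_CHECK)]
print(original_names)
print(revised_names)
```

With the original filter neither name is returned; without the `superseded_by is null` check the observed record 810 is listed under its current name.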

atcooper1 commented 1 year ago

Thanks Bene, should this change be immediate in the endpoints? I can now see the change I made yesterday to all Microcanthus strigatus observations from the east coast of Australia. I moved these across to M. joyceae (I also added M. joyceae as a new species yesterday), and I can see these observations in nrmn.observation as expected.

However, I just checked nrmn.observation in pgAdmin, and none of the new species names I added yesterday to supersede older ones (i.e. these should now be the 'current name') are displaying. For example, Cheilodactylus fuscus (id 171) is now superseded by Morwong fuscus (id 8124), but 8124 is not visible in nrmn.observation.

bpasquer commented 1 year ago

> Thanks Bene, should this change be immediate in the endpoints?

It needs to be applied to the DB and the endpoint refreshed, so it probably won't happen today.

> Although I just checked nrmn.observation in PGadmin and all new species names which I added yesterday to supersede (i.e. these should now be the 'current name') are still not displaying? For example Cheilodactylus fuscus (id 171) is now superseded by Morwong fuscus (id 8124) but 8124 is not visible in nrmn.observation

The nrmn.observation table contains the recorded species ID only. Observations of a superseded species are not updated to the new species directly in the observation table. Because superseding is only reflected in the endpoints, you will not find Morwong fuscus in observation.
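A small sketch of that distinction, again using Python's sqlite3 module with an assumed, simplified schema. The view name `ep_observation_sketch` is hypothetical, and `superseded_by` is assumed to store the current species name; the real endpoint definitions are more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observable_item (
    observable_item_id   INTEGER PRIMARY KEY,
    observable_item_name TEXT NOT NULL,
    superseded_by        TEXT            -- assumed: current species name
);
CREATE TABLE observation (
    observation_id     INTEGER PRIMARY KEY,
    observable_item_id INTEGER NOT NULL  -- the ID as recorded
);
INSERT INTO observable_item VALUES
    (171,  'Cheilodactylus fuscus', 'Morwong fuscus'),
    (8124, 'Morwong fuscus',        NULL);
INSERT INTO observation VALUES (1, 171);

-- Hypothetical endpoint-style view: superseding is resolved at query time,
-- the stored observation row is never rewritten.
CREATE VIEW ep_observation_sketch AS
SELECT obs.observation_id,
       obs.observable_item_id  AS recorded_id,
       oi.observable_item_name AS recorded_name,
       COALESCE(oi.superseded_by, oi.observable_item_name) AS current_name
FROM observation obs
JOIN observable_item oi ON oi.observable_item_id = obs.observable_item_id;
""")

# The base table still only knows about the recorded ID.
rows_for_8124 = conn.execute(
    "SELECT COUNT(*) FROM observation WHERE observable_item_id = 8124"
).fetchone()[0]

# The view reports the same observation under the current name.
endpoint_rows = conn.execute(
    "SELECT recorded_id, recorded_name, current_name FROM ep_observation_sketch"
).fetchall()

print(rows_for_8124)
print(endpoint_rows)
```

Querying the base table for 8124 finds nothing, while the view presents the observation recorded as 171 under Morwong fuscus.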

But if I understand correctly, what you did with Microcanthus strigatus and M. joyceae is different, because you actually corrected the observations (using the species tool, I assume). What surprises me, though, is that it took so long for the change to be reflected in the table. It should be immediate.

bpasquer commented 1 year ago

@atcooper1 I investigated the issue further and here's what I found looking at ep_species_list:

| Source | Cheilodactylus fuscus | Morwong fuscus |
| --- | --- | --- |
| Geoserver (AODN prod) | missing | missing |
| NRMN prod | missing | missing |
| NRMN systest | present | missing |

First, for this example the endpoints are identical in Geoserver and NRMN prod. The table shows that the results are consistent with the above explanation. In systest, where neither the superseding nor the addition of Morwong fuscus as a new observable_item has been applied, Cheilodactylus fuscus is present. In Geoserver and NRMN prod, Cheilodactylus fuscus disappeared from ep_species_list after you superseded it with Morwong fuscus, because the condition excludes superseded species from this endpoint, and Morwong fuscus is missing because it has no observations attached.

Also note that in Geoserver, Cheilodactylus fuscus is present in ep_species_survey and ep_species_survey_observation because, following the requirements in https://github.com/aodn/NRMN/issues/185 and unlike ep_species_list (https://github.com/aodn/NRMN/issues/178), these endpoints present species before superseding.

Following our discussion, I have updated the PR to check for existing observations based on the current species name.

atcooper1 commented 1 year ago

In addition to the above issues, it appears that the AODN Portal layers (M1, M2_inverts, M2_cryptics etc.) are not refreshing with Prod. For example, all Microcanthus strigatus observations recorded from the east coast of Australia were changed to Microcanthus joyceae over 7 days ago. I can see these records have changed in Prod, but there are no M. joyceae records in the Portal M1 layers.

bpasquer commented 1 year ago

@atcooper1 the issue is not related to the update of the portal layers. Some views, like ep_M1_public, have rules about which taxonomic groups are included. Because of the recent change to the species hierarchy in WoRMS, some taxonomic groups are no longer included in the endpoints. In this example, Microcanthus strigatus belongs to the class Actinopterygii, while Microcanthus joyceae belongs to the class Teleostei. ep_M1_public includes Actinopterygii but not Teleostei, so the view definition needs to be updated. More importantly, we need a system to keep track of these taxonomic changes so that these rules are kept up to date.
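As a rough illustration of how a class inclusion rule drops the reclassified species, here is a sketch using Python's sqlite3 module. The table layout, the `class_name` column, and the `species_in_classes` helper are all hypothetical; only the two species and their classes come from this thread, and the real ep_M1_public definition contains other rules as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observable_item (
    observable_item_name TEXT PRIMARY KEY,
    class_name           TEXT NOT NULL   -- hypothetical column name
);
INSERT INTO observable_item VALUES
    ('Microcanthus strigatus', 'Actinopterygii'),
    ('Microcanthus joyceae',   'Teleostei');
""")

def species_in_classes(included_classes):
    """Return species whose class is in the inclusion list (sketch only)."""
    placeholders = ", ".join("?" for _ in included_classes)
    query = (
        "SELECT observable_item_name FROM observable_item "
        f"WHERE class_name IN ({placeholders}) "
        "ORDER BY observable_item_name"
    )
    return [row[0] for row in conn.execute(query, list(included_classes))]

# Rule as described above: Actinopterygii only.
current_rule = species_in_classes(["Actinopterygii"])
# Rule after adding Teleostei to the inclusion list.
updated_rule = species_in_classes(["Actinopterygii", "Teleostei"])

print(current_rule)
print(updated_rule)
```

Any inclusion list of this kind has to be revisited whenever the higher taxonomy changes upstream, which is the maintenance problem raised here.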

atcooper1 commented 1 year ago

@bpasquer yes, I think there needs to be some automated scraping of WoRMS to maintain consistency, but ultimately I'm not sure the views should be built around inclusion rules rather than exclusion rules (to filter out unnecessary species)?

In addition to the 2 issues listed above, there also seems to be another problem with the superseding of species and the public species list views, which is currently affecting all our online taxonomic database users (GBIF, OBIS, ALA, RSoW). I'll attempt to convey the issue below, but it is a confusing one to pen out, so apologies if you have to read it 10 times and it still doesn't make sense!

ep_species_list_data includes superseded_ids and superseded_names, indicating which IDs and names were superseded by the species, but because species IDs were changed by AODN there is now ambiguity with mapped_id. These data users are trying to use superseded_id to map the species list correctly, but this refers to species_id (new NRMN system), not mapped_id (old IMAS system). For example, Morwong fuscus has superseded_id=171, which appears to be a new NRMN ID (i.e. species_id). However, as a mapped_id, 171 is Notolabrus gymnogenis. So there is an issue for these online databases: superseded_ids can't be used for updates because the data for the IDs mapped from the old IMAS species list to those superseded_ids isn't carried across.

bpasquer commented 1 year ago

@atcooper1 True, IDs in the endpoint don't relate to the old IMAS system, which can make them difficult for external users to work with. To rectify that, either the IDs in public endpoints should be mapped to the old IMAS system, or a new mapped variable should be added to the endpoints (which could be confusing for the general public).

atcooper1 commented 1 year ago

The problem is they can't be mapped back to the old IMAS system. Morwong fuscus has a superseded_id of 171, but species_id 171 returns a null value in ep_species_list. Users end up in a null loop.

Currently the public endpoints and private endpoints are displaying very different data (due to the three errors identified above). Do the NRMN layers need to be hidden until these can be resolved? Which leaves us with a mapped variable option?

Hopefully these issues can be resolved before Lizzie and I showcase the NRMN data and products next week at AMSA?

bpasquer commented 1 year ago

When exactly are you presenting at AMSA? The issue with ep_species_list will be resolved on Monday. I can also edit the endpoint on Monday to add "Teleostei" wherever Actinopterygii is included.

> The problem is they can't be mapped back to the old IMAS system. Morwong fuscus has a superseded_id of 171, but species_id 171 returns null value in ep_species_list. Users end up in a null loop.

The issue with the mapping of IDs requires more thinking. I think it would work if both the superseded_id and the species_id were expressed as mapped_ids (i.e. the old IMAS IDs).
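A very rough sketch of that idea in Python, assuming a lookup from new NRMN species_id to old IMAS mapped_id is available. The lookup, the single-valued `superseded_id` field, and the `express_in_mapped_ids` helper are hypothetical, and the mapped_id values 9001 and 9002 are placeholders invented for illustration, not real IMAS IDs.

```python
from typing import Dict, Optional

# Hypothetical lookup: new NRMN species_id -> old IMAS mapped_id.
# 9001 and 9002 are placeholder values, not real mapped_ids.
SPECIES_ID_TO_MAPPED_ID: Dict[int, int] = {
    171: 9001,   # Cheilodactylus fuscus
    8124: 9002,  # Morwong fuscus
}

def express_in_mapped_ids(row: dict, lookup: Dict[int, int]) -> dict:
    """Return a copy of an endpoint row with both ID fields translated."""
    def translate(species_id: Optional[int]) -> Optional[int]:
        # Unknown or missing IDs become None instead of raising an error.
        return None if species_id is None else lookup.get(species_id)

    return {
        **row,
        "species_id": translate(row["species_id"]),
        "superseded_id": translate(row["superseded_id"]),
    }

# Simplified endpoint row using the IDs quoted in this thread.
row = {"species_name": "Morwong fuscus", "species_id": 8124, "superseded_id": 171}
mapped_row = express_in_mapped_ids(row, SPECIES_ID_TO_MAPPED_ID)
print(mapped_row)
```

Because both fields are translated through the same lookup, a user following superseded_id stays within one ID system instead of landing on an unrelated species.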