dbca-wa / tsc

Threatened Species and Communities
https://dbca-wa.github.io/tsc/
MIT License

API filters have false positives #88

Open florianm opened 4 years ago

florianm commented 4 years ago

Problem

"All plant taxa" https://tsc.dbca.wa.gov.au/api/1/taxon/?paraphyletic_groups__icontains=21 contains e.g. taxon NameID 24734 (Calyptorhynchus latirostris (Carnaby)) https://tsc.dbca.wa.gov.au/admin/taxonomy/taxon/25755/change/.

Reproducible example

paraphyletic_groups__icontains = 21 should only return taxa with paraphyletic groups containing ID 21 (here plants).

# Fetch all taxa whose groups should contain ID 21 (plants)
tsc_taxa_flora <- "taxon" %>%
    wastdr::wastd_GET(query = list(paraphyletic_groups__icontains = 21)) %>%
    wastdr::wastd_parse() %>% # 33382 records
    dplyr::mutate(taxon = name_id %>% as.character())

# NameID 24734 is a bird, yet it is part of the "plants" result
x <- tsc_taxa_flora %>% dplyr::filter(taxon == "24734")
x$paraphyletic_groups # 20, 12 (Animal, Bird)
x$pk # TSC PK 25755
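
Until the filter behaves as expected, false positives like this one can be dropped on the client by re-checking group membership exactly. A minimal Python sketch, assuming each parsed record exposes `paraphyletic_groups` as a list of integer IDs; the helper names and the second sample record are hypothetical and not part of TSC or wastdr:

```python
from typing import Any, Iterable, List, Mapping

PLANTS_GROUP_ID = 21  # group ID used for plants in this issue


def has_group(record: Mapping[str, Any], group_id: int) -> bool:
    """Return True only if group_id is an exact member of the record's groups."""
    return group_id in record.get("paraphyletic_groups", [])


def drop_false_positives(
    records: Iterable[Mapping[str, Any]], group_id: int
) -> List[Mapping[str, Any]]:
    """Keep only the records that really belong to the requested group."""
    return [r for r in records if has_group(r, group_id)]


# Sample records shaped like a parsed API response (assumed shape).
records = [
    # The false positive reported above: a bird with groups 20 and 12.
    {"pk": 25755, "name_id": 24734, "paraphyletic_groups": [20, 12]},
    # A made-up plant record, for illustration only.
    {"pk": 1, "name_id": 1001, "paraphyletic_groups": [21]},
]

plants = drop_false_positives(records, PLANTS_GROUP_ID)
print([r["pk"] for r in plants])  # → [1]
```

This only works around the symptom on the client; it does not explain why the API returns the record in the first place.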

alexrchapman commented 4 years ago

Hi Florian,

How's everything going at home and work? I see these wastd merge posts from time to time, and sometimes I have a glance at them.

This one has me fascinated by your use of the term 'paraphyletic'. This has a very specific meaning in systematics: groups of taxa (e.g. a genus) where the species within do not all have the same immediate common ancestor. Is this what you mean by this variable name? I'd suggest it's dangerous to conflate the names of data structures with systematic concepts, as these change with the wind.

More generally, now with BIO getting off the ground, do you intend that what you're building be a replacement for WACENSUS?

Finally, I wonder if you'd be interested in my last couple of blog posts:

All the best, hope you're getting to play some music! Alex

Alex Chapman Research Associate — Western Australian Herbarium — DBCA Consulting Scientist — Gaia Resources — Environmental Technology Consultants m +61 410 384 330

florianm commented 4 years ago

Hi Alex,

Great to hear from you! You are getting the email notifications because you have subscribed to the GitHub repo for our WA Sea Turtles DB and Threatened Species and Communities DB (TSC). For TSC I needed a local and performant copy of taxonomic data, which is not a replacement for WACENSUS. It is safe to ignore any "WACENSUS"y noise coming from WAStD/TSC!

"paraphyletic_groups" as defined here are a direct copy of WACENSUS's "HbvSupra". WACENSUS's data model links HbvSupra as many to many to HbvNames, which allows supra groups to be paraphyletic. Therefore I played it safe by naming these many to many links loosely "paraphyletic_groups" - accepting that some might not be paraphyletic.

In the front-end we call them plainly "groups" and the use cases are driven by and limited to the given (nicely self-explanatory) groups. Thanks still for highlighting a possible pitfall - what immediate risks can you see?
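
The many-to-many link between HbvSupra and HbvNames described above can be sketched in plain Python. This is an illustration only: the class and field names are hypothetical stand-ins rather than the actual WACENSUS or TSC schema, and the mapping of IDs 20 and 12 to "Animal" and "Bird" is assumed from the order shown in the example output of this issue.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class SupraGroup:
    """Stand-in for one HbvSupra row: a convenience group for subsetting."""
    group_id: int
    label: str


@dataclass
class Taxon:
    """Stand-in for one HbvNames row; it can belong to any number of groups."""
    name_id: int
    name: str
    groups: Set[SupraGroup] = field(default_factory=set)


# Group IDs taken from this issue; the labels per ID are an assumption.
animals = SupraGroup(20, "Animal")
birds = SupraGroup(12, "Bird")
plants = SupraGroup(21, "Plant")

# One taxon linked to two groups at once (the "many" on the taxon side).
carnaby = Taxon(24734, "Calyptorhynchus latirostris", {animals, birds})


def taxa_in_group(taxa: List[Taxon], group: SupraGroup) -> List[Taxon]:
    """Return the taxa that are members of the given group."""
    return [t for t in taxa if group in t.groups]


print(sorted(g.label for g in carnaby.groups))              # → ['Animal', 'Bird']
print([t.name for t in taxa_in_group([carnaby], plants)])   # → []
```

Because a taxon may sit in several groups and a group holds many taxa, nothing in the structure itself says whether a given group is paraphyletic; that is a property of the group's definition, not of the link.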

alexrchapman commented 4 years ago

Yes, I assumed that was the case - sorta happy you kept me on the list :-)

Thanks for the explanation. While it is true that some major groups (e.g. Dicotyledons, Aves) are now considered paraphyletic (the former due to the Magnoliales; the latter due to Dinosauria), most are not, especially the more common groups (Animalia, Plantae, Fungi).

So it is a poor choice of terminology that could be misinterpreted in the future. This is why the less specific term "supra-group" was used (as hinted at by the use of “possibly”).

It was also chosen because these groupings are better known to the non-specialist, i.e. a systematist or taxonomist will understand the status of the supra-group name used, but rarely vice versa. Which is why you used the very general term "groups" in the front end.

In short, it is really a misleading misnomer that can only lead to (programmatic) confusion down the track.

Cheers, A.

On Mon, 8 Jun 2020 at 1:56 pm, Florian Mayer notifications@github.com wrote:

"paraphyletic_groups" as defined here https://github.com/dbca-wa/wastd/blob/master/taxonomy/models.py#L1701 are a direct copy of WACENSUS's "HBV Supra Groups", "A possibly paraphyletic supragroup for convenient subsetting". In the front-end we call them plainly "groups" and the use cases are driven by and limited to the given (nicely self-explanatory) groups. Thanks still for highlighting a possible pitfall - what immediate risks can you see?

--

Alex Chapman Consulting Scientist Gaia Resources — Environmental Technology Consultants m +61 410 384 330 p +61 8 9227 7309 w https://www.gaiaresources.com.au/ e alex.chapman@gaiaresources.com.au

Find us at Level 6 FLUX, 191 St Georges Tce, Perth, WA https://www.gaiaresources.com.au/contact-us/

Read about the range of work we undertake on our blog https://www.gaiaresources.com.au/blog/

Interested in attending our enviro QGIS training http://www.gaiaresources.com.au/announcing-environmental-qgis-training/ course?

florianm commented 4 years ago

Ok, that makes sense. I'll update the documentation to make clear that although the data model allows and contains true paraphyletic group memberships, not all SupraGroups are necessarily paraphyletic.

alexrchapman commented 4 years ago

Ok, that’s a start - too late to change all the code I’m guessing?

A.

florianm commented 4 years ago

Whenever WACENSUS is replaced (I have no visibility of DBCA's strategy on this), TSC and all other in-house solutions consuming taxonomy will likely switch over to that hypothetical successor as the point of truth for taxonomy, and with that all of the above will be blown away.

TLDR: WAStD's taxonomy will likely be superseded before it becomes a problem. But it's good to flag any potential issues early and clearly - thanks for taking the time and effort, much appreciated!

Right now I've got no bandwidth other than to clarify this in descriptions and documentation.

alexrchapman commented 4 years ago

Agreed - there’s some notional budget in the BIO schema for a full rebuild of WACENSUS - let’s hope that bears fruit!

All the best to you, Alex
