afsc-gap-products / general


Planning to standardize taxonomic codes #3

Open Lewis-Barnett-NOAA opened 2 years ago

Lewis-Barnett-NOAA commented 2 years ago

I wanted to put a placeholder here for considering how to streamline and augment our approach to coding taxonomy in RACEBASE.

As part of a NMFS/DFO working group @EmilyMarkowitz-NOAA and I are participating in, it was identified that using data across NMFS and DFO regions (or those of other entities, such as ADFG) would require a common/unified system for classifying taxa (and updated names and taxonomies as they change over time). This would also be helpful for analyses that aim to aggregate data by a taxonomic level above that of the species.

While this would require a minor amount of manual checking once per year, it could reduce the amount of time we spend curating the taxonomic data by connecting our records to a widely accepted taxonomic database. Most entities are gravitating toward ITIS https://www.itis.gov/ for this purpose, and Emily has made great headway in matching our species definitions to those in ITIS (in addition to WoRMS). However, cases where matches aren't clear should probably be checked by those most familiar with the classification of the organisms we encounter, which would take some time up front but hopefully require little annual updating.

I defer to our taxonomists of course on if/how/when to approach this, but it would be great to have a conversation about it. @Duane-Stevenson-NOAA @SarahFriedman-NOAA @ThaddaeusBuser-NOAA do you have any initial thoughts? Thanks!

Duane-Stevenson-NOAA commented 2 years ago

This idea has come up a few times in the past, but Jay and I always resisted based on the thought that maintaining such an equivalency table could potentially be very time-consuming, particularly for invertebrates. We did (I really mean Jay, Nancy, and Heather did) produce a classification hierarchy table (RACEBASE.SPECIES_CLASSIFICATION) that includes classification of taxa from species to kingdom level. However, this table did not cross-reference ITIS or WoRMS codes, and is probably pretty outdated by now.

The production of a simple equivalency table (RACE species code = ITIS code) is probably not terribly difficult, and the maintenance of such a table might not be too bad (though not trivial). However, if the ambition is to establish hierarchical groupings and keep them parallel with ITIS, or some other external database, that will be a pit of despair for someone. Non-taxonomists are always shocked and dismayed by how often theories about higher-level classification are revised and altered.

[In reply to Lewis-Barnett-NOAA, Thu, Dec 1, 2022 at 1:20 PM]


-- Duane Stevenson, Ph.D. Supervisory Fish Biologist Groundfish Assessment Program NMFS, Alaska Fisheries Science Center
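A flat equivalency table of the kind Duane describes could be as simple as a one-column mapping from RACE species code to ITIS code (ITIS calls these Taxonomic Serial Numbers, or TSNs). The sketch below uses placeholder codes throughout and a hypothetical `audit_equivalency` helper; it only illustrates the two checks such a table would likely need at each review: RACE codes still lacking an ITIS code, and ITIS codes claimed by more than one RACE code (for example, after two species are synonymized). It deliberately stores no hierarchy.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional

# Hypothetical equivalency table: RACE species code -> ITIS TSN (None = no match yet).
# Every code here is a placeholder, not a real RACEBASE or ITIS value.
EQUIVALENCY: dict[int, Optional[int]] = {
    90001: 500001,
    90002: 500002,
    90003: 500002,  # two RACE codes resolving to one TSN, e.g. after a synonymy
    90004: None,    # no ITIS record found yet
}


def audit_equivalency(table: dict[int, Optional[int]]) -> dict[str, list]:
    """Flag RACE codes with no TSN, and TSNs claimed by more than one RACE code."""
    unmatched = sorted(code for code, tsn in table.items() if tsn is None)
    codes_by_tsn: dict[int, list[int]] = defaultdict(list)
    for code, tsn in table.items():
        if tsn is not None:
            codes_by_tsn[tsn].append(code)
    shared = sorted(
        (tsn, sorted(codes)) for tsn, codes in codes_by_tsn.items() if len(codes) > 1
    )
    return {"unmatched": unmatched, "shared_tsn": shared}


print(audit_equivalency(EQUIVALENCY))
# {'unmatched': [90004], 'shared_tsn': [(500002, [90002, 90003])]}
```

Both lists are meant as a short to-do list for a taxonomist rather than something to resolve automatically.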

EmilyMarkowitz-NOAA commented 2 years ago

I also defer to what you all think, but below is some additional food for thought. TL;DR: I think these ITIS codes will become increasingly valuable to include, and this is a timely discussion as our data workflow and computation working groups start auditing our workflows. I'll also add @Ned-Laman-NOAA for GOA/AI input and @zoyafuso-NOAA as the computations working group lead.

Some of the major benefits to using ITIS codes include:

I think this would be relatively easy to implement. Checking the species on an annual (?) basis could be guided by a script that makes it easy to check each taxon; this would replace, and be much more efficient than, collating and researching our own bespoke species classification data files. After the initial lift, I would anticipate that this annual review may take a day or so.

I've had experience working with ITIS codes in {taxize} on other projects and, as part of an exercise for the FOSS data, already whipped up a script that would help us connect AFSC species codes to ITIS (and WoRMS) codes. As the code stands, I was able to match most of the codes just by letting the code work its magic. The remaining taxa that need review now form a pretty small and workable list, but they should be checked by taxonomic experts. The outputs of that AFSC species code to ITIS (and WoRMS) translation script/exercise are available in the RACEBASE_FOSS.AFSC_ITIS_WORMS Oracle table if you want to have a peek at what that would look like.
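The general shape of such a matching pass can be sketched in Python. This is not the {taxize}-based script mentioned in the comment; it assumes a hypothetical, offline `AUTHORITY_INDEX` in place of live ITIS/WoRMS queries, and every name and code in it is a placeholder. What it illustrates is the split described above: unique matches are accepted automatically, while anything ambiguous or unmatched lands on a short list for taxonomic experts.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical offline stand-in for an ITIS/WoRMS name search: normalized name -> candidate IDs.
# A real workflow would query the authority (e.g., via {taxize} in R); values are placeholders.
AUTHORITY_INDEX: dict[str, list[int]] = {
    "genusa speciesa": [500001],
    "genusb speciesb": [500002, 500003],  # several candidates: needs an expert
}


@dataclass
class MatchResult:
    matched: dict[int, int] = field(default_factory=dict)  # species code -> authority ID
    review: dict[int, str] = field(default_factory=dict)   # species code -> reason


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so formatting quirks don't block a match."""
    return " ".join(name.lower().split())


def match_species(species: dict[int, str]) -> MatchResult:
    """Auto-accept unique matches; route ambiguous or missing ones to review."""
    result = MatchResult()
    for code, name in species.items():
        candidates = AUTHORITY_INDEX.get(normalize(name), [])
        if len(candidates) == 1:
            result.matched[code] = candidates[0]
        elif candidates:
            result.review[code] = f"ambiguous: {candidates}"
        else:
            result.review[code] = "no match"
    return result


res = match_species({90001: "Genusa  speciesa", 90002: "Genusb speciesb", 90003: "Genusc sp."})
print(res.matched)  # {90001: 500001}
print(res.review)   # {90002: 'ambiguous: [500002, 500003]', 90003: 'no match'}
```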

The NEFSC, for example, (as I understand it) transitioned a few years ago to using ITIS codes. Like our species codes, their species codes also included market-value-level IDs. Now they use ITIS codes plus an additional "market" code column, which allows them to identify organisms by ITIS species code and then by other market- or reference-relevant levels (e.g., juveniles, adults, eggs).
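As a rough illustration of that split between taxonomic identity and other reference levels, a record layout might look like the sketch below. The field names, the category labels, and the TSN values are all hypothetical and are not taken from NEFSC's actual schema; the point is only that juveniles, adults, and eggs of one species share a single ITIS code, so species-level summaries need no extra mapping.

```python
from collections import Counter
from typing import NamedTuple


class CatchRecord(NamedTuple):
    itis_tsn: int   # taxonomic identity only (placeholder values below)
    category: str   # hypothetical extra column: life stage, market grade, etc.
    count: int


records = [
    CatchRecord(500001, "adult", 12),
    CatchRecord(500001, "juvenile", 30),
    CatchRecord(500001, "eggs", 200),
    CatchRecord(500002, "adult", 5),
]

# Species-level totals simply ignore the category column...
totals_by_taxon = Counter()
for rec in records:
    totals_by_taxon[rec.itis_tsn] += rec.count

# ...while stage-specific questions can still filter on it.
juvenile_counts = {rec.itis_tsn: rec.count for rec in records if rec.category == "juvenile"}

print(dict(totals_by_taxon))  # {500001: 242, 500002: 5}
print(juvenile_counts)        # {500001: 30}
```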

I think adding ITIS codes to species codes could be useful on several additional fronts, as well.

Ned-Laman-NOAA commented 2 years ago

We have already developed a Taxonomics system in Oracle that leverages ITIS codes in RACE_DATA and for RACEBase. Check with @NancyRoberson about its implementation.

EmilyMarkowitz-NOAA commented 2 years ago

I think @NancyRoberson shared that with me a while ago, but it wasn't working/hadn't made many of the matches. But I may be misremembering.

Lewis-Barnett-NOAA commented 2 years ago

Thanks for the info, and thanks to Em for providing more context. I'd be curious to hear about what may be in the works via Nancy.

I agree with you, Duane, that it wouldn't be worth trying to maintain a full taxonomic hierarchy independently and match that to ITIS. I was actually thinking that one big benefit of making an "equivalency table" between AFSC species codes and ITIS codes would be that we could then easily use the ITIS database to obtain the correct hierarchical classification at any given time.

[In reply to Em Markowitz (NOAA), Thu, Dec 1, 2022 at 4:05 PM]


-- Lewis Barnett, PhD (he/him/his) Research Fish Biologist

NOAA Fisheries, Alaska Fisheries Science Center 7600 Sand Point Way NE, Bldg 4 Seattle, Washington 98115 Google Voice: (206) 526-4111
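The equivalency-table-plus-lookup idea in Lewis's comment can be sketched as follows: the local table stores only the RACE-to-ITIS mapping, and the hierarchy is fetched from the authority whenever an analysis needs to aggregate above species. The `fetch_classification` function is a hypothetical placeholder with canned data, not a real ITIS client, and the codes and taxon names are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical equivalency table (RACE species code -> ITIS TSN); placeholder values.
RACE_TO_TSN = {90001: 500001, 90002: 500002, 90003: 500003}


def fetch_classification(tsn: int) -> dict[str, str]:
    """Stand-in for an authority lookup returning rank -> taxon name.

    Hypothetical: a real version would query ITIS (or WoRMS) at analysis time,
    so the hierarchy reflects the authority's current view, not a local copy.
    """
    mock_authority = {
        500001: {"family": "Familya", "genus": "Genusa"},
        500002: {"family": "Familya", "genus": "Genusb"},
        500003: {"family": "Familyb", "genus": "Genusc"},
    }
    return mock_authority[tsn]


def aggregate_by_rank(catch: dict[int, float], rank: str) -> dict[str, float]:
    """Sum catch keyed by RACE species code up to the requested taxonomic rank."""
    totals: dict[str, float] = defaultdict(float)
    for race_code, weight in catch.items():
        taxon = fetch_classification(RACE_TO_TSN[race_code])[rank]
        totals[taxon] += weight
    return dict(totals)


print(aggregate_by_rank({90001: 1.5, 90002: 2.0, 90003: 4.0}, "family"))
# {'Familya': 3.5, 'Familyb': 4.0}
```

Because no hierarchy is stored locally, a reclassification at the authority changes the aggregation the next time it runs, without any edit to the equivalency table.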

NancyRoberson commented 2 years ago

Thanks all. When we tried this, it was a time-consuming endeavor with not much return for the trouble. But that was years ago, and I bet with a dedicated programmer (thanks Em!) the maintenance process will probably be more streamlined and efficient.

Some of the issues we ran up against then, which may not be issues now: ITIS was slow to incorporate new species or changes in taxonomy (it could be a year or so behind). We usually used ITIS for fish and WoRMS, but sometimes a species wasn't in either. There were frequently differences between ITIS and WoRMS, so there was back and forth with Systematics on which taxonomies to use. Not many staff members used the tables or were interested in finding out about them, so the tables languished. The recursive queries were challenging to construct.

I hope this new effort works out and am excited about it. Let me know how I can be helpful.

Ned-Laman-NOAA commented 2 years ago

Two cents more: I have one or two recursive queries against the RACE_DATA "TAXONOMICS" tables that I can share but Heather would be the authority on connecting the dots between TAXONOMICS and RACEBase.SPECIES.
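For readers unfamiliar with the recursive queries mentioned in this thread, the usual pattern is a parent-pointer table walked with a recursive common table expression. The sketch below uses Python's built-in `sqlite3` so it runs anywhere; the table layout, column names, and ranks (several intermediate ranks are omitted for brevity) are assumptions, not the actual RACE_DATA TAXONOMICS design, and an Oracle version would need somewhat different syntax.

```python
from __future__ import annotations

import sqlite3

# Hypothetical parent-pointer table; the real RACE_DATA TAXONOMICS schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxonomics (
        taxon_id   INTEGER PRIMARY KEY,
        parent_id  INTEGER REFERENCES taxonomics(taxon_id),
        taxon_rank TEXT NOT NULL,
        taxon_name TEXT NOT NULL
    );
    INSERT INTO taxonomics VALUES
        (1, NULL, 'kingdom', 'Animalia'),
        (2, 1,    'family',  'Familya'),
        (3, 2,    'genus',   'Genusa'),
        (4, 3,    'species', 'Genusa speciesa');
    """
)

# Start at one taxon and repeatedly join to its parent until parent_id is NULL.
LINEAGE_SQL = """
WITH RECURSIVE lineage(taxon_id, parent_id, taxon_rank, taxon_name, depth) AS (
    SELECT taxon_id, parent_id, taxon_rank, taxon_name, 0
    FROM taxonomics WHERE taxon_id = ?
    UNION ALL
    SELECT t.taxon_id, t.parent_id, t.taxon_rank, t.taxon_name, l.depth + 1
    FROM taxonomics AS t JOIN lineage AS l ON t.taxon_id = l.parent_id
)
SELECT taxon_rank, taxon_name FROM lineage ORDER BY depth DESC;
"""


def lineage(taxon_id: int) -> list[tuple[str, str]]:
    """Return (rank, name) pairs from the root down to the given taxon."""
    return conn.execute(LINEAGE_SQL, (taxon_id,)).fetchall()


print(lineage(4))
# [('kingdom', 'Animalia'), ('family', 'Familya'), ('genus', 'Genusa'), ('species', 'Genusa speciesa')]
```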

SarahFriedman-NOAA commented 1 year ago

I have implemented some code and come up with a tentative table of taxonomic updates that need to be made in our current database here. It currently relies primarily on WoRMS because I found the ITIS databases wildly out of date in many cases, but the code is flexible enough to search whichever database the user specifies. My hope is that we will be able to run this code at the beginning of every year and make any necessary updates on an annual basis to both the taxonomy table and the species code list.

We (Duane, Thaddaeus, and I) are currently trying to figure out how to implement these changes moving forward. These changes will probably have to happen concurrently with guidebook updates, and that is a massive undertaking for me and Thaddaeus that may not be feasible this year. Furthermore, we discussed potentially adding another table to RACEBase that links invalid synonyms with currently valid names (particularly for species with separate RACEBase codes that have now been synonymized), so that historical records remain untouched but accessible by the currently accepted species name. We are still figuring out the specifics and the best way to go about this, but we are very much open to feedback.
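A yearly run of that kind could be organized around a pluggable lookup function, which is one way to get the "search whichever database the user specifies" flexibility Sarah describes. This is a hypothetical sketch, not the code referenced in the comment: `mock_worms` returns canned answers in place of real WoRMS queries, and all names and codes are placeholders.

```python
from __future__ import annotations

from typing import Callable, Optional


def mock_worms(name: str) -> Optional[str]:
    """Hypothetical stand-in for a WoRMS query: name -> currently accepted name."""
    accepted = {
        "Genusa speciesa": "Genusa speciesa",  # still valid
        "Genusb speciesb": "Genusz speciesb",  # moved to another genus
    }
    return accepted.get(name)


def annual_taxonomy_check(
    species: dict[int, str], lookup: Callable[[str], Optional[str]]
) -> list[tuple[int, str, str]]:
    """Return (species_code, stored_name, action) rows that need attention.

    `lookup` is whichever authority the user chooses (WoRMS, ITIS, ...).
    """
    updates = []
    for code, stored in sorted(species.items()):
        accepted = lookup(stored)
        if accepted is None:
            updates.append((code, stored, "not found: manual review"))
        elif accepted != stored:
            updates.append((code, stored, f"rename to {accepted}"))
    return updates


current = {90001: "Genusa speciesa", 90002: "Genusb speciesb", 90003: "Genusc speciesc"}
for row in annual_taxonomy_check(current, mock_worms):
    print(row)
# (90002, 'Genusb speciesb', 'rename to Genusz speciesb')
# (90003, 'Genusc speciesc', 'not found: manual review')
```

Swapping authorities then means passing a different `lookup` function; the comparison logic and the shape of the update table stay the same.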
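One possible shape for the synonym table discussed above, sketched with placeholder codes and names: historical records keep whatever code they were entered under, and a lookup resolves superseded codes to the currently valid one only when the data are queried. The table layout and the `valid_code`/`records_for` helpers are hypothetical, not a design the team has settled on.

```python
from __future__ import annotations

# Hypothetical synonym table: superseded species code -> code it was synonymized with.
# Historical records are never rewritten; the mapping is applied only at query time.
SYNONYMS = {90005: 90002}
VALID_NAMES = {90001: "Genusa speciesa", 90002: "Genusb speciesb"}

historical_catch = [
    {"haul": 1, "species_code": 90005, "count": 4},  # entered under the old code
    {"haul": 2, "species_code": 90002, "count": 7},
    {"haul": 3, "species_code": 90001, "count": 2},
]


def valid_code(code: int) -> int:
    """Follow synonym links (possibly chained) to the currently valid code."""
    seen: set[int] = set()
    while code in SYNONYMS:
        if code in seen:
            raise ValueError(f"synonym cycle involving code {code}")
        seen.add(code)
        code = SYNONYMS[code]
    return code


def records_for(valid_name: str) -> list[dict]:
    """Historical records that now belong to `valid_name`, returned unmodified."""
    # Unknown names resolve to None, which matches no record.
    target = next((c for c, n in VALID_NAMES.items() if n == valid_name), None)
    return [rec for rec in historical_catch if valid_code(rec["species_code"]) == target]


print(records_for("Genusb speciesb"))
# [{'haul': 1, 'species_code': 90005, 'count': 4}, {'haul': 2, 'species_code': 90002, 'count': 7}]
```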

EmilyMarkowitz-NOAA commented 1 year ago

I am so stoked about this! I can think of a bunch of projects that will benefit from your hard work on this when you push the final table! Looking forward to chatting later this week about it. Keep me posted if I can be of any other service!

Ned-Laman-NOAA commented 1 year ago

+1 to the stoked-ness. This has been needed for so long! Sarah, you mention a new table in RACEBase. I recommend considering a solution from the RACE_DATA side as well (maybe a quick conversation with Heather), since it's my understanding that RACEBase species may be served from RACE_DATA now, as well as other RACEBase data. At the least, Heather has spent some time thinking about these issues in the past and may have some additional insights.

SarahFriedman-NOAA commented 1 year ago

Yes, I am planning to discuss this with Heather prior to implementing anything, partly because I want to ensure all species names are accessible on the apps at sea, regardless of whether or not they are currently valid (to avoid confusion with names people may already be familiar with).