ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Feature Request - Tool to check for names that are in WoRMS but aren't in 'WoRMS via Arctos' source #7932

Open genevieve-anderegg opened 3 months ago

genevieve-anderegg commented 3 months ago

Is your feature request related to a problem? Please describe. We can refresh names within a taxonomic group with the "WoRMS request refresh tool" (https://arctos.database.museum/tools/requestWormsRefresh.cfm), but that tool does not pick up new names within a group. It would be very helpful to have a tool that compares what is in WoRMS via Arctos against what is in WoRMS, so you can see whether a family has any new names that haven't been entered into Arctos yet. The tool could just give you a CSV (name, AphiaID), which you could then upload to add all the missing genera and species within a family. Right now the refresh tool is great, but it does not pick up new names.

Describe what you're trying to accomplish Easily discover which species names from a family are in WoRMS but not yet in WoRMS via Arctos, then be able to bulkload those names.

Describe the solution you'd like Another tool, similar to the refresh.

Describe alternatives you've considered Manually searching and comparing back and forth between the two websites, and then either building our own CSV or uploading names individually.

Additional context Discussed in Taxonomy Committee meeting yesterday. WoRMS has added lots of scientific names for land snails that aren't already in Arctos, and we end up manually adding names a lot when volunteers need them. It would be great to analyze all the missing names for a specific family and then be able to easily bulkload those. This would save us at DMNS:Inv a lot of time!

Priority Please assign a priority-label. Unprioritized issues get sent into a black hole of despair.

sharpphyl commented 3 months ago

This would be hugely helpful for current Arctos collections, and it would also make Arctos more attractive for any new Arctos members that will rely on WoRMS (via Arctos) taxonomy. Right now, WoRMS (via Arctos) is falling out of date. Finding a more efficient way to add new taxa would let us assure prospective collections that WoRMS (via Arctos) is continuously updated.

Some families have more than the WoRMS download limit of 1000 records. For example, Helicidae

[Image: Screenshot 2024-07-12 at 4.29.07 PM]

@dustymc do you have a way to get all these names through a larger download than we can do?
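One possible way around a per-download cap is to page through a family's children programmatically and descend genus by genus. The sketch below is only an illustration: it assumes the WoRMS REST service exposes an `AphiaChildrenByAphiaID` endpoint with `marine_only` and `offset` parameters, 1-based offsets, pages of 50 records, and `AphiaID` / `rank` fields on each record. All of that should be checked against the current WoRMS REST documentation before relying on it. `iter_descendants` and `fetch_children_page` are hypothetical names, not part of Arctos or WoRMS.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Assumptions (verify against the WoRMS REST docs): base URL, endpoint name,
# 1-based offsets, and a page size of 50 records.
WORMS_REST = "https://www.marinespecies.org/rest"
PAGE_SIZE = 50


def fetch_children_page(aphia_id: int, offset: int) -> list[dict]:
    """Fetch one page of direct children; an empty body is treated as 'no records'."""
    url = (f"{WORMS_REST}/AphiaChildrenByAphiaID/{aphia_id}"
           f"?marine_only=false&offset={offset}")
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    return json.loads(body) if body else []


def iter_descendants(
    root_id: int,
    fetch_page: Callable[[int, int], list[dict]] = fetch_children_page,
    stop_ranks: frozenset = frozenset({"Species"}),
) -> Iterator[dict]:
    """Yield every record below root_id, paging through each parent's children.

    Records whose rank is in stop_ranks are yielded but not descended into,
    so subspecies are skipped by default.
    """
    stack = [root_id]
    while stack:
        parent = stack.pop()
        offset = 1
        while True:
            page = fetch_page(parent, offset)
            for rec in page:
                yield rec
                if rec.get("rank") not in stop_ranks:
                    stack.append(rec["AphiaID"])
            if len(page) < PAGE_SIZE:  # a short (or empty) page is the last one
                break
            offset += PAGE_SIZE
```

Because the page fetcher is passed in as a parameter, the traversal can be tried against a local stub before it is pointed at the live service, and a delay between requests can be added in the fetcher to stay polite to WoRMS.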

dustymc commented 3 months ago

I don't have a solution for this, assuming I understand it (I think the request is to add names). There are a LOT of things in WoRMS that are not taxa and should not be names in Arctos. Even if we can find things that aren't in Arctos, I don't think machines/scripts would be able to usefully sort them. Possibly GlobalNames would be interested and capable of helping?

genevieve-anderegg commented 3 months ago

Would there be a way to ingest certain groups within WoRMS that aren't within "WoRMS via Arctos", similar to when "WoRMS via Arctos" was first created? Like you say, they have a lot of non-mollusk taxa that DMNS:Inv does not need added. But they do have lots of more recent land snail taxa that we would like within "WoRMS via Arctos". Or is there another workflow you might recommend for adding a bunch of names from WoRMS into "WoRMS via Arctos" at once?

genevieve-anderegg commented 3 months ago

> Or is there another workflow you might recommend for adding a bunch of names from WoRMS into "WoRMS via Arctos" at once?

If I requested a CSV from the good folks at WoRMS of scientific names and AphiaIDs for everything within a group ("Phylum Mollusca"), could I use the Taxon Name Checker and then bulkload the ones we don't already have? (I've never done this before, so pardon my ignorance.)

dustymc commented 3 months ago

> could I use the Taxon Name Checker

Probably, and if not I can certainly help with shuffling big blobs of data around.

There's also a taxon dump at https://arctos.database.museum/cache/gn_merge.tgz if you want to use OpenRefine or whatever.

> and then bulkload

Some significant percentage of those will almost certainly be things other than taxon names, so I doubt just loading them will be that simple. The complicated part (in my somewhat limited experience) is figuring out what is and isn't a name at this point. Maybe GN can help, I'm certainly happy to help (but all I can do is check string patterns, I don't know anything about mollusc taxonomy!), but I really think this will need some familiar-to-expert eyes (someone who would recognize name-like not-names) on it before it becomes names in Arctos.
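As a rough illustration of what "checking string patterns" could look like, here is a minimal triage sketch. It only separates strings shaped like a uninomial, binomial (with optional subgenus), or trinomial from everything else; the regular expression, the `SUSPECT_WORDS` list, and the function names are all hypothetical choices, and the example strings are made up for illustration. Anything it passes would still need expert eyes, since a string can be name-shaped without being a name.

```python
import re

# Hypothetical shape check: "Genus", "Genus species", "Genus (Subgenus) species",
# or "Genus species subspecies". Epithets may contain internal hyphens.
NAME_SHAPE = re.compile(
    r"^[A-Z][a-z]+(?: \([A-Z][a-z]+\))?(?: [a-z][a-z-]*[a-z]){0,2}$"
)

# Hypothetical list of words that mark placeholders rather than taxon names.
SUSPECT_WORDS = {"incertae", "sedis", "unassigned", "unaccepted", "indet",
                 "nomen", "dubium", "nudum"}


def looks_like_name(text: str) -> bool:
    """Return True if text is shaped like a taxon name and has no placeholder words."""
    cleaned = " ".join(text.split())  # collapse stray whitespace
    if any(word.lower() in SUSPECT_WORDS for word in cleaned.split()):
        return False
    return bool(NAME_SHAPE.match(cleaned))


def triage(strings):
    """Split strings into (plausible, needs_review) lists, preserving order."""
    plausible, needs_review = [], []
    for s in strings:
        (plausible if looks_like_name(s) else needs_review).append(s)
    return plausible, needs_review


if __name__ == "__main__":
    ok, review = triage(["Helix pomatia", "Helix sp.", "Tellinidae incertae sedis"])
    print("plausible:", ok)
    print("needs review:", review)
```

The check is deliberately conservative in one direction only: hybrids, names with authorship attached, or anything with punctuation land in the review pile rather than being silently dropped.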

sharpphyl commented 3 months ago

Just an FYI: we had an extensive discussion about this in 2022 in https://github.com/ArctosDB/arctos/issues/4784.

I closed it when it appeared there was nothing that could be done to simplify the process. What we need is a CSV of all WoRMS names missing from WoRMS (via Arctos), with their AphiaIDs, for a specific taxonomic group. Any help or ideas to simplify the process would be greatly appreciated. Depending on what families other museums work with, this request may resurface from other Arctos users in the future.
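Once both lists exist as files, the comparison itself is a set difference. The sketch below assumes a WoRMS export with `scientificname` and `AphiaID` columns and a plain list of names already in Arctos; the output column names (`scientific_name`, `aphiaid`) are placeholders and not the actual Arctos bulkloader template, and the names and IDs in the usage example are invented.

```python
import csv
import io


def normalize(name: str) -> str:
    """Collapse runs of whitespace so 'Genus  species ' matches 'Genus species'."""
    return " ".join(name.split())


def missing_names(worms_rows, arctos_names):
    """Return WoRMS rows whose scientific name is not already in Arctos.

    worms_rows: iterable of dicts with 'scientificname' and 'AphiaID' keys (assumed).
    arctos_names: iterable of name strings already present in Arctos.
    """
    have = {normalize(n) for n in arctos_names}
    seen = set()
    out = []
    for row in worms_rows:
        name = normalize(row["scientificname"])
        if name and name not in have and name not in seen:
            seen.add(name)  # skip duplicate rows in the WoRMS export
            out.append({"scientific_name": name, "aphiaid": row["AphiaID"]})
    return out


def write_csv(rows, handle):
    """Write the missing-name rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["scientific_name", "aphiaid"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Invented example data; real input would come from csv.DictReader on the exports.
    worms = [
        {"scientificname": "Examplea alpha", "AphiaID": "1"},
        {"scientificname": "Examplea beta", "AphiaID": "2"},
    ]
    buf = io.StringIO()
    write_csv(missing_names(worms, ["Examplea alpha"]), buf)
    print(buf.getvalue())
```

Matching here is exact after whitespace cleanup, so spelling variants or differing authorship strings would show up as "missing" and need a manual look.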

mkoo commented 3 months ago

What's needed is a smart taxonomy comparison tool. Can the Taxonomy committee maybe survey tools in other taxonomy systems? Are there OpenRefine workflows to adapt? Are there R scripts and packages to apply? What about other tools? WoRMS can be the use-case here and obviously this topic could benefit from other taxonomists across disciplines.

Thanks for adding the older discussion. It seems that there was also an issue about accessing WoRMS data reliably. Is that part of the problem here?

It seems to me importing desired new taxonomies is not an issue, since that can be done via programmatic request, but figuring out which new taxa need to be added is painful.

genevieve-anderegg commented 3 months ago

> What's needed is a smart taxonomy comparison tool. Can the Taxonomy committee maybe survey tools in other taxonomy systems? Are there OpenRefine workflows to adapt? Are there R scripts and packages to apply? What about other tools? WoRMS can be the use-case here and obviously this topic could benefit from other taxonomists across disciplines.

I can put this on the list for the next Taxonomy committee. Unfortunately I am not that familiar with R or other complex analysis tools. Does anyone have any good ideas? I can always play around in OpenRefine.

> It seems to me importing desired new taxonomies is not an issue, since that can be done via programmatic request, but figuring out which new taxa need to be added is painful.

Yeah, that's what it seems to boil down to; in #4784 that was Dusty's concern too. It's kind of a catch-22, because we don't know which names we will need until we realize we need to add them (right when a volunteer encounters one during data entry), but I also understand that adding a ton of names that might not get used could cause problems (slowing down Arctos, etc.).

I'll go ahead and email the WoRMS people and ask for a CSV of names, classifications, and AphiaIDs for taxa in the larger (over 1000 children) molluscan family Tellinidae, which is a family we expect to work with a lot soon. That way I can start looking at the data we would like to add, go through the bulkload process, and see if there's anything I can think of to improve it.