Bauble / bauble.classic

this is how Bauble and Ghini both started
GNU General Public License v2.0

update family-genus database #52

Closed mfrasca closed 8 years ago

mfrasca commented 9 years ago

From @mfrasca on March 10, 2015 15:41

Saskia @smbantjes writes: »Can you correct family names that have changed over time. Like Drimiopsis maculata is named in Bauble Hyacinthaceae but should be Asparagaceae. Sometimes after research family names are changed. «

this is a known issue and it scared me off so much that I never considered even reporting it!

Copied from original issue: mfrasca/bauble.classic#38

mfrasca commented 9 years ago

>>> import requests
>>> # submit a one-name "file" to the Tropicos name-matching form
>>> r = requests.post("http://tropicos.org/NameMatching.aspx", data={
... "__EVENTTARGET": "",
... "__EVENTARGUMENT": "",
... "ctl00$MainContentPlaceHolder$ctl01": "Match Names"},
... files={"ctl00$MainContentPlaceHolder$fileUploadControl": "FullNameNoAuthors\nGongora beyrodtiana"})
>>> # the reply is tab-separated text: one header line, one result line
>>> header, row = [i.split('\t') for i in r.text.strip().split("\n")]
>>> d = dict(zip(header, row))
>>> print(d)

mfrasca commented 9 years ago

wrote again to tropicos requesting a web api to get read access to their info. also interesting: https://en.wikipedia.org/wiki/World_Checklist_of_Selected_Plant_Families

RoDuth commented 9 years ago

Hi Mario, is this possibly similar to something I mentioned in an email once, and to a similar comment from 2012 that I see on the Bauble Google group: "2) Is it possible to import all species from IPNI or even better from Theplantlist?" to which Brett replied: "2. Bauble wouldn't understand the format of whatever IPNI or ThePlantList is stored in, but you could probably write a script that could transform whatever format they provided into something Bauble could understand. Basically for Bauble to import it, it has to be a CSV file where the columns in the CSV file match the columns of the table in the database that you want to import it into. If you want to work on it I'll be happy to give advice and pointers."
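
The kind of script Brett describes could look roughly like the sketch below. The source column names (`Genus`, `Authorship`, `Family`, `Notes`) and the target column names are assumptions made up for illustration; they are not the real IPNI, ThePlantList or Bauble layouts.

```python
import csv
import io

# Hypothetical mapping: source column name -> target table column name.
COLUMN_MAP = {"Genus": "genus", "Authorship": "author", "Family": "family"}


def remap_csv(source_text, column_map=COLUMN_MAP):
    """Return CSV text whose header and column order match the target table."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()),
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # keep only the mapped columns, renamed for the target table
        writer.writerow({target: row[source]
                         for source, target in column_map.items()})
    return out.getvalue()


sample = "Genus,Authorship,Family,Notes\nBigelowia,DC.,Asteraceae,unchecked\n"
converted = remap_csv(sample)
```

Columns that the target table does not know about (here `Notes`) are simply dropped.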

If you remember my email, I suggested periodic "auditing" of all plant names based on selected web resources, generating a list of names that need to be reviewed, and then auto-correcting names, authors, synonym lists and the families they are attached to, based on the selections made. Maybe it's possible to have auto-filling as you update or type a new taxon name also? Like when you search the Atlas of Living Australia web site. I imagine it would take some tricky programming to make work, especially when you take into account cultivars, hybrids, plants that have not been identified yet (or just have a botanist's tag name), aff., etc. Then there are the regional differences, which take time to make it to IPNI (or may never make it), so you need some way to select which names are checked against which sources, or a preference system (one source always gets preference over another). For us, we peg our names in this order:

1) For plants from within our state (Queensland) we peg to the Queensland Herbarium (data sets are available in CSV here: https://data.qld.gov.au/dataset/census-of-the-queensland-flora-2014). This way we get the latest names for our part of the world. The genera Callistemon and Melaleuca are the big one at the moment: Queensland Herbarium no longer accepts Callistemon, believing they are all species of Melaleuca. The rest of the world hasn't accepted this yet, and we have many Callistemon/Melaleuca in our collection. Just look for Melaleuca viminalis and see what you get in the various sources if you want to see what I'm talking about.
2) For plants from elsewhere in Australia we peg to the Australian Plant Name Index (https://biodiversity.org.au/nsl/services/apni), but find the Atlas of Living Australia to be the easiest way to get to the most current name (found here: http://www.ala.org.au/).
3) For exotic plants (not from Australia) we peg to IPNI (http://www.ipni.org/index.html), in which case we would get it from The Plant List (http://www.theplantlist.org/).

*Our focus is on local plants, so most of our collection comes from Queensland Herbarium names. If you do any work on this I should be able to help you with contacts for the Australian sources.
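
A preference system like the one described above could be sketched as an ordered list that is walked until a rule matches. The region labels and the `preferred_source` function are hypothetical names chosen for this example; nothing like this exists in Bauble.

```python
# Ordered from most to least specific; the first matching entry wins.
# A region of None marks the fallback used for exotic plants.
SOURCES_BY_REGION = [
    ("Queensland", "Queensland Herbarium"),
    ("Australia", "Australian Plant Name Index"),
    (None, "IPNI via The Plant List"),
]


def preferred_source(native_regions):
    """Pick the naming authority for a plant native to the given regions."""
    for region, source in SOURCES_BY_REGION:
        if region is None or region in native_regions:
            return source


local_choice = preferred_source({"Queensland", "Australia"})
national_choice = preferred_source({"Australia"})
exotic_choice = preferred_source({"Ecuador"})
```

Keeping the order in one list means a garden with different priorities only has to edit that list.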

mfrasca commented 9 years ago

I would need a web api to automate all this. I cannot cope with so many different sources, to be checked by hand, all on my own! but points taken and thanks for the references. while working at the JBQ I have been briefly in contact with people at Tropicos (MO), I got a very good export of local (to Ecuador) plants, but they have not been able to open access to their data for automatic queries.

brettatoms commented 9 years ago

This is a pretty difficult problem. For every source you would need some kind of adapter to translate the original source to the Bauble data and also maintain a unique id from the original source so you could later pull in any updates. Any time you have to synchronize databases there tends to be a ton of corner cases, like what happens when a user manually adds some taxonomic rank and then you have to associate that with something in the remote database or you end up with duplicates, etc.
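
A minimal sketch of the adapter idea, assuming each source can be reduced to records carrying their own remote id. `TaxonRecord`, `InMemoryAdapter` and `sync` are hypothetical names for illustration, not Bauble classes, and the sketch ignores the corner cases mentioned above (manually added ranks, duplicates).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonRecord:
    source: str      # which remote database the record came from
    remote_id: str   # the id the remote database uses for it
    genus: str
    family: str


class InMemoryAdapter:
    """Stand-in for a real adapter; a real one would query the remote source."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def fetch(self):
        for remote_id, genus, family in self._rows:
            yield TaxonRecord(self.name, remote_id, genus, family)


def sync(local, adapter):
    """Upsert records keyed by (source, remote_id); return the changed keys."""
    changed = []
    for record in adapter.fetch():
        key = (record.source, record.remote_id)
        if local.get(key) != record:
            local[key] = record
            changed.append(key)
    return changed


local = {}
first = sync(local, InMemoryAdapter("demo", [("42", "Drimiopsis", "Hyacinthaceae")]))
second = sync(local, InMemoryAdapter("demo", [("42", "Drimiopsis", "Asparagaceae")]))
```

Because the key is the remote id rather than the name, the second run updates the existing record instead of creating a duplicate.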

Originally the database of families and genera were just meant to provide a decent base to start with and it was up to the user to keep their taxonomy up to date. Having some remote source dynamically provide the canonical list of names would be cool but it's not an easy task to get right.

My 2 cents.

mfrasca commented 9 years ago

@brettatoms , totally agree it's a difficult problem and I prefer to split it in two parts.

one is making sure the initial data used for initializing the family-genera information is updated. this is the one I was considering when I opened this issue.

the other is offering some support to the user in order to update their database in case things change, since things change continuously. this one looks to me as you say "pretty difficult".

brettatoms commented 9 years ago

@mfrasca Sorry, I wasn't specific. I was more responding to @RoDuth about the remote source and syncing. I hadn't actually read the previous comments ;)

mfrasca commented 9 years ago

:+1: thanks for your comments. they are always informed, even when written "without reading the previous comments". :smile:

RoDuth commented 9 years ago

I appreciate that this is no easy task, but when the Botanical Gardens Informatics Working Group was established in Australia, one of the first goals that was set for the Botanical Gardens Hub was that it should help us smaller botanic gardens with the checking of taxon names. There must be a way to make use of this? @mfrasca I have forwarded you an email that I hope explains. @brettatoms Glad to see you still keeping an eye on us! Appreciate the feedback.

mfrasca commented 8 years ago

there are a couple of items in the current genus.txt file which look curious to me:

5710,"Bigelowia","DC.",118:118,"Asteraceae"
18574,"Bigelowia","DC.",414:414,"Rubiaceae"
1110,"Calophyllum","L.",221:221,"Clusiaceae"
22996,"Calophyllum","L.",345:345,"Orchidaceae"
16181,"Disperma","J.F.Gmel.",1:1,"Acanthaceae"
18543,"Disperma","J.F.Gmel.",414:414,"Rubiaceae"
8901,"Endopogon","Nees",1:1,"Acanthaceae"
18569,"Endopogon","Raf.",414:414,"Rubiaceae"
19425,"Endopogon","Nees",433:433,"Scrophulariaceae"
18670,"Fitzgeraldia","F.Muell.",29:29,"Annonaceae"
22576,"Fitzgeraldia","F.Muell.",345:345,"Orchidaceae"
20939,"Fremontia","Torr.",105:105,"Chenopodiaceae"
17942,"Fremontia","Torr.",449:449,"Sterculiaceae"
8320,"Gerardia","Benth.",1:1,"Acanthaceae"
19443,"Gerardia","Benth.",433:433,"Scrophulariaceae"
22701,"Hypodematium","A.Rich.",345:345,"Orchidaceae"
18579,"Hypodematium","A.Rich.",414:414,"Rubiaceae"
13594,"Hypodematium","Kunze",504:504,"Woodsiaceae"
24797,"Kentia","Blume",29:29,"Annonaceae"
19119,"Kentia","Blume",349:349,"Arecaceae"
23543,"Niemeyera","F.Muell.",345:345,"Orchidaceae"
7223,"Niemeyera","F.Muell.",424:424,"Sapotaceae"
23237,"Phacellanthus","Siebold & Zucc.",143:143,"Cyperaceae"
8483,"Phacellanthus","Siebold & Zucc.",433:433,"Scrophulariaceae"
21622,"Pterilis","Raf.",397:397,"Pteridaceae"
21798,"Pterilis","Raf.",504:504,"Woodsiaceae"
19389,"Slackia","Griff.",203:203,"Gesneriaceae"
19126,"Slackia","Griff.",349:349,"Arecaceae"
2074,"Stachyanthus","Engl.",251:251,"Icacinaceae"
23046,"Stachyanthus","Engl.",345:345,"Orchidaceae"
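
A list like the one above can be produced with a short script. This is only a sketch: it assumes the genus name is the second field and the family name the last field, as in the rows quoted here, and `genera_in_several_families` is a name invented for the example.

```python
import csv
import io
from collections import defaultdict


def genera_in_several_families(text):
    """Map each genus name that occurs under more than one family to those families."""
    families = defaultdict(set)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        genus, family = row[1], row[-1]
        families[genus].add(family)
    return {genus: sorted(found)
            for genus, found in families.items() if len(found) > 1}


sample = '''5710,"Bigelowia","DC.",118:118,"Asteraceae"
18574,"Bigelowia","DC.",414:414,"Rubiaceae"
1110,"Calophyllum","L.",221:221,"Clusiaceae"
'''
duplicates = genera_in_several_families(sample)
```
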

RoDuth commented 8 years ago

Are they all just homonyms? Is there no more than one accepted name in each case, with the others as synonyms of something? Authorship is a little odd but I just put Bigelowia into an IPNI search with "DC." as the author and it did return records for Asteraceae and Rubiaceae, (just a few years apart) so it looks OK to me. Did the same for Hypodematium and got results that correlate to the above... I suspect these may be examples of botanists "recycling" their favourite names until they finally do get them accepted. Was this what concerned you @mfrasca?

mfrasca commented 8 years ago

if the data is correct, then I need to solve an error in the software, because Bigelowia and Stachyanthus cause trouble when I try to retrieve the object based on its fields. to solve the issue without correcting the code I had to remove both of them.

on the other hand, using IPNI as source:

so I think I prefer correcting these two and leave the code as it is.

mfrasca commented 8 years ago

downloading the data from ars-grin did not solve the original report by @smbantjes : according to ars-grin, Drimiopsis is a genus in Hyacinthaceae but the-plant-list says Asparagaceae. not that it makes such a huge difference by now, because Hyacinthaceae is now marked as a synonym of Asparagaceae.

mfrasca commented 8 years ago

I am not convinced of the correctness of all data from ars-grin, they have several loops like genus 1 considered synonym of genus 2 and also genus 2 considered synonym of genus 1. and also loops involving several different genera, as if they were totally equivalent. an example: Angophora, Corymbia and Eucalyptus
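
Loops of this kind can be found mechanically. The sketch below assumes the synonymy has been reduced to a plain mapping from each genus to the single genus it is listed as a synonym of, which is a simplification and not the ars-grin data model; the genus names are placeholders.

```python
def find_synonym_loops(synonym_of):
    """Return the sets of genera that point at each other in a closed loop."""
    loops = set()
    for start in synonym_of:
        seen = []
        current = start
        # follow the chain until it ends or revisits a genus
        while current in synonym_of and current not in seen:
            seen.append(current)
            current = synonym_of[current]
        if current in seen:
            # everything from the first revisited genus onward is the loop
            loops.add(frozenset(seen[seen.index(current):]))
    return loops


synonym_of = {
    "genus 1": "genus 2",
    "genus 2": "genus 1",   # two-way loop
    "genus 3": "genus 1",   # leads into the loop but is not part of it
    "genus 4": "genus 5",   # ordinary synonym, no loop
}
loops = find_synonym_loops(synonym_of)
```
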

RoDuth commented 8 years ago

Just so you know... Angophora, Corymbia and Eucalyptus... species have jumped back and forth between these 3 genera over the years, so they are potentially synonyms of each other; it's at the species level that it makes more sense.