Closed mfrasca closed 8 years ago
>>> import requests
>>> r = requests.post("http://tropicos.org/NameMatching.aspx", data={
... "__EVENTTARGET": "",
... "__EVENTARGUMENT": "",
... "ctl00$MainContentPlaceHolder$ctl01": "Match Names"},
... files={"ctl00$MainContentPlaceHolder$fileUploadControl": "FullNameNoAuthors\nGongora beyrodtiana"})
>>> header, row = [i.split('\t') for i in r.text.strip().split("\n")]
>>> d = dict(zip(header, row))
>>> print(d)
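The session above unpacks exactly one header line and one result row, so it only works when a single name is submitted. A minimal sketch for several names, assuming the response stays tab-separated with one header line. The `MatchedName` column and the sample text are made up for illustration and are not actual Tropicos output.

```python
import csv
import io


def parse_matches(text):
    """Turn a tab-separated response (header line + N rows) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text.strip()), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical sample; the real column names may differ.
sample = (
    "FullNameNoAuthors\tMatchedName\n"
    "Gongora beyrodtiana\tGongora beyrodtiana\n"
    "Drimiopsis maculata\tDrimiopsis maculata\n"
)
matches = parse_matches(sample)
```

With a real response, `parse_matches(r.text)` would take the place of the `header, row = ...` line.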
wrote again to tropicos requesting a web api to get read access to their info. also interesting: https://en.wikipedia.org/wiki/World_Checklist_of_Selected_Plant_Families
Hi Mario, is this possibly similar to something I had mentioned in an email once and a comment that is similar from 2012 that I see on the Bauble google group:
"2) Is it possible to import all species from IPNI or even better from
Theplantlist? "
to which Brett replied:
"2. Bauble would understand the format of whatever IPNI or ThePlantList
is stored in but you could probably write a script that could
transform whatever format they provided into something Bauble could
understand. Basically for Bauble to import it it has to be a CSV file
where the columns in the CSV file match the columns of the table in
the database that you want to import it into. If you want to work on
it I'll be happy to give advice and pointers. "
If you remember my email, I suggested periodic "auditing" of all plant names based on selected web resources, generating a list of names that need to be reviewed and then auto-correcting names, authors, synonym lists and the families they are attached to based on what selections are made. Maybe it's possible to have auto-filling as you update or type a new taxon name also? Like when you search the Atlas of Living Australia web site. I imagine it would take some tricky programming to make work, especially when you take into account cultivars, hybrids, and that there are going to be plants that have not been identified yet (or just have a botanist's tag name), aff., etc.

Then there are the regional differences which take time to make it to IPNI (or may never make it), so you need some way to select which names are checked against which sources, or a preference system (one source always gets preference over another). For us, we peg our names in this order:

1) For plants from within our state (Queensland) we peg to the Queensland Herbarium (data sets are available in CSV here: https://data.qld.gov.au/dataset/census-of-the-queensland-flora-2014). This way we get the latest names for our part of the world. The genera Callistemon and Melaleuca are the big one at the moment: Queensland Herbarium no longer accepts Callistemon, believing they are all species of Melaleuca. The rest of the world hasn't accepted this yet and we have many Callistemon/Melaleuca in our collection. Just look for Melaleuca viminalis and see what you get in the various sources if you want to see what I'm talking about.
2) For plants from elsewhere in Australia we peg to the Australian Plant Name Index (https://biodiversity.org.au/nsl/services/apni) but find the Atlas of Living Australia to be the easiest way to get to the most current name (found here: http://www.ala.org.au/).
3) For exotic plants (not from Australia) we peg to IPNI (http://www.ipni.org/index.html), in which case we would get it from The Plant List (http://www.theplantlist.org/).

Our focus is on local plants, so most of our collection comes from Queensland Herbarium names. If you do any work on this I should be able to help you with contacts for the Australian sources.
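The preference order described in this comment could be sketched as an ordered list of sources per origin, where the first source that knows the name wins. Everything here is hypothetical: the source keys, the function name, and the toy lookup tables, which stand in for real downloads and are not real query results.

```python
def resolve_name(name, origin, sources, preferences):
    """Return (source_name, accepted_name) from the first preferred source that knows `name`.

    `sources` maps a source name to a lookup dict {name: accepted name};
    `preferences` maps an origin to an ordered list of source names.
    """
    for source_name in preferences.get(origin, preferences["default"]):
        accepted = sources[source_name].get(name)
        if accepted is not None:
            return source_name, accepted
    return None, None  # no source knows the name: flag it for manual review


# Toy lookup tables, invented for illustration only.
sources = {
    "qld_herbarium": {"Callistemon viminalis": "Melaleuca viminalis"},
    "apni": {"Callistemon viminalis": "Callistemon viminalis"},
    "ipni": {},
}
preferences = {
    "queensland": ["qld_herbarium", "apni", "ipni"],
    "australia": ["apni", "ipni"],
    "default": ["ipni"],
}

qld_result = resolve_name("Callistemon viminalis", "queensland", sources, preferences)
aus_result = resolve_name("Callistemon viminalis", "australia", sources, preferences)
unknown = resolve_name("Gongora beyrodtiana", "exotic", sources, preferences)
```

The same name resolves differently depending on where the plant comes from. Names that come back as `(None, None)` would go on the review list.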
I would need a web api to automate all this. I cannot cope with so many different sources, to be checked by hand, all on my own! but points taken and thanks for the references. while working at the JBQ I have been briefly in contact with people at Tropicos (MO), I got a very good export of local (to Ecuador) plants, but they have not been able to open access to their data for automatic queries.
This is a pretty difficult problem. For every source you would need some kind of adapter to translate the original source to the Bauble data and also maintain a unique id from the original source so you could later pull in any updates. Any time you have to synchronize databases there tends to be a ton of corner cases like what happens when a user manually adds some taxonomic rank and then you have to associate that with something in the remote database or you end up with duplicates, etc.
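A rough sketch of the adapter idea, assuming each source row carries some stable identifier: the adapter translates a row into local fields, and records are keyed by (source, remote id) so that a later update overwrites instead of duplicating. The adapter name, the `taxon_id` field, and the dict used as a stand-in for the database are all hypothetical.

```python
def upsert(local, source, remote_id, fields):
    """Insert or update a record keyed by (source, remote_id); return True if newly created."""
    key = (source, remote_id)
    created = key not in local
    local.setdefault(key, {}).update(fields)
    return created


def qld_adapter(row):
    """Hypothetical adapter: translate one source row into (remote_id, fields)."""
    return row["taxon_id"], {"genus": row["Genus"], "family": row["Family"]}


local = {}  # stand-in for the local database table
row = {"taxon_id": "123", "Genus": "Melaleuca", "Family": "Myrtaceae"}
remote_id, fields = qld_adapter(row)
first = upsert(local, "qld_herbarium", remote_id, fields)   # new record
second = upsert(local, "qld_herbarium", remote_id, fields)  # same id: update, no duplicate
```

Records a user typed in by hand have no remote id, so matching them to a remote entry is exactly the corner case mentioned above and is not handled here.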
Originally the database of families and genera was just meant to provide a decent base to start with and it was up to the user to keep their taxonomy up to date. Having some remote source dynamically provide the canonical list of names would be cool but it's not an easy task to get right.
My 2 cents.
@brettatoms, totally agree it's a difficult problem and I prefer to split it into two parts.
one is making sure the initial data used for initializing the family-genera information is updated. this is the one I was considering when I opened this issue.
the other is offering some support to the user in order to update their database in case things change, since things change continuously. this one looks to me as you say "pretty difficult".
@mfrasca Sorry, I wasn't specific. I was more responding to @RoDuth about the remote source and syncing. I hadn't actually read the previous comments ;)
:+1: thanks for your comments. they are always informed, even when written "without reading the previous comments". :smile:
I appreciate that this is no easy task but when the Botanical Gardens Informatics Working Group was established in Australia one of the first goals that was set for the Botanical Gardens Hub was that it should help us smaller botanic gardens with the checking of taxon names. Must be a way to make use of this? @mfrasca I have forwarded you an email that I hope explains. @brettatoms Glad to see you still keeping an eye on us! Appreciate the feedback.
there are a couple of items in the current genus.txt file which look curious to me:
5710,"Bigelowia","DC.",118:118,"Asteraceae"
18574,"Bigelowia","DC.",414:414,"Rubiaceae"
1110,"Calophyllum","L.",221:221,"Clusiaceae"
22996,"Calophyllum","L.",345:345,"Orchidaceae"
16181,"Disperma","J.F.Gmel.",1:1,"Acanthaceae"
18543,"Disperma","J.F.Gmel.",414:414,"Rubiaceae"
8901,"Endopogon","Nees",1:1,"Acanthaceae"
18569,"Endopogon","Raf.",414:414,"Rubiaceae"
19425,"Endopogon","Nees",433:433,"Scrophulariaceae"
18670,"Fitzgeraldia","F.Muell.",29:29,"Annonaceae"
22576,"Fitzgeraldia","F.Muell.",345:345,"Orchidaceae"
20939,"Fremontia","Torr.",105:105,"Chenopodiaceae"
17942,"Fremontia","Torr.",449:449,"Sterculiaceae"
8320,"Gerardia","Benth.",1:1,"Acanthaceae"
19443,"Gerardia","Benth.",433:433,"Scrophulariaceae"
22701,"Hypodematium","A.Rich.",345:345,"Orchidaceae"
18579,"Hypodematium","A.Rich.",414:414,"Rubiaceae"
13594,"Hypodematium","Kunze",504:504,"Woodsiaceae"
24797,"Kentia","Blume",29:29,"Annonaceae"
19119,"Kentia","Blume",349:349,"Arecaceae"
23543,"Niemeyera","F.Muell.",345:345,"Orchidaceae"
7223,"Niemeyera","F.Muell.",424:424,"Sapotaceae"
23237,"Phacellanthus","Siebold & Zucc.",143:143,"Cyperaceae"
8483,"Phacellanthus","Siebold & Zucc.",433:433,"Scrophulariaceae"
21622,"Pterilis","Raf.",397:397,"Pteridaceae"
21798,"Pterilis","Raf.",504:504,"Woodsiaceae"
19389,"Slackia","Griff.",203:203,"Gesneriaceae"
19126,"Slackia","Griff.",349:349,"Arecaceae"
2074,"Stachyanthus","Engl.",251:251,"Icacinaceae"
23046,"Stachyanthus","Engl.",345:345,"Orchidaceae"
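A list like the one above can be produced mechanically. This is a sketch assuming every genus.txt row has the five comma-separated fields shown (id, genus, author, an unnamed fourth field, family). The function name is made up.

```python
import csv
from collections import defaultdict


def genera_in_several_families(lines):
    """Map genus name -> sorted families, keeping only names found in more than one family."""
    families = defaultdict(set)
    for row in csv.reader(lines):
        # Fields as in the rows above: id, genus, author, (unused), family
        genus, family = row[1], row[4]
        families[genus].add(family)
    return {g: sorted(f) for g, f in families.items() if len(f) > 1}


sample = [
    '5710,"Bigelowia","DC.",118:118,"Asteraceae"',
    '18574,"Bigelowia","DC.",414:414,"Rubiaceae"',
    '1110,"Calophyllum","L.",221:221,"Clusiaceae"',
]
duplicates = genera_in_several_families(sample)
```

Passing an open file object for genus.txt instead of `sample` would check the whole file.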
Are they all just homonyms? Is there no more than one accepted name in each case, with the others as synonyms of something? Authorship is a little odd but I just put Bigelowia into an IPNI search with "DC." as the author and it did return records for Asteraceae and Rubiaceae, (just a few years apart) so it looks OK to me. Did the same for Hypodematium and got results that correlate to the above... I suspect these may be examples of botanists "recycling" their favourite names until they finally do get them accepted. Was this what concerned you @mfrasca?
if the data is correct, then I need to solve an error in the software, because Bigelowia and Stachyanthus cause trouble when I try to retrieve the object based on its fields. to solve the issue without correcting the code I had to remove both of them.
on the other hand, using IPNI as source: DC. published Bigelowia four times, twice look just the same to me putting it in Asteraceae, and twice it's a nom. inval. (Rubiaceae and Violaceae), both of which I would not want in the database. Stachyanthus Engl. is not in Orchidaceae. so I think I prefer correcting these two and leave the code as it is.
downloading the data from ars-grin did not solve the original report by @smbantjes : according to ars-grin, Drimiopsis is a genus in Hyacinthaceae but the-plant-list says Asparagaceae. not that it makes such a huge difference by now, because Hyacinthaceae is now marked as a synonym of Asparagaceae.
I am not convinced of the correctness of all data from ars-grin, they have several loops like genus 1 considered synonym of genus 2 and also genus 2 considered synonym of genus 1. and also loops involving several different genera, as if they were totally equivalent. an example: Angophora, Corymbia and Eucalyptus
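Loops of this kind can be detected before importing. This is a sketch assuming the synonymy is available as a mapping where each genus points to at most one other genus; the direction of the links in the example is invented for illustration and is not copied from ars-grin.

```python
def find_synonym_loops(synonym_of):
    """Return the loops in a {genus: genus it is listed as a synonym of} mapping."""
    loops = []
    seen = set()
    for start in synonym_of:
        path = []
        node = start
        # Follow the chain until it ends or reaches something already visited.
        while node in synonym_of and node not in seen:
            seen.add(node)
            path.append(node)
            node = synonym_of[node]
        # Landing back on the current path means the chain closed on itself.
        if node in path:
            loops.append(path[path.index(node):])
    return loops


# Hypothetical links; only the genus names come from the discussion.
synonym_of = {
    "Angophora": "Corymbia",
    "Corymbia": "Eucalyptus",
    "Eucalyptus": "Angophora",
    "Callistemon": "Melaleuca",  # plain synonym, no loop
}
loops = find_synonym_loops(synonym_of)
```

Each loop found this way would need a human decision about which name to treat as accepted.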
Just so you know... Angophora, Corymbia and Eucalyptus... species have jumped back and forth between these 3 genera over the years so they are potentially synonyms of each other; it's at the species level that it makes more sense.
From @mfrasca on March 10, 2015 15:41
Saskia @smbantjes writes: »Can you correct family names that have changed over time. Like Drimiopsis maculata is named in Bauble Hyacinthaceae but should be Asparagaceae. Sometimes after research family names are changed. «
this is a known issue and it scared me off so much that I never considered even reporting it!
Copied from original issue: mfrasca/bauble.classic#38