Bauble / bauble.classic

this is how Bauble and Ghini both started
GNU General Public License v2.0

on-the-fly import taxonomic information from theplantlist.org #221

Closed mfrasca closed 8 years ago

mfrasca commented 8 years ago

@RoDuth mentioned aspects of the above issues in #163.

RoDuth commented 8 years ago

I was debating that myself... Not sure as I haven't checked, but yes I most probably should have just opened the issue so thanks for doing it.

Sorry, for the next couple of weeks I am likely to be a bit short of time so I may seem like I'm dragging my feet a little or being sporadic..

mfrasca commented 8 years ago

some comments moved to #229, making sure this issue stays focused on the subject.

mfrasca commented 8 years ago

oh, but I'm still not sure this is what @RoDuth meant with his initial comment!

RoDuth commented 8 years ago

I was talking about making it easy to keep our records up to date at BGCI's PlantSearch as this is one of the easiest ways to make our collections available to researchers etc. (We already get lots of requests via this system and have worked with a couple of them) The institution data should already be in Bauble, maybe just need somewhere to save your login. I'm sure BGCI would be happy to know there was a database option out there that could do it in one simple step... And I imagine they would be happy to help make it happen. (may even get you a few more recommendations?) See instructions here: http://www.bgci.org/resources/plantsearchuploadinstructions/

I like where you are heading with TNRS though. Just a different issue. More along the lines of what I was referring to in #52. Would love to see this functionality.

mfrasca commented 8 years ago

oh, sorry, then I'm redirecting this issue #221 so its title matches its content, and reopening your issue with your own comments so it stays clean and clear. your relevant comment in issue #52:

Hi Mario, is this possibly similar to something I had mentioned in an email once and a comment that is similar from 2012 that I see on the Bauble google group:

"2) Is it possible to import all species from IPNI or even better fromTheplantlist? "

to which Brett replied:

"2. Bauble would understand the format of whatever IPNI or ThePlantList is stored in but you could probably write a script that could transform whatever format they provided into something Bauble could understand. Basically for Bauble to import it it has to be a CSV file where the columns in the CSV file match the columns of the table in the database that you want to import it into. If you want to work on it I'll be happy to give advice and pointers. "

if you remember my email I suggested periodic "auditing" of all plant names based on selected web resources, generating a list of names that need to be reviewed and then auto-correcting names, authors, synonym lists and the families they are attached to based on what selections are made. Maybe it's possible to have auto filling as you update or type a new Taxon name also? Like when you search the Atlas of Living Australia web site. I imagine it would take some tricky programming to make work. Especially when you take into account cultivars, hybrids, that there are going to be plants that have not been identified yet (or just have a botanist's tag name), aff., etc. Then there are the regional differences which take time to make it to IPNI (or may never make it) so you need some way to select which names are checked against which sources or a preference system (one source always gets preference over another). For us we peg our names in this order:

  1. For plants from within our state (Queensland) we peg to the Queensland Herbarium (the data set is available in CSV here: https://data.qld.gov.au/dataset/census-of-the-queensland-flora-2014). This way we get the latest names for our part of the world, the genera Callistemon and Melaleuca being the big ones at the moment: Queensland Herbarium no longer accepts Callistemon, believing they are all species of Melaleuca. The rest of the world hasn't accepted this yet and we have many Callistemon/Melaleuca in our collection. Just look for Melaleuca viminalis and see what you get in the various sources if you want to see what I'm talking about.
  2. For plants from elsewhere in Australia we peg to the Australian Plant Name Index (https://biodiversity.org.au/nsl/services/apni) but find the Atlas of Living Australia to be the easiest way to get to the most current name (found here http://www.ala.org.au/)
  3. For exotic plants (not from Australia) we peg to IPNI (http://www.ipni.org/index.html) in which case we would get it from The Plant List (http://www.theplantlist.org/)

*Our focus is on local plants so most of our collection comes from Queensland Herbarium names. If you do any work on this I should be able to help you with contacts for the Australian sources.
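The preference order in the list above could be sketched as an ordered rule table, where the first matching rule decides which source a name is pegged to. The `country` and `state` arguments are hypothetical record fields, not existing Bauble columns, and the source labels are plain strings used only for illustration.

```python
# Rules are checked top to bottom, so an earlier source always gets
# preference over a later one. The (country, state) pair is a hypothetical
# description of where the plant comes from.
PREFERENCE_RULES = [
    (lambda origin: origin == ("Australia", "Queensland"),
     "Queensland Herbarium"),
    (lambda origin: origin[0] == "Australia",
     "Australian Plant Name Index"),
    (lambda origin: True,
     "IPNI (via The Plant List)"),
]


def preferred_source(country, state=None):
    """Return the label of the first source whose rule matches the origin."""
    origin = (country, state)
    for matches, source in PREFERENCE_RULES:
        if matches(origin):
            return source
```

Another collection would only need to replace the rule table to express its own order of preference.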

mfrasca commented 8 years ago

I could add it to the species editor, something coming up in the message area where you are used to seeing warnings (and clicking them away).

if I am to use theplantlist.org web service, in their csv results they mention the fact that an epithet is considered a synonym of something else, but you have to go online manually to get the full picture.

correction (2015-12-22 18:20-0500): the above was my mistaken reading. the last field in the csv response is the key to the accepted taxon, in case of synonymy.

so, dear watchers: if you are aware of more sites offering their taxonomic information through a web service, please mention it here!

mfrasca commented 8 years ago

@tmyersdn, I have seen your #214 and still need to find the time to have a look at what it does and how.

RoDuth commented 8 years ago

@mfrasca maybe not ideal for everyone but, for myself and a lot of us in Australia (and potentially NZ in the future I believe), the Atlas of Living Australia is the source I refer to the most (say 80-90% of our collection). They seem to have some data exports and the LSID etc. (see here for an example, right hand side of screen: http://bie.ala.org.au/species/Brachychiton+bidwillii) and do have synonymy. Also, you can download the results of a search as a csv.

I sent you an email a while back with a chain of discussion with ALA developers and others where it was mentioned that they were interested in assisting BGs with naming. They could be worth contacting re: web api etc.?

The other source we use a fair bit is the APC but as they are linked I tend to go to the ALA more often. I notice that APC does have a bulk name check service also. Have never tried it.

RoDuth commented 8 years ago

Also, @mfrasca. Love the ideas above for the species editor.

Could the LSID help at all?

mfrasca commented 8 years ago

@RoDuth, no, not ideal, the http://biodiversity.org.au/taxon/ site. it does offer quite complete information, but we really need something that covers us worldwide. I tried just at random 4 species from the Brako & Zarucchi list (Catalogue ... of Peru) and got four 404s.

I don't understand the LSID. what is it for?

mfrasca commented 8 years ago

http://www.theplantlist.org/tpl1.1/search?q=%(sp.genus.epithet)s %(sp.epithet)s&csv=true

when a row is marked as Synonym, its last field is the accepted ID, which we can retrieve similarly

for example, I look for Cyrtidium stumpflei:

kew-54471,A,Orchidaceae,,Cyrtidium,,"stumpflei",,"","Garay",Synonym,,H,WCSP,54471,75438-2,"Orquideologia","4: 6","","1969",kew-54465

I can do a second query for kew-54465 and get the complete picture:

kew-54465,A,Orchidaceae,,Cyrtidiorchis,,"stumpflei",,"","(Garay) Rauschert",Accepted,,H,WCSP,54465,910482-1,"Taxon","31: 560","","1982",
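The two-step lookup above can be sketched as follows. The field positions are inferred only from the two rows quoted here, not from any documentation of the service, and `fetch_row` is a hypothetical callable standing in for the second query, since how a row is retrieved by ID is not spelled out in this thread.

```python
import csv
from urllib.parse import urlencode

SEARCH_URL = "http://www.theplantlist.org/tpl1.1/search"

# The two rows quoted in this comment, used as canned data (no network).
ROW_SYNONYM = ('kew-54471,A,Orchidaceae,,Cyrtidium,,"stumpflei",,"","Garay",'
               'Synonym,,H,WCSP,54471,75438-2,"Orquideologia","4: 6","",'
               '"1969",kew-54465')
ROW_ACCEPTED = ('kew-54465,A,Orchidaceae,,Cyrtidiorchis,,"stumpflei",,"",'
                '"(Garay) Rauschert",Accepted,,H,WCSP,54465,910482-1,"Taxon",'
                '"31: 560","","1982",')


def search_url(genus, epithet):
    """Build the CSV search URL for a binomial."""
    return SEARCH_URL + "?" + urlencode({"q": f"{genus} {epithet}",
                                         "csv": "true"})


def parse_row(line):
    """Pick out the fields used here; positions inferred from the sample rows."""
    fields = next(csv.reader([line]))
    return {
        "id": fields[0],
        "family": fields[2],
        "genus": fields[4],
        "epithet": fields[6],
        "author": fields[9],
        "status": fields[10],
        "accepted_id": fields[-1] or None,  # empty for accepted names
    }


def resolve(line, fetch_row):
    """Follow a synonym to its accepted name; fetch_row is hypothetical."""
    record = parse_row(line)
    if record["status"] == "Synonym" and record["accepted_id"]:
        return parse_row(fetch_row(record["accepted_id"]))
    return record
```

With the canned rows, `resolve(ROW_SYNONYM, {"kew-54465": ROW_ACCEPTED}.__getitem__)` follows Cyrtidium stumpflei to the Cyrtidiorchis record without touching the network.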

RoDuth commented 8 years ago

My understanding is that the LSID is a direct link to the data... regardless of the name. Kind of like an accession number for the taxon, (although it can be used in other ways). It also contains the resource that it came from. e.g. urn:lsid:biodiversity.org.au:apni.taxon:754049 points to Brachychiton bidwillii as defined by APNI (at biodiversity.org.au) and urn:lsid:ipni.org:names:530017-1 points to Aloe vera as defined by IPNI (at ipni.org). Kind of like your kew-54465 above.
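To make the structure concrete: an LSID is a URN of the form `urn:lsid:<authority>:<namespace>:<object>`, optionally followed by a revision. A small parsing sketch, where the `LSID` tuple and `parse_lsid` are names made up for this illustration:

```python
from collections import namedtuple

# Illustrative container; not part of Bauble or of any LSID library.
LSID = namedtuple("LSID", "authority namespace object_id revision")


def parse_lsid(text):
    """Split an LSID into its parts, raising ValueError on malformed input."""
    parts = text.split(":")
    prefix = [part.lower() for part in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For `urn:lsid:ipni.org:names:530017-1` this gives authority `ipni.org`, namespace `names` and object `530017-1`, which is the "resource plus accession number" reading described above.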

@tmyersdn is the one who first suggested that it would be handy to keep in the database and may be able to expand on this??

I put APC and ALA forward because I figured that, if they are easy to work with and you have contacts among the developers to ask any questions, they may be a good test case?

mfrasca commented 8 years ago

ah! this is what I missed: it would be handy to keep [the LSID] in the database. :relieved: ok. sounds like a new issue. feel free to open! :beetle:

mfrasca commented 8 years ago

string distances will be relevant. see for example this question
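As one readily available option, the standard library's `difflib` offers a similarity ratio that can rank candidate names against a possibly misspelled one. This is a sketch of the general idea only: it is a similarity ratio rather than an edit distance in the strict sense, it is not necessarily what the code in master uses, and the `cutoff` of 0.8 is an arbitrary assumption.

```python
import difflib


def suggest_names(typed, known_names, cutoff=0.8):
    """Return up to three known names that closely resemble the typed one."""
    return difflib.get_close_matches(typed, known_names, n=3, cutoff=cutoff)
```

A misspelled binomial would be passed in together with the list of names returned by the service, and any suggestion could then be offered to the user for confirmation rather than applied silently.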

mfrasca commented 8 years ago

the logic is all in place and you can see what it does in the logging window, if you open a terminal and tail -f ~/.bauble/bauble.log. now I have to write the part showing the information in self.view.add_message_box and handling the response.

mfrasca commented 8 years ago

still to do: it does not report when a query ends (and your data is fine), nor when it does not get any reply from the server, nor does it check discrepancies in authorship, and I'm not sure what to do with homonyms from different authors.

[image: screenshot from 2015-12-23 12 04 05]

mfrasca commented 8 years ago

oh, I'm not so sure how it should behave with infraspecific information. I have Bauhinia splendens, which is a synonym of Bauhinia guianensis var. splendens (Kunth) Amshoff. the code I've implemented ignores everything which is not at rank species. so it will find the information for Bauhinia splendens, see it's marked as synonym, look for the accepted name, discard all varieties and formae, see that nothing matches, and conclude "no match found". I think I prefer leaving this for when I finally find the energy to unit-test this presenter.

mfrasca commented 8 years ago

I check »Annona squamosa L.«, the program finds several homonyms and chooses randomly among them, so it can return »Annona squamosa Delile«, offer a correction, and also offer to add »Annona squamosa L.« as the accepted taxon. the expected result is "fine match, well done".
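One way to avoid the random choice, sketched under the assumption that each candidate is a dict with an `author` key (a hypothetical shape, not the presenter's actual data structure): prefer the candidate whose author matches the one already recorded, and report no match otherwise instead of guessing.

```python
def pick_homonym(candidates, author):
    """Return the candidate whose author matches, or None if there is none."""
    wanted = author.strip().lower()
    for candidate in candidates:
        if candidate["author"].strip().lower() == wanted:
            return candidate
    return None  # ambiguous or unknown: let the caller decide what to show
```

This is only an illustration of the expected behaviour, not a description of how the fix in master works.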

mfrasca commented 8 years ago

what happens here with Brownea ariza?

[image: screenshot from 2015-12-23 17 09 17]

mfrasca commented 8 years ago

the above two problems are solved in master and will be fine in 1.0.54.

mfrasca commented 8 years ago

forgot to close