Bauble / bauble.classic

this is how Bauble and Ghini both started
GNU General Public License v2.0

batch taxonomic check information with tnrs #229

Closed: mfrasca closed this issue 8 years ago

mfrasca commented 8 years ago

splitting this from #221: I am copying the relevant comments here and removing them from that issue.

this issue is about correcting a whole batch of species entries, which might be missing authorship info, or have typing mistakes in the species epithet and even in the genus name.

what I propose here is a new, still unnamed tool in the Tools menu.

2015-12-18 08:41-0500 I'm not sure whether what you mentioned is the same as, or just similar to, the lookup functionality at http://tnrs.iplantcollaborative.org/TNRSapp.html. this one is quite good; the only thing missing is a web API. without a web API, the user would need to do a couple of manual steps, which is not so terrible in my opinion:

what is missing?

2015-12-18 11:55-0500 this might show what I mean: ca63eae. the data contained simple mistakes like a misplaced h, or a missing i in a double ii, things like that. with the help of TNRS, we went from ~350 cases down to 9.

2015-12-18 17:14-0500 the easiest I can imagine is:

2015-12-20 17:16-0500 closing the day, I'm having a look at what it might mean to correct a whole batch of species. those with overall score 1 are fine: this is useful for completing data such as authorship.

I would not dare send any entry with overall score less than 1 to the program automatically. somehow the user should have a look at them and decide what to import and what not.

the following cases seem obvious: you need to correct your data:

1   Adiantum capillus-Veneris   0.82    Adiantum capillus-veneris   species 1   L.  http://www.theplantlist.org/tpl1.1/record/tro-26602671;http://www.tropicos.org/Name/26602671;http://plants.usda.gov/java/profile?symbol=ADCA    L.  0.11            Pteridaceae Adiantum    1   capillus-veneris    1                                   Accepted    Adiantum capillus-veneris   L.  species http://www.theplantlist.org/tpl1.1/record/tro-26602671;http://www.tropicos.org/Name/26602671;http://plants.usda.gov/java/profile?symbol=ADCA    Adiantum capillus-veneris   Pteridaceae true    tpl;tropicos;usda       
8   Anthurium andranum  0.96    Anthurium andraeanum    species 0.96    Linden ex André    http://www.theplantlist.org/tpl1.1/record/kew-10633                 Araceae Anthurium   1   andraeanum  0.8                                 Accepted    Anthurium andraeanum    Linden ex André    species http://www.theplantlist.org/tpl1.1/record/kew-10633 Anthurium andraeanum    Araceae true    tpl     
15  Astrocaryua malybo  0.99    Astrocaryum malybo  species 0.99    H.Karst.    http://www.theplantlist.org/tpl1.1/record/kew-17547                 Arecaceae   Astrocaryum 0.91    malybo  1                                   Accepted    Astrocaryum malybo  H.Karst.    species http://www.theplantlist.org/tpl1.1/record/kew-17547 Astrocaryum malybo  Arecaceae   true    tpl     

but number 15 becomes less obvious considering there is already an Astrocaryum malybo in the database. so you need to edit things by hand, even for a close-to-perfect match of 0.99.
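a minimal sketch of the triage described above, under stated assumptions: the `triage` function, the dictionary keys (`submitted`, `score`, `matched`) and the `existing_names` set are hypothetical names chosen for illustration, not Bauble's API nor the TNRS column headers. the three rows are the ones quoted above, reduced to the fields that matter here.

```python
def triage(rows, existing_names):
    """Split TNRS result rows into three buckets.

    complete: overall score 1, safe for completing data such as authorship
    conflict: score below 1 and the corrected name already exists in the
              database, so the entry must be edited by hand
    review:   every other row with score below 1, for the user to check
    """
    complete, review, conflict = [], [], []
    for row in rows:
        if row["score"] >= 1.0:
            complete.append(row)
        elif row["matched"] != row["submitted"] and row["matched"] in existing_names:
            conflict.append(row)
        else:
            review.append(row)
    return complete, review, conflict


# rows 1, 8 and 15 from the TNRS output quoted above (hypothetical keys)
rows = [
    {"submitted": "Adiantum capillus-Veneris", "score": 0.82,
     "matched": "Adiantum capillus-veneris"},
    {"submitted": "Anthurium andranum", "score": 0.96,
     "matched": "Anthurium andraeanum"},
    {"submitted": "Astrocaryua malybo", "score": 0.99,
     "matched": "Astrocaryum malybo"},
]

# assumption: the database already holds this name, as in the case above
complete, review, conflict = triage(rows, existing_names={"Astrocaryum malybo"})
print([row["submitted"] for row in conflict])
```

with these inputs, number 15 lands in the conflict bucket despite its 0.99 score, and the other two rows go to review.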

RoDuth commented 8 years ago

This, and the related issues, are looking really good @mfrasca ....

I definitely agree that you need to let the user somehow "tick off" any recommended corrections line by line as you need to allow the user to disagree...regardless of source and reliability... We can be a fickle bunch :smile:

And to gain all that authorship data that I am normally so slack about entering will be great! Such a help for those of us that struggle to get the time we would like to spend on our records. If I can see that all but 20 of the names in my database are accepted by tnrs then it's only 20 names I need to look into... not 900+!

Haven't been on GitHub much lately (hope to get more time for it in the new year) so it's great to see you have really run with this idea! :+1:

mfrasca commented 8 years ago

as for the "tick off": for now, please let me rely on the user doing it after downloading the file and before going back to Bauble. otherwise it's a lot more programming. sooner or later maybe, not now.

mfrasca commented 8 years ago

(I did the tick off, and it was not the most complicated thing. this #239 is).

mfrasca commented 8 years ago

still missing: choosing the file (you now have to type the path), and connecting the synonym/accepted lines...

when you accept something which is considered a synonym, you are accepting both the synonym and the accepted names. this should show in the selection pane. when you decide you are NOT accepting the accepted name, you are also not accepting the synonym. but you can accept the accepted name and leave the synonym out.
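the rule above could be sketched roughly like this. it is only an illustration under assumptions: `select`, `deselect` and the `accepted_of` mapping (synonym name to accepted name) are hypothetical helpers, not the code in Bauble, and the names in the usage lines are placeholders rather than real taxa.

```python
def select(selected, name, accepted_of):
    """Tick a name. Ticking a synonym also ticks its accepted name."""
    result = set(selected)
    result.add(name)
    if name in accepted_of:
        result.add(accepted_of[name])
    return result


def deselect(selected, name, accepted_of):
    """Untick a name. Unticking an accepted name also unticks its synonyms."""
    result = set(selected)
    result.discard(name)
    for synonym, accepted in accepted_of.items():
        if accepted == name:
            result.discard(synonym)
    return result


# placeholder names, not real taxa
accepted_of = {"Genus oldname": "Genus newname"}

both = select(set(), "Genus oldname", accepted_of)
only_accepted = deselect(both, "Genus oldname", accepted_of)
neither = deselect(both, "Genus newname", accepted_of)
print(sorted(both), sorted(only_accepted), sorted(neither))
```

unticking the synonym leaves the accepted name ticked, which is the "accept the accepted name and leave the synonym out" case.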

mfrasca commented 8 years ago

oh, and the ok button should not be active until you reach the last page

mfrasca commented 8 years ago

in case of trouble, please reopen or file a new issue.