gilienv / EssOilDB

Restructuring of Essential Oil Database
Apache License 2.0

Disambiguating plant names and fixing typos #79

Open petermr opened 4 years ago

petermr commented 4 years ago

Plant names should be disambiguated at the binomial species level in the plant table. Thus

EssoilDB is not a taxonomy site so there is no need to record synonyms for data entry. (It may be useful to search for synonyms but this will be through a different mechanism.)

petermr commented 4 years ago

Treatment of VARIETIES and HYBRIDS

This is very important information but is relatively infrequent. We do not have a simple data model, so suggest:

EmanuelFaria commented 4 years ago

@petermr @Shruthi-M @gilienv I just stumbled upon this database of plant taxonomy. Maybe it's useful to you?

https://www.gbif.org/dataset/66dd0960-2d7d-46ee-a491-87b9adcfe7b1

Taxonomy tool: https://www.gbif.org/species/158596304

There seems to be a link to download the entire dataset here (not sure): https://www.gbif.org/dataset/66dd0960-2d7d-46ee-a491-87b9adcfe7b1#dataDescription

Description

GRIN taxonomic data provide the structure and nomenclature for accessions of the National Plant Germplasm System (NPGS), part of the National Genetic Resources Program (NGRP) of the United States Department of Agriculture’s (USDA’s) Agricultural Research Service (ARS). In GRIN Taxonomy for Plants all families and genera of vascular plants and over 46,000 species from throughout the world are represented, especially economic plants and their relatives. Information on scientific and common names, classification, distribution, references, and economic impacts are provided.

gilienv commented 4 years ago

Thank you Manny, we use GBIF for most of our Ecology work and it's one of the most dependable species databases. I have reminded Shruthi to take the species information from it into account when building her Plant Table.

Shruthi:

Please create the plant table with the following columns:

  • Binomial Species Name
  • Synonyms
  • Habit (i.e. overall shape: Grass/Vine/Tree/Shrub)
  • Genus
  • Family
  • Order
  • Class
  • Phylum
  • Kingdom

Most importantly, we will need to connect this table with the existing IDs in the main infopdata table (which we are now restructuring as profile).

For example, if you remove a wrong plant name from the original dataset, what happens to all the data that was connected to this one in the Main Tables?! We cannot afford to delete that.

Please discuss this in the next Skype call.
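One way to keep the linked data when a plant name turns out to be wrong is to correct the plant row in place instead of deleting it. The sketch below is only an illustration of that idea using an in-memory SQLite database. The table names `plant` and `profile` come from the comment above; every column name, the misspelled sample name and the placeholder compounds are assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema: only the table names come from the discussion;
# all column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the link
conn.executescript("""
CREATE TABLE plant (
    plant_id      INTEGER PRIMARY KEY,
    original_name TEXT NOT NULL,  -- name exactly as entered in the old data
    binomial      TEXT            -- curated binomial, NULL until checked
);
CREATE TABLE profile (
    profile_id INTEGER PRIMARY KEY,
    plant_id   INTEGER NOT NULL REFERENCES plant(plant_id),
    compound   TEXT NOT NULL
);
""")

# Made-up sample rows: one misspelled plant name with two linked profile rows.
conn.execute("INSERT INTO plant VALUES (1, 'Ocimum sanctm', NULL)")
conn.execute("INSERT INTO profile VALUES (1, 1, 'compound A')")
conn.execute("INSERT INTO profile VALUES (2, 1, 'compound B')")


def correct_name(conn, plant_id, binomial):
    """Record the curated name on the existing row; the ID never changes."""
    conn.execute(
        "UPDATE plant SET binomial = ? WHERE plant_id = ?",
        (binomial, plant_id),
    )


def profiles_for(conn, plant_id):
    """Return (compound, curated binomial) for every profile row of a plant."""
    rows = conn.execute(
        "SELECT p.compound, pl.binomial FROM profile AS p "
        "JOIN plant AS pl ON p.plant_id = pl.plant_id "
        "WHERE pl.plant_id = ? ORDER BY p.profile_id",
        (plant_id,),
    )
    return rows.fetchall()


correct_name(conn, 1, "Ocimum sanctum")
```

Because the profile rows point at `plant_id` and not at the name text, fixing the spelling leaves them attached, and with foreign keys enabled SQLite refuses to delete a plant row that still has profile rows.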

petermr commented 4 years ago

Note that all properties are DERIVED from the Binomial name. They are there for searching or browsing, not to represent the original paper. They could be recomputed at any time.
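As a rough illustration of "derived and recomputable", the sketch below overwrites the higher-rank columns of a plant row from a taxonomy record while leaving the binomial untouched. It assumes the record is a dictionary with rank names as keys, shaped like the GBIF species record quoted later in this thread; the function name, the row layout and the stale sample value are hypothetical.

```python
# Rank columns requested for the plant table, as dictionary keys.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def recompute(plant: dict, record: dict) -> dict:
    """Return a copy of `plant` with every derived column taken from `record`.

    A rank missing from `record` is stored as None rather than guessed.
    """
    updated = dict(plant)
    for rank in RANKS:
        updated[rank] = record.get(rank)
    return updated


# Subset of the fields in the GBIF record quoted later in this thread.
gbif_record = {
    "kingdom": "Plantae",
    "phylum": "Tracheophyta",
    "order": "Lamiales",
    "family": "Lamiaceae",
    "genus": "Ocimum",
    "species": "Ocimum tenuiflorum",
}

# Made-up row holding a stale family value.
stale_row = {"binomial": "Ocimum tenuiflorum", "family": "Labiatae"}
fresh_row = recompute(stale_row, gbif_record)
```

The quoted record has no `class` text field, so that column comes out as `None` here; nothing from the original paper is lost because only derived columns are overwritten.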

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

EmanuelFaria commented 4 years ago

@Shruthi-M Hi Shruthi, would you please add me on Skype (Mannyrules) and WhatsApp (+55 61 99675 3439)?

Thanks! Manny

petermr commented 4 years ago

I believe that Taxize (TNRS) does NOT report synonymy. I entered Ocimum sanctum and Ocimum tenuiflorum and both were reported as Accepted.

Does EssoilDB V2.0 regard these as synonyms or distinct species?

This will drastically affect the numbers we report on the poster.
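To see why the answer changes the reported numbers, here is a tiny sketch that counts distinct species with and without a synonym mapping. The list of names is made up for illustration, and the mapping assumes Ocimum sanctum is treated as a synonym of Ocimum tenuiflorum.

```python
def count_species(names, synonyms):
    """Count distinct names after mapping each synonym to its accepted name."""
    return len({synonyms.get(name, name) for name in names})


# Made-up entries, including one repeated name.
names = ["Ocimum sanctum", "Ocimum tenuiflorum", "Lantana camara", "Ocimum sanctum"]

as_distinct = count_species(names, {})
as_synonyms = count_species(names, {"Ocimum sanctum": "Ocimum tenuiflorum"})
```

With these four entries the count is 3 when the two Ocimum names are kept distinct and 2 when they are merged.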

vinitamehlawat commented 4 years ago

Sir, as of now they are distinct species.

petermr commented 4 years ago

Thanks very much Vinita. I am putting together a poster which highlights disambiguation. I won't put in many details, but I'll use this as an example. Taxize did not appear to disambiguate. I think GBIF will: it does so manually (if you enter these names it gives "Accepted" for tenuiflorum and "Synonym" for sanctum). If there is an API, that will solve it rapidly! I am adding in Wikifactmine dictionaries and this will change the poster. But you did a lot of work on dictionaries, so it represents your work as well!

vinitamehlawat commented 4 years ago

Dear Peter, I have made some changes to a1draft.pptx in the top-most part, i.e. the History and Introduction of EssOilDB, and added a Profile table for chemical compounds next to the oil bottle. Peter, you also assigned me some work on Wikidata identifiers, but I do not understand where I should put these IDs on the poster. I am pasting them here for your reference:

  • Lantana camara (Q332469).
  • leaf (Q33971) / organ of a vascular plant, composing its foliage (very general term).
  • flower (Q506) / structure found in some plants to support reproduction.
  • fruit (Q1364) / part of a flowering plant.

petermr commented 4 years ago

That's great! Don't worry, I'll do that!

P.

petermr commented 4 years ago

We should use GBIF to resolve synonyms. Question: does it have an API? What does it return? If it is simple, it could solve this problem quite quickly. Vinita/Shruthi should report.

Shruthi-M commented 4 years ago

Greetings Sir, I have already started to use GBIF to resolve names. I have attached the output file (from GBIF). I am not sure about the resolution of synonyms.

Thank you. With regards, Shruthi M

petermr commented 4 years ago

Thanks!

On Thu, Jul 25, 2019 at 10:14 AM Shruthi-M notifications@github.com wrote:

> Greetings Sir, I have already started to use GBIF to resolve names.

Good. Can you document the process (ideally in an issue)?

> I have attached the output file (from GBIF). I am not sure about the resolution of synonyms.

Where is the output file? The best thing is to commit it to GitHub rather than attach it to a mail.

petermr commented 4 years ago

I have been reading: https://www.gbif.org/en/developer/species which seems to provide what we want. Is this what you are using?

I'll copy some here:

Species API

http://api.gbif.org/v1/

I have issued:

api.gbif.org/v1/species?name=ocimum%20sanctum

and got:

{"offset":0,"limit":20,"endOfRecords":true,"results":[{"key":2927101,"nubKey":2927101,"nameKey":7681615,"taxonID":"gbif:2927101","sourceTaxonKey":143184691,"kingdom":"Plantae","phylum":"Tracheophyta","order":"Lamiales","family":"Lamiaceae","genus":"Ocimum","species":"Ocimum tenuiflorum","kingdomKey":6,"phylumKey":7707728,"classKey":220,"orderKey":408,"familyKey":2497,"genusKey":2874693,"speciesKey":2927100,"datasetKey":"d7dddbf4-2cf0-4f39-9b2a-bb099caae36c","constituentKey":"7ddf754f-d193-4cc9-b351-99906754a03b","parentKey":2874693,"parent":"Ocimum","acceptedKey":2927100,"accepted":"Ocimum tenuiflorum L.","scientificName":"Ocimum sanctum L.","canonicalName":"Ocimum sanctum","authorship":"L.","nameType":"SCIENTIFIC","rank":"SPECIES","origin":"SOURCE","taxonomicStatus":"SYNONYM","nomenclaturalStatus":[],"remarks":"","publishedIn":"Mant. pl. 1:85.  1767","numDescendants":0,"lastCrawled":"2018-06-20T14:41:51.801+0000","lastInterpreted":"2018-06-20T14:36:01.700+0000","issues":[
[many lines clipped]

Note the "taxonomicStatus":"SYNONYM" .

By contrast

api.gbif.org/v1/species?name=ocimum%20tenuiflorum

gives

"taxonomicStatus":"ACCEPTED"

suggesting that for Ocimum sanctum the accepted name is Ocimum tenuiflorum L.

We can automate this and save a huge amount of disambiguation work.
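A minimal sketch of what that automation might look like, using only the Python standard library. It calls the same `species?name=` endpoint as above and reads the `taxonomicStatus`, `accepted`, `scientificName` and `canonicalName` fields that appear in the quoted response. The function names are hypothetical, and the handling of accepted records (falling back to `scientificName` when no `accepted` field is present) is an assumption, since the full ACCEPTED record is not shown here.

```python
import json
import urllib.parse
import urllib.request

GBIF_SPECIES_URL = "https://api.gbif.org/v1/species"  # endpoint used above


def fetch_usages(name):
    """Query GBIF for `name` and return the list under "results" (needs network)."""
    query = urllib.parse.urlencode({"name": name}, quote_via=urllib.parse.quote)
    with urllib.request.urlopen(GBIF_SPECIES_URL + "?" + query, timeout=30) as response:
        return json.load(response).get("results", [])


def accepted_name(usage):
    """Return the accepted name for one record from the "results" list.

    Assumption: a record with an "accepted" field is a synonym of that name;
    a record without one is itself the accepted name.
    """
    return usage.get("accepted") or usage.get("scientificName")


def summarise(usages):
    """Return (queried name, status, accepted name) for each record."""
    return [
        (u.get("canonicalName"), u.get("taxonomicStatus"), accepted_name(u))
        for u in usages
    ]
```

Calling `summarise(fetch_usages("Ocimum sanctum"))` would then give one tuple per record returned. The response is a list, so a name may come back with more than one record, and a real pipeline would need a rule for the case where the records disagree.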