Open petermr opened 4 years ago
Treatment of VARIETIES and HYBRIDS
This is very important information but is relatively infrequent. We do not have a simple data model, so I suggest:
- plant variety field in the profile data. The plant_id field should refer to the plant table.
- hybrid field (not normalized text string) in the profile. There should be NO plant_id value. If particular hybrids are found to be common and important we may normalize this.
@petermr @Shruthi-M @gilienv I just stumbled upon this database of plant taxonomy. Maybe it's useful to you?
https://www.gbif.org/dataset/66dd0960-2d7d-46ee-a491-87b9adcfe7b1
Taxonomy tool: https://www.gbif.org/species/158596304
Seems like there's a link to download the entire data set here: (Not sure) https://www.gbif.org/dataset/66dd0960-2d7d-46ee-a491-87b9adcfe7b1#dataDescription
Description
GRIN taxonomic data provide the structure and nomenclature for accessions of the National Plant Germplasm System (NPGS), part of the National Genetic Resources Program (NGRP) of the United States Department of Agriculture’s (USDA’s) Agricultural Research Service (ARS). In GRIN Taxonomy for Plants all families and genera of vascular plants and over 46,000 species from throughout the world are represented, especially economic plants and their relatives. Information on scientific and common names, classification, distribution, references, and economic impacts are provided.
Thank you Manny, we use GBIF for most of our Ecology work and it's one of the most dependable species databases. I have reminded Shruthi to take into account information for species from this to build her Plant Table.
Shruthi:
Please create the plant table with following columns:
Binomial Species Name, Synonyms, Habit (i.e. overall shape: Grass/Vine/Tree/Shrub), Genus, Family, Order, Class, Phylum, Kingdom
Most importantly, we will need to connect this table with existing IDs in the main infopdata table (which we are now restructuring as profile).
For example, if you remove a wrong plant name from the original dataset, what happens to all the data that was connected to this one in the Main Tables?! We cannot afford to delete that.
Please discuss this in the next Skype call.
Note that all properties are DERIVED from the Binomial name. They are there for searching or browsing, not to represent the original paper. They could be recomputed at any time.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
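The data model discussed above (a plant table keyed by the binomial name, a profile table that points at it, and free-text hybrids with no plant_id) might be sketched roughly as follows. This is only an illustration using SQLite from Python. The column names, the CHECK constraint, and the `repoint_plant` helper are assumptions made for the example, not the actual EssOilDB schema. It also shows one possible way to retire a wrong plant name without deleting the profile rows attached to it.

```python
import sqlite3

# Hypothetical schema: names and types are assumptions, not the EssOilDB schema.
SCHEMA = """
CREATE TABLE plant (
    plant_id    INTEGER PRIMARY KEY,
    binomial    TEXT NOT NULL UNIQUE,   -- everything else is derived from this
    habit       TEXT,
    genus       TEXT,
    family      TEXT,
    plant_order TEXT,
    class       TEXT,
    phylum      TEXT,
    kingdom     TEXT
);
CREATE TABLE profile (
    profile_id INTEGER PRIMARY KEY,
    plant_id   INTEGER REFERENCES plant(plant_id),
    variety    TEXT,                    -- used together with plant_id
    hybrid     TEXT,                    -- free text, not normalized
    CHECK (hybrid IS NULL OR plant_id IS NULL)  -- hybrids carry no plant_id
);
"""


def make_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def repoint_plant(conn, wrong_id, right_id):
    """Move profile rows from a wrong plant entry to the right one, then drop the wrong entry."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE profile SET plant_id = ? WHERE plant_id = ?",
            (right_id, wrong_id),
        )
        conn.execute("DELETE FROM plant WHERE plant_id = ?", (wrong_id,))


conn = make_db()
conn.execute("INSERT INTO plant (plant_id, binomial) VALUES (1, 'Ocimum sanctum')")
conn.execute("INSERT INTO plant (plant_id, binomial) VALUES (2, 'Ocimum tenuiflorum')")
conn.execute("INSERT INTO profile (plant_id, variety) VALUES (1, NULL)")
# Illustrative hybrid string only; it does not come from the database.
conn.execute("INSERT INTO profile (hybrid) VALUES ('Mentha aquatica x Mentha spicata')")
repoint_plant(conn, wrong_id=1, right_id=2)
print(conn.execute("SELECT profile_id, plant_id, hybrid FROM profile").fetchall())
```

With foreign keys enforced, deleting a plant row that profile rows still reference fails, so data can only be dropped after it has been re-pointed. Whether that is the behaviour the project wants is a design decision for the call.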
@Shruthi-M Hi Shruthi, would you please add me on Skype (Mannyrules) and WhatsApp (+55 61 99675 3439)?
Thanks! Manny
I believe that Taxize (TNRS) does NOT report synonymy. I entered Ocimum sanctum and Ocimum tenuiflorum and both were reported as Accepted.
Does EssoilDB V2.0 regard these as synonyms or distinct species?
This will drastically affect the numbers we report on the poster.
Sir, as of now, they are distinct species.
Thanks very much Vinita, I am putting together a poster which highlights disambiguation. I won't put in many details, but I'll use this as an example. Taxize did not appear to disambiguate. I think GBIF will if we use it: it does so manually (if you put these in, it gives "Accepted" for tenuiflorum and "Synonym" for sanctum). If there is an API, that will solve it rapidly! I am adding in Wikifactmine Dictionaries and this will change the poster. But you did a lot of work on dictionaries, so it represents your work as well!
Dear Peter, I have made some changes to a1draft.pptx in the top-most part, i.e. History and Introduction of EssOilDB, and added a Profile table for chemical compounds with that oil bottle. Peter, you also assigned me some work related to Wikidata identifiers, but I do not understand where I should put these IDs on the poster. I am pasting them here for your further reference.
- Lantana camara (Q332469).
- leaf (Q33971) / organ of a vascular plant, composing its foliage (very general term).
- flower (Q506) / structure found in some plants to support reproduction.
- fruit (Q1364) / part of a flowering plant.
That's great! Don't worry, I'll do that!
P.
We should use GBIF to resolve synonyms. Question. Does it have an API? What does it return? If it is simple it could solve this problem quite quickly. Vinita/Shruthi should report.
Greetings Sir, I have already started to use GBIF to resolve names. I have attached the output file (from GBIF). I am not sure about the resolution of synonyms.
Thank you With regards Shruthi M
Thanks!
On Thu, Jul 25, 2019 at 10:14 AM Shruthi-M notifications@github.com wrote:
> Greetings Sir, I have already started to use GBIF to resolve names.
Good. Can you document the process (ideally in an issue)?
> I have attached the output file (from GBIF). I am not sure about the resolution of synonyms.
Where is the output file? The best thing is to commit it to GitHub rather than attach it to a mail.
I have been reading: https://www.gbif.org/en/developer/species which seems to provide what we want. Is this what you are using?
I'll copy some here:
Species API
http://api.gbif.org/v1/
I have issued:
api.gbif.org/v1/species?name=ocimum%20sanctum
and got:
{"offset":0,"limit":20,"endOfRecords":true,"results":[{"key":2927101,"nubKey":2927101,"nameKey":7681615,"taxonID":"gbif:2927101","sourceTaxonKey":143184691,"kingdom":"Plantae","phylum":"Tracheophyta","order":"Lamiales","family":"Lamiaceae","genus":"Ocimum","species":"Ocimum tenuiflorum","kingdomKey":6,"phylumKey":7707728,"classKey":220,"orderKey":408,"familyKey":2497,"genusKey":2874693,"speciesKey":2927100,"datasetKey":"d7dddbf4-2cf0-4f39-9b2a-bb099caae36c","constituentKey":"7ddf754f-d193-4cc9-b351-99906754a03b","parentKey":2874693,"parent":"Ocimum","acceptedKey":2927100,"accepted":"Ocimum tenuiflorum L.","scientificName":"Ocimum sanctum L.","canonicalName":"Ocimum sanctum","authorship":"L.","nameType":"SCIENTIFIC","rank":"SPECIES","origin":"SOURCE","taxonomicStatus":"SYNONYM","nomenclaturalStatus":[],"remarks":"","publishedIn":"Mant. pl. 1:85. 1767","numDescendants":0,"lastCrawled":"2018-06-20T14:41:51.801+0000","lastInterpreted":"2018-06-20T14:36:01.700+0000","issues":[
[many lines clipped]
Note the "taxonomicStatus":"SYNONYM".
By contrast
api.gbif.org/v1/species?name=ocimum%20tenuiflorum
gives
"taxonomicStatus":"ACCEPTED"
suggesting that for Ocimum sanctum the accepted name is Ocimum tenuiflorum L.
We can automate this and save a huge amount of disambiguation work.
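As a rough idea of what that automation could look like, here is a minimal Python sketch built on the `species?name=` query shown above. It relies only on the fields visible in the response quoted in this thread (`taxonomicStatus`, `accepted`, `scientificName`). The function names are hypothetical, the use of `https` is an assumption, and the handling of names that return several records or none is left open.

```python
import json
import urllib.parse
import urllib.request

# Endpoint quoted in the thread; the https scheme is an assumption.
GBIF_SPECIES_URL = "https://api.gbif.org/v1/species"


def fetch_records(name, timeout=30):
    """Return the 'results' list of the species?name=... query for one name."""
    query = urllib.parse.urlencode({"name": name}, quote_via=urllib.parse.quote)
    with urllib.request.urlopen(f"{GBIF_SPECIES_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp).get("results", [])


def accepted_name(record):
    """Map one record to (taxonomicStatus, accepted scientific name)."""
    status = record.get("taxonomicStatus")
    if status == "ACCEPTED":
        return status, record.get("scientificName")
    # In the response quoted above, the synonym record carries the
    # accepted name in its "accepted" field.
    return status, record.get("accepted")


# Cut-down copy of the record returned for "ocimum sanctum" in this thread.
sample = {
    "scientificName": "Ocimum sanctum L.",
    "canonicalName": "Ocimum sanctum",
    "taxonomicStatus": "SYNONYM",
    "accepted": "Ocimum tenuiflorum L.",
}
print(accepted_name(sample))
```

For a live lookup one would call `accepted_name` on each item of `fetch_records("Ocimum sanctum")`. The network call is kept out of the example run so that the sketch works offline.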
Plant names should be disambiguated at the binomial species level in the plant table. Thus:
- Lantana camara, L. camara and Lantata camara (a typo) should all be mapped to the same species.
- Ocimum sanctum should be mapped onto its preferred synonym, Ocimum tenuiflorum.
EssoilDB is not a taxonomy site so there is no need to record synonyms for data entry. (It may be useful to search for synonyms but this will be through a different mechanism.)
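The mapping described above (abbreviated genus, typos, and synonyms all ending at one accepted binomial) could be prototyped along these lines. This is a sketch under assumptions: the two lookup tables are hypothetical stand-ins for the plant table, the `normalise` function and its 0.85 similarity cutoff are illustrative choices, and the fuzzy step uses Python's standard `difflib` rather than any taxonomic service.

```python
import difflib

# Hypothetical lookup data; in practice this would come from the plant table.
ACCEPTED = ["Lantana camara", "Ocimum tenuiflorum"]
SYNONYMS = {"Ocimum sanctum": "Ocimum tenuiflorum"}


def normalise(raw, accepted=ACCEPTED, synonyms=SYNONYMS, cutoff=0.85):
    """Return the accepted binomial for a raw name, or None if it cannot be resolved."""
    parts = raw.split()
    if len(parts) != 2:
        return None
    genus, epithet = parts[0].capitalize(), parts[1].lower()
    known = list(accepted) + list(synonyms)

    if genus.endswith("."):
        # Abbreviated genus such as "L.": accept only an unambiguous expansion.
        prefix = genus[:-1]
        hits = [k for k in known
                if k.split()[0].startswith(prefix) and k.split()[1] == epithet]
        if len(hits) != 1:
            return None
        name = hits[0]
    else:
        name = f"{genus} {epithet}"
        if name not in known:
            # Likely typo: take the closest known name above the cutoff.
            close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
            if not close:
                return None
            name = close[0]

    # Finally map a synonym onto its accepted name.
    return synonyms.get(name, name)


for raw in ["Lantana camara", "L. camara", "Lantata camara", "Ocimum sanctum"]:
    print(raw, "->", normalise(raw))
```

Fuzzy matching can also merge two genuinely different species whose names happen to be similar, so any name resolved through the typo branch probably deserves a manual check before it is written to the table.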