TGAC / brassica

Brassica Information Portal
GNU General Public License v3.0

cultivar names_repositories #447

Open teatree1212 opened 8 years ago

teatree1212 commented 8 years ago

Here are several repositories where officially registered cultivar names are stored. As they are dynamic, maybe it would be possible not just to download the current objects that are Brassica-related, but also to check there for correct spelling whenever a new cultivar is submitted via the BIP?

To check the names of current or formerly available commercial varieties:
· European Commission – Plant variety database
· Community Plant Variety Office – Applications and Titles in Force
· Community Plant Variety Office – CPVO Variety Finder
· OECD – List of Varieties eligible for seed certification
· United States Department of Agriculture – Plant Variety Database (Agricultural Marketing Service)

I read somewhere that you already had a script for checking spelling in taxonomic names @nowakowski

Otherwise I found these, too; maybe they or yours can be modified to be applied to cultivar names? tassel, iplant

teatree1212 commented 7 years ago

Has a cultivar cross-checker against these cultivar name repositories been implemented?

Nuanda commented 7 years ago

No, for various reasons, I think. The most important one is that when BIP is going to use some online resource, the resource needs to provide some sort of API - an interactive web form is simply not enough. Otherwise, maybe, if a complete dump is done for a given resource, we can import that inside BIP and check locally (as we did with e.g. Gramene TO).
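The "import a dump and check locally" option could look roughly like the sketch below. It is only an illustration in Python, not BIP's actual code: the CSV layout, the `name` column, the function names, the 0.8 similarity cutoff and the three sample cultivar names are all assumptions made for the example.

```python
import csv
import difflib
import io


def load_cultivar_names(csv_text, column="name"):
    """Read a CSV dump and map each normalized name to its registered spelling.

    The `name` column is a hypothetical layout; a real dump would need its
    own column mapping.
    """
    names = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        if raw:
            names[raw.casefold()] = raw
    return names


def check_cultivar(submitted, names, cutoff=0.8):
    """Classify a submitted name as 'ok', 'suggest' (near miss) or 'unknown'."""
    key = submitted.strip().casefold()
    if key in names:
        return ("ok", [names[key]])
    # difflib ranks candidates by similarity ratio; cutoff is an assumed threshold.
    close = difflib.get_close_matches(key, list(names), n=3, cutoff=cutoff)
    if close:
        return ("suggest", [names[k] for k in close])
    return ("unknown", [])


# Illustrative sample dump, not taken from any of the repositories listed above.
DUMP = "name\nDarmor\nTapidor\nWestar\n"
names = load_cultivar_names(DUMP)

print(check_cultivar("tapidor", names))
print(check_cultivar("Tapidorr", names))
print(check_cultivar("Zzzz", names))
```

A near miss is reported as a suggestion rather than rejected outright, so a curator can still decide whether the submission is a typo or a genuinely new cultivar.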

teatree1212 commented 7 years ago

I think a dump sounds like a good idea. Has the spelling-check against existing Varieties been implemented?

Nuanda commented 7 years ago

Yes, all species inputs are tested for existence in BIP, so there should be no more proliferation of typo-level misspellings of variety names inside BIP.

teatree1212 commented 7 years ago

@Nuanda In the paper we write "Cultivar names are curated, and submissions are checked against a local dictionary of cultivars."

Is there such a dictionary file that we could point to, or should we rewrite this sentence, saying something like:

"Cultivar names in the database are curated, and new submissions are checked against these cultivar names for spelling errors." -> Maybe we would then have to say how that is done or which tool is used, or point to the tool instead.

Or do you suggest an alternative?

Nuanda commented 7 years ago

The 'Species' input column is checked against the TaxonomyTerm table. The contents of that table are a merge of curated values from the CS model and the Brassica part of the Gramene Taxonomy, imported into BIP.
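
For readers of the paper, the check described above can be pictured with the following sketch. It is written in Python purely for illustration and is not the BIP implementation: the function names, the in-memory lists standing in for the two sources, the case-insensitive matching and the rule that curated values win on conflict are all assumptions.

```python
def build_taxonomy_terms(curated, gramene_subset):
    """Merge two term sources into one lookup: normalized name -> stored name.

    Curated values are applied last, so they win if both sources contain the
    same term (an assumed rule, not confirmed for BIP).
    """
    terms = {}
    for name in gramene_subset:
        terms[name.strip().casefold()] = name.strip()
    for name in curated:
        terms[name.strip().casefold()] = name.strip()
    return terms


def validate_species_column(rows, terms, column="Species"):
    """Return (row_number, value) for every row whose species is not a known term."""
    errors = []
    for number, row in enumerate(rows, start=1):
        value = (row.get(column) or "").strip()
        if value.casefold() not in terms:
            errors.append((number, value))
    return errors


# Stand-ins for the curated values and the imported Gramene subset.
curated = ["Brassica napus", "Brassica oleracea"]
gramene = ["Brassica rapa", "Brassica napus"]
terms = build_taxonomy_terms(curated, gramene)

# A hypothetical submission with one misspelled species in row 2.
rows = [
    {"Species": "Brassica napus"},
    {"Species": "Brassica napuss"},
    {"Species": "brassica rapa"},
]
print(validate_species_column(rows, terms))
```

Note that this is an existence check only: a misspelled entry is rejected because it is absent from the merged table, not because a spelling tool recognised it as a typo.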