gilienv / EssOilDB

Restructuring of Essential Oil Database
Apache License 2.0

strategy for identifying chemical names #46

Open petermr opened 5 years ago

petermr commented 5 years ago

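We need a protocol for identifying chemical names efficiently. It should:

  • be as automatic as possible
  • deal with the most common problems first

The tools are:

  • OPSIN (in batch). Can you give a figure for the conversion rate?
  • Wikidata lookup.
  • ChEBI / PubChem

I think this covers most things.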

For the trivial names we need to find their frequencies in EssOilDB. For example, you have "valerenol". If it occurs once, it is low priority; if it occurs 20 times, it is clearly important.

It should be possible to find the frequencies of all chemical names in EssOilDB. A standard SQL query should do this.
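
As a rough illustration of such a frequency count, here is a minimal sketch using Python's built-in sqlite3 module. The table name `oil_component` and the column `compound_name` are assumptions made up for this example, and so are the rows; the real schema and database engine will differ, but the `GROUP BY` / `COUNT(*)` pattern should carry over.

```python
import sqlite3

# Hypothetical schema and toy rows: the real table and column names
# in the database are not known here and will need to be substituted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oil_component (sample_id INTEGER, compound_name TEXT)")
conn.executemany(
    "INSERT INTO oil_component VALUES (?, ?)",
    [
        (1, "limonene"),
        (1, "valerenol"),
        (2, "Limonene"),    # differs only in case
        (3, "limonene "),   # trailing space
        (3, "linalool"),
    ],
)

# Count how often each name occurs, most frequent first.
# LOWER/TRIM merge variants that differ only in case or surrounding spaces.
FREQUENCY_SQL = """
    SELECT LOWER(TRIM(compound_name)) AS name, COUNT(*) AS n
    FROM oil_component
    GROUP BY LOWER(TRIM(compound_name))
    ORDER BY n DESC, name
"""


def name_frequencies(connection):
    """Return a list of (name, count) rows ordered by descending count."""
    return connection.execute(FREQUENCY_SQL).fetchall()


frequencies = name_frequencies(conn)
for name, n in frequencies:
    print(f"{n:4d}  {name}")
```

The names at the top of this list are the ones worth resolving first; anything that occurs only once can wait.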

Shruthi-M commented 5 years ago

We need a protocol for identifying chemical names efficiently. It should:

  • be as automatic as possible
  • deal with the most common problems first

I shall look into this.

The tools are:

  • OPSIN (in batch). Can you give a figure for the conversion rate?

Based on the analysis of the results for the sample data, the conversion rate is less than 50%.

  • Wikidata lookup.
  • ChEBI / PubChem

I think this covers most things.
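
One possible shape for such a protocol, sketched in Python: take the names in order of frequency and pass each one through the tools in turn, stopping at the first tool that resolves it. The function `resolve_by_priority`, the resolver names, and the identifiers are all hypothetical. The stub resolvers below are plain dictionaries standing in for real OPSIN, Wikidata, and ChEBI / PubChem lookups, and their contents say nothing about what those tools actually return.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A resolver takes a chemical name and returns some identifier,
# or None when it cannot resolve the name.
Resolver = Callable[[str], Optional[str]]


def resolve_by_priority(
    name_counts: Dict[str, int],
    resolvers: List[Tuple[str, Resolver]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Try each resolver in order on every name, most frequent names first.

    Returns (resolved, unresolved). `resolved` maps a name to the
    (source, identifier) pair of the first resolver that handled it;
    `unresolved` lists the remaining names, still ordered by frequency.
    """
    resolved: Dict[str, Tuple[str, str]] = {}
    unresolved: List[str] = []
    # Sort by descending count, then alphabetically for a stable order.
    ordered = sorted(name_counts.items(), key=lambda item: (-item[1], item[0]))
    for name, _count in ordered:
        for source, resolver in resolvers:
            identifier = resolver(name)
            if identifier is not None:
                resolved[name] = (source, identifier)
                break
        else:
            # No resolver succeeded: leave the name for manual curation.
            unresolved.append(name)
    return resolved, unresolved


# Made-up counts and placeholder lookup tables, for illustration only.
name_counts = {"valerenol": 1, "limonene": 20, "linalool": 7}
resolvers = [
    ("opsin", {"limonene": "placeholder-id-1"}.get),
    ("wikidata", {"linalool": "placeholder-id-2"}.get),
    ("pubchem", {}.get),
]

resolved, unresolved = resolve_by_priority(name_counts, resolvers)
print(resolved)
print(unresolved)
```

Putting the automatic tools first and leaving the unresolved names in frequency order keeps the manual work focused on the most common problems.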

For the trivial names we need to find their frequencies in EssOilDB. For example, you have "valerenol". If it occurs once, it is low priority; if it occurs 20 times, it is clearly important.

It should be possible to find the frequencies of all chemical names in EssOilDB. A standard SQL query should do this.

OK, sir. I have never used SQL before, but I will definitely try it and get back to you.