gilienv / EssOilDB

Restructuring of Essential Oil Database
Apache License 2.0

Restructuring this repository #84

Open petermr opened 5 years ago

petermr commented 5 years ago

The rush to create a poster has led to a muddled structure in the repo (mainly my fault). I am restructuring the directories and shall try to triage old and unnecessary files.

@manishkumarnipgr @mannyrules @gilienv @Shruthi-M

PLEASE COMMENT

This is more complex than it appeared at first sight. I think we have the categories:

Ingestion

This is the input and conversion of data to V2.0 format. There are two cases:

V1.0 files

This is a finite task, but we need a deadline for ingestion in V1.0 format. Or have we decided that all new input will go into V2.0 from now on? My recommendation is that we do not ingest any new data in V1.0 format.

new articles directly from the literature

We will open a new issue for this.

This will require several steps:

discovery of articles (includes policy)

machine reading and automatic extraction

manual editing

matching with V2.0 knowledge

conversion to V2.0 structure
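The five steps above could be chained as an ordered pipeline. This is only a minimal sketch to make the ordering concrete: the step names, the `placeholder` helper, and the dictionary-based record are all assumptions for illustration, not existing code in this repo.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Step = Tuple[str, Callable[[Record], Record]]


def placeholder(flag: str) -> Callable[[Record], Record]:
    """Build a stand-in step that only marks the record (hypothetical)."""
    def step(record: Record) -> Record:
        updated = dict(record)  # copy so the input record is not mutated
        updated[flag] = True
        return updated
    return step


# Order follows the list in this issue; real steps would replace the placeholders.
INGESTION_STEPS: List[Step] = [
    ("discovery", placeholder("discovered")),
    ("machine_reading", placeholder("extracted")),
    ("manual_editing", placeholder("edited")),
    ("matching", placeholder("matched")),
    ("conversion", placeholder("converted")),
]


def run_pipeline(record: Record, steps: List[Step]) -> Tuple[Record, List[str]]:
    """Apply each step in order and return the result plus the step names run."""
    history: List[str] = []
    current = dict(record)
    for name, func in steps:
        current = func(current)
        history.append(name)
    return current, history


# "example-article" is a made-up identifier, not a real article.
result, history = run_pipeline({"article_id": "example-article"}, INGESTION_STEPS)
```

Keeping the steps in a list means a step such as manual editing can be skipped or repeated without changing the others.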

Core

This is the post-ingestion management of normalized data, i.e. what we offer to the world. It consists of just the core information extracted from the articles. It is then assembled into experiment-independent (ExpInd) resources (e.g. plants) and experiment-dependent (ExpDep) resources (e.g. date-times, links to plants).

ExpInd (experiment-independent)

I don't like this name; it is also confusable with "independent variables". This resource could be offered publicly, independently of any experiment, e.g. "here is a useful knowledge bundle of plants, compounds, etc. which could be used in other knowledge experiments".

compound

Everything required to identify the compound uniquely including names, formulae, and IDs from foreign authorities, but no derived properties.

plant

Everything required to identify the plant uniquely including names, and IDs from foreign authorities, but no derived properties (family, habit).

plantpart

A controlled vocabulary of parts.

location

Everything required to identify the location uniquely including names, coordinates, and IDs from foreign authorities. ExpInd locations will be normalized names and not include ranges.

bibliography

A standalone bundle of articles/references. Initially this will be references to publications, e.g. DOIs and other IDs, but later it might include transformed, normalized, annotated Open articles.

conditions

Controlled vocabulary of common experimental conditions.

extraction

Controlled vocabulary of common extraction methods.

analysis

Controlled vocabulary of common analytical methods.

wikifactmine dictionaries

Bespoke dictionaries created by researchers. This will be a dynamic resource.
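To make the ExpInd resources more concrete, here is a rough sketch of how a compound, a plant, and the plantpart vocabulary might be represented. The class names, field names, local ID scheme, the three example parts, and the `example-authority` key are all assumptions for illustration; nothing here is the agreed V2.0 schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Compound:
    """Identity of a compound only: no derived properties."""
    local_id: str
    names: List[str]
    formula: str
    foreign_ids: Dict[str, str] = field(default_factory=dict)


@dataclass
class Plant:
    """Identity of a plant only: family and habit are deliberately absent."""
    local_id: str
    names: List[str]
    foreign_ids: Dict[str, str] = field(default_factory=dict)


# A tiny illustrative controlled vocabulary; the real list would be curated.
PLANT_PARTS = frozenset({"leaf", "flower", "root"})


def is_known_part(part: str) -> bool:
    """Check a reported part against the vocabulary, ignoring case and padding."""
    return part.strip().lower() in PLANT_PARTS


limonene = Compound(
    local_id="compound-0001",  # hypothetical local ID
    names=["limonene"],
    formula="C10H16",
    foreign_ids={"example-authority": "PLACEHOLDER-ID"},  # not a real identifier
)
spearmint = Plant(local_id="plant-0001", names=["Mentha spicata", "spearmint"])
```

Keeping derived properties out of these classes means the bundle can be published on its own and enriched later from the foreign authorities.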

ExpDep (experiment-dependent)

I don't like this name; it is also confusable with "dependent variables". This consists of:

datetime

The date-time of the reported experiment/s. These may include repeated values and ranges.

biblio-ref

Reference to the article/s in which the experiment was published.

plant-ref

Reference to the plants in the experiment.

compound-ref

Reference to the compounds in the experiment.

location-ref

Reference to the location/s in the experiment.

plantpart-ref

Reference to the plantpart/s in the experiment.

dictionary-ref

Reference to the dictionaries annotating the experiment.

conditions

Other conditions of the experiment. Some of these will be free-text, some might be links to WikiFactMine dictionaries.
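One consequence of splitting ExpDep from ExpInd is that every `-ref` has to resolve to an entry in an ExpInd resource. The sketch below shows one way that check could look. The `Experiment` class, its field names, the `REF_FIELDS` mapping, and the IDs are hypothetical, and date-times are kept as plain strings for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Experiment:
    """ExpDep record: holds references only, never copies of ExpInd data."""
    datetimes: List[str] = field(default_factory=list)  # may repeat or be ranges
    biblio_refs: List[str] = field(default_factory=list)
    plant_refs: List[str] = field(default_factory=list)
    compound_refs: List[str] = field(default_factory=list)
    location_refs: List[str] = field(default_factory=list)
    plantpart_refs: List[str] = field(default_factory=list)
    dictionary_refs: List[str] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)  # free text or links


# Maps each reference field to the ExpInd resource it should resolve against.
REF_FIELDS = {
    "biblio_refs": "bibliography",
    "plant_refs": "plant",
    "compound_refs": "compound",
    "location_refs": "location",
    "plantpart_refs": "plantpart",
    "dictionary_refs": "dictionary",
}


def unresolved_refs(experiment: Experiment,
                    known_ids: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per resource, the referenced IDs that are not in known_ids."""
    missing: Dict[str, List[str]] = {}
    for attr, resource in REF_FIELDS.items():
        known = known_ids.get(resource, set())
        absent = [ref for ref in getattr(experiment, attr) if ref not in known]
        if absent:
            missing[resource] = absent
    return missing


# Made-up IDs: compound-0002 is deliberately absent from the known set.
known = {"plant": {"plant-0001"}, "compound": {"compound-0001"}}
experiment = Experiment(
    datetimes=["2001-06"],
    plant_refs=["plant-0001"],
    compound_refs=["compound-0001", "compound-0002"],
)
problems = unresolved_refs(experiment, known)
```

Here `problems` reports `compound-0002` under `compound`, which is the kind of dangling link the "matching with V2.0 knowledge" step would need to catch.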

Metadata

Record of the experiment. Includes the bibliographic record.

bibliography-ref

funder-ref

institution-ref

Results and data

Currently only the profile data (compound-percent name-value pairs). In future there could be other observations.

profiledata
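Since the profile data is just compound-percent pairs, a simple sanity check is possible at ingestion time. The sketch below assumes percentages are stored as floats keyed by compound name, and that a profile total should not exceed 100 beyond a small rounding tolerance; both the function name and the tolerance value are assumptions, not agreed rules.

```python
from typing import Dict, List


def validate_profile(profile: Dict[str, float], tolerance: float = 0.5) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues."""
    problems: List[str] = []
    for compound, percent in profile.items():
        if not 0.0 <= percent <= 100.0:
            problems.append(f"{compound}: {percent} is outside 0-100")
    total = sum(profile.values())
    # Assumed rule: totals may fall short of 100 (unidentified compounds),
    # but should not exceed it by more than the rounding tolerance.
    if total > 100.0 + tolerance:
        problems.append(f"total {total:.1f} exceeds 100")
    return problems


# Illustrative numbers only, not taken from any article.
ok = validate_profile({"limonene": 60.0, "carvone": 30.0})
too_much = validate_profile({"limonene": 80.0, "carvone": 30.0})
```

Returning a list of problems rather than raising lets a manual editor see every issue in a record at once.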

Shruthi-M commented 5 years ago

Sir, thank you for the details. We will look into it.