The rush to create a poster has led to a muddled structure in the repo (mainly my fault). I am restructuring the directories and shall try to triage old and unnecessary files.
@manishkumarnipgr @mannyrules @gilienv @Shruthi-M
PLEASE COMMENT
This is more complex than it appeared at first sight. I think we have the following categories:
Ingestion
This is the input and conversion of data to V2.0 format. There are two cases:
V1.0 files
This is a finite task, but we need a deadline for ingesting data in V1.0 format. Or have we decided that all new input will go into V2.0 from now on? My recommendation is that we no longer ingest anything in V1.0 format.
new articles directly from the literature
We will open a new issue for this.
This will require several steps:
discovery of articles (includes policy)
machine reading and automatic extraction
manual editing
matching with V2.0 knowledge
conversion to V2.0 structure
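The five steps above could be chained as a simple pipeline. A minimal Python sketch follows; the `Article` class, the step names and the stub behaviour are all hypothetical placeholders for discussion, not existing code in the repo.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Article:
    """Hypothetical working record for one article moving through ingestion."""
    identifier: str
    history: List[str] = field(default_factory=list)  # names of completed steps


Step = Callable[[Article], Article]


def make_stub(name: str) -> Step:
    """Build a placeholder step that only records that it ran."""
    def step(article: Article) -> Article:
        article.history.append(name)
        return article
    return step


# Order follows the list above; each stub would be replaced by real code.
STEPS: List[Step] = [
    make_stub("discovery"),
    make_stub("machine-reading"),
    make_stub("manual-editing"),
    make_stub("matching"),
    make_stub("conversion"),
]


def ingest(article: Article) -> Article:
    """Run every ingestion step in order on a single article."""
    for step in STEPS:
        article = step(article)
    return article


result = ingest(Article(identifier="example-article"))
print(result.history)
```

Keeping each step as a separate function means the manual-editing step can pause or be re-run without repeating discovery and extraction.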
Core
This is the post-ingestion management of normalized data, i.e. what we offer to the world. It consists of just the core information extracted from the articles. It is then assembled into experiment-independent (ExpInd) resources (e.g. plants) and experiment-dependent (ExpDep) resources (e.g. date-times, links to plants).
ExpInd (experiment-independent)
I don't like this name; it is also confusable with "independent variables".
This resource could be offered publicly, independently of any experiment, e.g. "here is a useful knowledge bundle of plants, compounds, etc. which could be used in other knowledge experiments".
compound
Everything required to identify the compound uniquely including names, formulae, and IDs from foreign authorities, but no derived properties.
plant
Everything required to identify the plant uniquely, including names and IDs from foreign authorities, but no derived properties (family, habit).
plantpart
A controlled vocabulary of parts.
location
Everything required to identify the location uniquely, including names, coordinates, and IDs from foreign authorities. ExpInd locations will use normalized names and will not include ranges.
bibliography
A standalone bundle of articles/references. Initially this will be references to publications, e.g. DOIs and other IDs, but later it might include transformed, normalized, annotated Open articles.
conditions
Controlled vocabulary of common experimental conditions
extraction
Controlled vocabulary of common extraction methods
analysis
Controlled vocabulary of common analytical methods
wikifactmine dictionaries
Bespoke dictionaries created by researchers. This will be a dynamic resource.
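To make the ExpInd idea concrete, here is a rough Python sketch of three of the resources above. The class and field names are my assumptions, and the identifier values are placeholders rather than real IDs. The point is that each record holds only identifying information and no derived properties.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Compound:
    """Identity of a compound: names, formula, foreign IDs. No derived properties."""
    names: List[str]
    formula: str
    external_ids: Dict[str, str] = field(default_factory=dict)


@dataclass
class Plant:
    """Identity of a plant: names and foreign IDs. Family and habit are left out."""
    names: List[str]
    external_ids: Dict[str, str] = field(default_factory=dict)


@dataclass
class Location:
    """Identity of a location: one normalized name, a single point, foreign IDs."""
    name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    external_ids: Dict[str, str] = field(default_factory=dict)


# Placeholder entries, keyed by a local ID that experiments can refer to.
compounds = {
    "cmpd1": Compound(
        names=["limonene"],
        formula="C10H16",
        external_ids={"authority-x": "PLACEHOLDER-ID"},  # not a real identifier
    ),
}
plants = {
    "plant1": Plant(names=["Mentha spicata", "spearmint"]),
}

print(compounds["cmpd1"].formula)
```

Because nothing here depends on an experiment, the `compounds` and `plants` bundles could be published on their own.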
ExpDep (experiment-dependent)
I don't like this name; it is also confusable with "dependent variables".
This consists of:
datetime
The date-time of the reported experiment/s. These may include repeated values and ranges.
biblio-ref
Reference to the article/s in which the experiment was published.
plant-ref
Reference to the plants in the experiment.
compound-ref
Reference to the compounds in the experiment.
location-ref
Reference to the location/s in the experiment.
plantpart-ref
Reference to the plantpart/s in the experiment.
dictionary-ref
Reference to the dictionaries annotating the experiment.
conditions
Other conditions of the experiment. Some of these will be free-text, some might be links to WikiFactMine dictionaries.
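The "-ref" fields above are only useful if every reference resolves to an ExpInd entry. A minimal sketch of an ExpDep record with such a check follows; the `Experiment` class, the `dangling_refs` helper and all IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Experiment:
    """Hypothetical ExpDep record: date-times, free-text conditions and references."""
    datetimes: List[str] = field(default_factory=list)   # may repeat or hold ranges
    biblio_refs: List[str] = field(default_factory=list)
    plant_refs: List[str] = field(default_factory=list)
    compound_refs: List[str] = field(default_factory=list)
    location_refs: List[str] = field(default_factory=list)
    plantpart_refs: List[str] = field(default_factory=list)
    dictionary_refs: List[str] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)


def dangling_refs(exp: Experiment, known: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per resource kind, the references with no matching ExpInd ID."""
    missing: Dict[str, List[str]] = {}
    for kind, ids in known.items():
        refs = getattr(exp, f"{kind}_refs")
        unresolved = [ref for ref in refs if ref not in ids]
        if unresolved:
            missing[kind] = unresolved
    return missing


# IDs known to the ExpInd resources (placeholders).
known_ids = {"plant": {"plant1"}, "compound": {"cmpd1", "cmpd2"}}

exp = Experiment(
    datetimes=["2015-06"],
    plant_refs=["plant1"],
    compound_refs=["cmpd1", "cmpd9"],
)
print(dangling_refs(exp, known_ids))
```

Running a check like this at the end of ingestion would catch experiments that point at plants or compounds not yet in the ExpInd bundles.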
Metadata
Record of the experiment. Includes the bibliographic record.
bibliography-ref
funder-ref
institution-ref
Results and data
Currently only the profile data (compound-percent name-value pairs). In future there could be other observations.
profiledata
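As a strawman for profiledata, the compound-percent name-value pairs could be held as a simple mapping from compound reference to percentage. In the sketch below the type alias, the function name and the rule that the total should not exceed 100 (within a tolerance) are my assumptions, not agreed policy.

```python
from typing import Dict, List

# compound-ref -> reported percentage (hypothetical representation)
Profile = Dict[str, float]


def validate_profile(profile: Profile, tolerance: float = 0.5) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problems."""
    problems: List[str] = []
    for ref, percent in profile.items():
        if not 0.0 <= percent <= 100.0:
            problems.append(f"{ref}: {percent} is outside 0-100")
    total = sum(profile.values())
    if total > 100.0 + tolerance:
        problems.append(f"total {total:.1f} exceeds 100")
    return problems


good = {"cmpd1": 62.5, "cmpd2": 30.0}
bad = {"cmpd1": 80.0, "cmpd2": 30.0}
print(validate_profile(good))
print(validate_profile(bad))
```

Keying the pairs by compound-ref rather than by free-text name ties the results back to the ExpInd compound resource.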