OmicsDI / ddi-web-service

Web service of the DDI Project
http://wwwdev.ebi.ac.uk/Tools/ddi/ws/
Apache License 2.0

the algorithm for annotation #19

Closed baimingze closed 9 years ago

baimingze commented 9 years ago

algorithm

1. get annotation info of a word

http://data.bioontology.org/annotator?text=modifications&longest_only=true&whole_word_only=false

2. get the first matched_word ("MODIFICATION") from the annotation info, excluding matches that do not start at the first character of the input (such as "FICATION")

"annotations": [
{
    "from": 1,
    "to": 12,
    "matchType": "SYN",
    "text": "MODIFICATION"
}
]

3. process the URLs that come from bioontology.org and whose match is the same as the matched_word from step 2, to get the details (accession, ontology)

"@id": "http://purl.bioontology.org/ontology/HL7/C1554963",
"@id": "http://purl.bioontology.org/ontology/LNC/LA11133-8",

4. collect the synonyms from these URLs
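
Steps 2 to 4 could be sketched roughly as below. This is only an illustration working on an already-parsed annotator response, not the actual script used here. The function names are hypothetical, and the response layout (an `annotatedClass` object with an `@id` next to the `annotations` list, and a `synonym` field on class details) is an assumption about the bioontology.org JSON.

```python
from urllib.parse import urlparse


def first_matched_word(results):
    """Step 2: return the text of the first annotation that starts at
    character 1 of the input, skipping partial hits such as FICATION."""
    for result in results:
        for ann in result.get("annotations", []):
            if ann.get("from") == 1:
                return ann["text"]
    return None


def matching_classes(results, matched_word):
    """Step 3: keep bioontology.org class IDs whose match equals
    matched_word and split each ID into ontology and accession."""
    found = []
    for result in results:
        # Assumed layout: the class ID sits under "annotatedClass" -> "@id".
        class_id = result.get("annotatedClass", {}).get("@id", "")
        texts = {ann.get("text") for ann in result.get("annotations", [])}
        if matched_word in texts and "bioontology.org" in urlparse(class_id).netloc:
            # Assumed ID layout: .../ontology/<ONTOLOGY>/<ACCESSION>
            ontology, accession = class_id.rstrip("/").split("/")[-2:]
            found.append({"id": class_id, "ontology": ontology, "accession": accession})
    return found


def collect_synonyms(class_docs):
    """Step 4: gather lower-cased synonyms without duplicates,
    keeping first-seen order."""
    seen = []
    for doc in class_docs:
        for name in doc.get("synonym", []):  # "synonym" is an assumed field name
            name = name.lower()
            if name not in seen:
                seen.append(name)
    return seen
```

With the `@id` values from step 3, `matching_classes` would give ontology `HL7` with accession `C1554963`, and ontology `LNC` with accession `LA11133-8`.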

Results (using only two ontologies: MESH and MS)

Synonyms are listed in square brackets below each word.

samples
['sample']
study
analysis
['analysis', 'chemical analysis', 'assay', 'determination']
sequencing
genome
['genome', 'genomes']
cell
['cell']
human
['human', 'homo sapiens', 'man (taxonomy)', 'man, modern', 'modern man']
performed
cancer
['cancer', 'benign neoplasms', 'tumors', 'neoplasm', 'benign neoplasm', 'cancers', 'neoplasm, benign', 'neoplasms, benign', 'neoplasia', 'tumor']
high
['high']
mass
ms
whole
exome
['exome', 'exomes']
dna
['dna', 'dna, double stranded', 'ds-dna', 'double-stranded dna', 'deoxyribonucleic acids', 'deoxyribonucleic acid', 'ds dna', 'dna, double-stranded']
patients
['patients', 'client', 'clients', 'patient']
disease
['disease', 'diseases']
genes
['genes', 'genetic materials', 'cistrons', 'materials, genetic', 'cistron', 'genetic material', 'material, genetic', 'gene']
genetic
['gene', 'genetic materials', 'cistrons', 'materials, genetic', 'cistron', 'genetic material', 'material, genetic']
used
['use']
identified
mutations
['mutations']
identify
proteome
['proteome', 'proteomes']
cells
['cells', 'cell']
project
wide
['wide']
sample
['sample']
part
gene
['gene', 'genetic materials', 'cistrons', 'materials, genetic', 'cistron', 'genetic material', 'material, genetic']
genomic
spectrometry
['spectrometry', 'analysis, spectrum', 'spectroscopy']
including
individuals
http
based
cases
cohort
lc
associated
baimingze commented 9 years ago

Annotated words

These words have no annotation info:
['study', 'sequencing', 'performed', 'mass', 'ms', 'whole', 'identified', 'identify', 'project', 'part', 'genomic', 'including', 'individuals', 'http', 'based', 'cases', 'cohort', 'lc']

{'mergedWords': ['samples', 'sample'], 'label': 'sample', 'frequent': '1320', 'No': '0'}
{'mergedWords': ['analysis'], 'label': 'analysis', 'frequent': '950', 'No': '1'}
{'mergedWords': ['genome'], 'label': 'genome', 'frequent': '828', 'No': '2'}
{'mergedWords': ['cell'], 'label': 'cell', 'frequent': '1004', 'No': '3'}
{'mergedWords': ['human'], 'label': 'human', 'frequent': '642', 'No': '4'}
{'mergedWords': ['cancer'], 'label': 'cancer', 'frequent': '545', 'No': '5'}
{'mergedWords': ['high'], 'label': 'high', 'frequent': '497', 'No': '6'}
{'mergedWords': ['exome'], 'label': 'exome', 'frequent': '455', 'No': '7'}
{'mergedWords': ['dna'], 'label': 'dna', 'frequent': '446', 'No': '8'}
{'mergedWords': ['patients'], 'label': 'patients', 'frequent': '427', 'No': '9'}
{'mergedWords': ['disease'], 'label': 'disease', 'frequent': '414', 'No': '10'}
{'mergedWords': ['genes', 'gene', 'genetic'], 'label': 'gene', 'frequent': '1119', 'No': '11'}
{'mergedWords': ['used'], 'label': 'used', 'frequent': '373', 'No': '13'}
{'mergedWords': ['mutations'], 'label': 'mutations', 'frequent': '364', 'No': '14'}
{'mergedWords': ['proteome'], 'label': 'proteome', 'frequent': '336', 'No': '15'}
{'mergedWords': ['wide'], 'label': 'wide', 'frequent': '320', 'No': '16'}
{'mergedWords': ['spectrometry'], 'label': 'spectrometry', 'frequent': '287', 'No': '17'}
{'mergedWords': ['associated'], 'label': 'associated', 'frequent': '269', 'No': '18'}
ypriverol commented 9 years ago

@baimingze this work is fantastic, my friend, well done. How are you planning to store this information? As far as I can see, we have two options:

My opinion is that this information should be in the XML for two main reasons:

- Experimental Factor Ontology - EFO 
- Tissue BRENDA ontology 
- GO (Gene ontology)

@baimingze we can discuss these issues on Thursday.

baimingze commented 9 years ago

So far, we can get annotation info from EFO and BTO, but not from GO:

  1. EFO: example detail info at http://www.ebi.ac.uk/efo/EFO_0000244
  2. BTO: "http://data.bioontology.org/ontologies/BTO/classes/BTO_0000759", which is obtained from "data.bioontology.org/annotator?ontologies=BTO&text=liver", returns a 404 error. We can use the "self" URL instead:
     "self": "http://data.bioontology.org/ontologies/BTO/classes/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FBTO_0000759"
  3. GO: got an empty list from "http://data.bioontology.org/annotator?ontologies=GO&text=gene"
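
The working "self" URL looks like the classes endpoint followed by the percent-encoded class IRI. A minimal sketch of building it, assuming that pattern holds for other classes too; the helper name is hypothetical, and the IRI used below is simply the decoded tail of the "self" URL quoted above:

```python
from urllib.parse import quote

# Endpoint pattern taken from the "self" URL in the comment above.
CLASSES_ENDPOINT = "http://data.bioontology.org/ontologies/{ontology}/classes/{class_id}"


def class_self_url(ontology, class_iri):
    """Build a class URL by percent-encoding the full class IRI."""
    # safe="" makes quote() escape ":" and "/" as well.
    return CLASSES_ENDPOINT.format(ontology=ontology, class_id=quote(class_iri, safe=""))


print(class_self_url("BTO", "http://purl.obolibrary.org/obo/BTO_0000759"))
```

This only constructs the URL; whether the service accepts it for a given class still has to be checked against the API.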