NCATSTranslator / Text-Mining-Provider-Roadmap

Roadmap and issue tracking for the NCATS Translator Text Mining Provider
MIT License

Formalize TRAPI attribute structure for concept cooccurrence query results #93

Status: Open · opened by bill-baumgartner 3 years ago

bill-baumgartner commented 3 years ago

Create a preliminary TRAPI attribute structure for returning concept cooccurrence results. This structure can be modeled after the COHD attribute structure proposed by Matt Brush.

COHD example provided by Matt Brush

[Screenshot, 2021-09-16: COHD attribute example]

Proposed Cooccurrence Attribute Structure

[Diagram: proposed cooccurrence attribute schema]
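The nesting in the proposed structure (an edge-level list of attributes, where each supporting_study_result carries its own child attributes holding the per-level counts and scores) can be sketched as a small recursive record. This is a minimal, hypothetical Python illustration rather than an official TRAPI model: the `Attribute` class and its `to_dict` helper are invented here, and only the fields the proposal uses are included (the full TRAPI `Attribute` schema defines a few more optional fields). Serializing the top-level list gives the kind of JSON blob carried in the edge file's `_attributes` column.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Attribute:
    """Hypothetical model of one entry in the proposed attribute structure.

    Only fields used in the proposal are modeled; child attributes
    (e.g. the per-level counts and scores) live in `attributes`.
    """
    attribute_type_id: str
    value: Any
    value_type_id: Optional[str] = None
    description: Optional[str] = None
    attribute_source: Optional[str] = None
    attributes: list[Attribute] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        """Convert to plain dicts, skipping unset optional fields and empty child lists."""
        out: dict[str, Any] = {"attribute_type_id": self.attribute_type_id,
                               "value": self.value}
        for key in ("value_type_id", "description", "attribute_source"):
            if getattr(self, key) is not None:
                out[key] = getattr(self, key)
        if self.attributes:
            out["attributes"] = [child.to_dict() for child in self.attributes]
        return out


TMKP = "infores:text-mining-provider-cooccurrence"

# Abbreviated document-level result carrying a single nested count.
document_level = Attribute(
    attribute_type_id="biolink:supporting_study_result",
    value="tmkp:a1a1a1a1a1a1",
    value_type_id="biolink:DocumentLevelConceptCooccurrenceAnalysisResult",
    attribute_source=TMKP,
    attributes=[
        Attribute("biolink:tmkp_concept_pair_count", 2,
                  value_type_id="SIO:000794", attribute_source=TMKP),
    ],
)

# The edge's attribute list, serialized as a JSON blob.
blob = json.dumps([document_level.to_dict()])
print(blob)
```

Keeping child attributes in the same type as their parent mirrors the proposal's design, where a study result is itself an attribute whose details are further attributes.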

Proposed Node TSV

id            name                                       category
CHEBI:3215    bupivacaine                                biolink:ChemicalEntity
PR:000031567  leucine-rich repeat-containing protein 3B  biolink:Protein

Proposed Edge TSV

subject     predicate           object        id                           association_type     supporting_study_results                                                 _attributes
CHEBI:3215  biolink:related_to  PR:000031567  hcR2-6QIJratLDFyFxwcSO6UW1M  biolink:Association  tmkp:a1a1a1a1a1a1|tmkp:b2b2b2b2b2b2|tmkp:c3c3c3c3c3c3|tmkp:d4d4d4d4d4d4  ATTRIBUTE_JSON_BLOB
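Reading an edge row back takes two decoding steps beyond splitting on tabs: the supporting_study_results cell is a pipe-delimited list of result IDs, and the `_attributes` cell is a JSON blob. A minimal Python sketch follows, assuming plain tab-separated cells with no embedded tabs or newlines and JSON written without CSV-style quoting; `parse_edge_tsv` and the abbreviated blob are hypothetical, and KGX tooling may quote or escape the JSON differently.

```python
import csv
import io
import json


def parse_edge_tsv(tsv_text: str) -> list[dict]:
    """Parse edge rows in the proposed layout (hypothetical helper).

    Assumes cells contain no tabs or newlines and that the JSON in
    _attributes is written without CSV-style quoting.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    edges = []
    for row in reader:
        # Pipe-delimited IDs of the per-level cooccurrence results.
        row["supporting_study_results"] = row["supporting_study_results"].split("|")
        # Decode the attribute list (ATTRIBUTE_JSON_BLOB in the table).
        row["_attributes"] = json.loads(row["_attributes"])
        edges.append(row)
    return edges


header = ["subject", "predicate", "object", "id", "association_type",
          "supporting_study_results", "_attributes"]
# Abbreviated stand-in for ATTRIBUTE_JSON_BLOB: only the first attribute.
blob = json.dumps([{"attribute_type_id": "biolink:original_knowledge_source",
                    "value": "infores:text-mining-provider-cooccurrence"}])
row = ["CHEBI:3215", "biolink:related_to", "PR:000031567",
       "hcR2-6QIJratLDFyFxwcSO6UW1M", "biolink:Association",
       "tmkp:a1a1a1a1a1a1|tmkp:b2b2b2b2b2b2|tmkp:c3c3c3c3c3c3|tmkp:d4d4d4d4d4d4",
       blob]
sample = "\t".join(header) + "\n" + "\t".join(row) + "\n"

edge = parse_edge_tsv(sample)[0]
print(edge["subject"], edge["supporting_study_results"][0])  # CHEBI:3215 tmkp:a1a1a1a1a1a1
```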

The ATTRIBUTE_JSON_BLOB placeholder stands for a JSON array of attributes, shown here as YAML:

- attribute_type_id: biolink:original_knowledge_source
  value: infores:text-mining-provider-cooccurrence
  value_type_id: biolink:InformationResource
  description: The Text Mining Provider Concept Cooccurrence KP from NCATS Translator provides cooccurrence metrics for text-mined concepts that cooccur at various levels (e.g., document, sentence) in the biomedical literature.
  attribute_source: infores:text-mining-provider-cooccurrence

- attribute_type_id: biolink:supporting_data_source
  value: infores:pubmed
  value_type_id: biolink:InformationResource
  attribute_source: infores:text-mining-provider-cooccurrence

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:a1a1a1a1a1a1
  value_type_id: biolink:DocumentLevelConceptCooccurrenceAnalysisResult
  description: a single result from computing cooccurrence metrics between two concepts that cooccur at the document level
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:

    - attribute_type_id: biolink:supporting_document    ## NOT CURRENTLY IN BIOLINK
      value: PMID:29085514|PMID:1236578
      value_type_id: biolink:Publication
      description: The documents where the concepts of this assertion were observed to cooccur at the document level.
      attribute_source: infores:pubmed

    - attribute_type_id: biolink:tmkp_concept1_count
      value: 123
      value_type_id: SIO:000794     # SIO:count
      description: "The number of times concept #1 was observed to occur at the document level in the documents that were processed"
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_concept2_count
      value: 321
      value_type_id: SIO:000794     # SIO:count
      description: "The number of times concept #2 was observed to occur at the document level in the documents that were processed"
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_concept_pair_count
      value: 2
      value_type_id: SIO:000794     # SIO:count
      description: The number of times the concepts of this assertion were observed to cooccur at the document level in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_normalized_google_distance
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The normalized google distance score for the concepts in this assertion based on their cooccurrence in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_pointwise_mutual_information
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The pointwise mutual information score for the concepts in this assertion based on their cooccurrence in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_normalized_pointwise_mutual_information
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The normalized pointwise mutual information score for the concepts in this assertion based on their cooccurrence in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_mutual_dependence
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The mutual dependence (PMI^2) score for the concepts in this assertion based on their cooccurrence in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_normalized_pointwise_mutual_information_max
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: A variant of the normalized pointwise mutual information score for the concepts in this assertion based on their cooccurrence in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

    - attribute_type_id: biolink:tmkp_log_frequency_biased_mutual_dependence
      value: 0.876
      value_type_id: EDAM:data_1772     # EDAM:score
      description: The log frequency biased mutual dependence score for the concepts in this assertion based on their cooccurrence in the documents that were processed
      attribute_source: infores:text-mining-provider-cooccurrence

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:b2b2b2b2b2b2
  value_type_id: biolink:SentenceLevelConceptCooccurrenceAnalysisResult
  description: a single result from computing cooccurrence metrics between two concepts that cooccur at the sentence level
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:
    # same nested attributes as the document-level result above

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:c3c3c3c3c3c3
  value_type_id: biolink:TitleLevelConceptCooccurrenceAnalysisResult
  description: a single result from computing cooccurrence metrics between two concepts that cooccur in the document title
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:
    # same nested attributes as the document-level result above

- attribute_type_id: biolink:supporting_study_result
  value: tmkp:d4d4d4d4d4d4
  value_type_id: biolink:AbstractLevelConceptCooccurrenceAnalysisResult
  description: a single result from computing cooccurrence metrics between two concepts that cooccur in the abstract
  attribute_source: infores:text-mining-provider-cooccurrence
  attributes:
    # same nested attributes as the document-level result above
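To show how the nested count attributes relate to the score attributes, here is a minimal Python sketch that derives most of the scores from the three counts using commonly cited definitions. It is an illustration under stated assumptions, not the Text Mining Provider's implementation: concept counts are taken to be the number of units (documents, sentences, titles, or abstracts) containing each concept, `total_units` is an extra input that the proposal does not list as an attribute, PMI, mutual dependence, and the log-frequency biased variant use log base 2, and the "normalized pointwise mutual information max" variant is omitted because the proposal does not define it. The names `cooccurrence_scores` and `CooccurrenceScores` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CooccurrenceScores:
    pmi: float
    normalized_pmi: float
    mutual_dependence: float
    log_frequency_biased_mutual_dependence: float
    normalized_google_distance: float


def cooccurrence_scores(concept1_count: int, concept2_count: int,
                        pair_count: int, total_units: int) -> CooccurrenceScores:
    """Score one concept pair at one level (hypothetical helper).

    total_units (e.g. the number of documents processed) is an assumed
    extra input; it is not one of the proposed attributes.
    """
    if not (0 < pair_count <= min(concept1_count, concept2_count)
            and max(concept1_count, concept2_count) <= total_units):
        raise ValueError("need 0 < pair_count <= each concept count <= total_units")
    if min(concept1_count, concept2_count) == total_units:
        raise ValueError("NPMI and NGD are undefined when both concepts occur in every unit")

    # PMI = log2 p(x,y) / (p(x) p(y)); the 1/N factors reduce to raw counts.
    pmi = math.log2(pair_count * total_units / (concept1_count * concept2_count))
    log_p_pair = math.log2(pair_count / total_units)  # strictly negative here
    # NPMI = PMI / -log2 p(x,y), bounded to [-1, 1].
    npmi = pmi / -log_p_pair
    # Mutual dependence = log2 p(x,y)^2 / (p(x) p(y)) = PMI + log2 p(x,y).
    md = math.log2(pair_count ** 2 / (concept1_count * concept2_count))
    # Log-frequency biased mutual dependence adds log2 p(x,y) once more.
    lfmd = md + log_p_pair

    # Normalized Google distance; the log base cancels in the ratio.
    log_f1, log_f2 = math.log(concept1_count), math.log(concept2_count)
    ngd = ((max(log_f1, log_f2) - math.log(pair_count))
           / (math.log(total_units) - min(log_f1, log_f2)))

    return CooccurrenceScores(pmi, npmi, md, lfmd, ngd)


scores = cooccurrence_scores(concept1_count=100, concept2_count=50,
                             pair_count=10, total_units=1000)
print(scores.pmi)  # 1.0
```

Because each level (document, sentence, title, abstract) has its own counts and unit total, the function would be called once per supporting_study_result, which matches the proposal's choice of repeating the same nested attributes under each level.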