Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

GloBI indexed geospatial-temporal virus <> host interactions from virus sequences in NCBI Virus #75

Open jhpoelen opened 4 years ago

jhpoelen commented 4 years ago

Name of the dataset

GloBI indexed geospatial-temporal virus <> host interactions from virus sequences in the NCBI Virus project. Includes: sequence reference, PMID reference (if available), locality, host name (if available), virus (species / genus / family), and collection date.

You can find an annotated / indexed dataset at:

https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/indexed-interactions.tsv.gz

For source / citation, see https://github.com/globalbioticinteractions/ncbi-virus

Mapping or relevant fields

A clear and concise description of which fields you would want to be ingested.

| field name | description | example |
| --- | --- | --- |
| sourceTaxonName | source (virus) taxon name | Severe acute respiratory syndrome-related coronavirus |
| sourceTaxonPath | pipe-delimited virus taxon hierarchy | Coronaviridae \| Betacoronavirus \| Severe acute respiratory syndrome-related coronavirus |
| interactionTypeId | identifier of the kind of biotic interaction | http://purl.obolibrary.org/obo/RO_0002454 |
| interactionTypeName | name of the kind of biotic interaction | has host |
| targetTaxonName | target (host) taxon name | Homo sapiens |
| targetBodyPartName | part of the host where the virus was found | oronasopharynx |
| referenceUrl | link to virus sequence in NCBI | https://www.ncbi.nlm.nih.gov/nuccore/MT152900 |
| referenceCitation | title of virus sequence in NCBI | Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/MHKN-1/human/2020/IRN ORF1ab polyprotein (orf1ab) gene, partial cds |
| locality | sample location | Iran |
| eventDate | sample date | 2020-02-26 |

Please note that the dataset includes more than just coronaviruses. Filter by:

cat indexed-interactions.tsv.gz | gunzip | grep Coronaviridae

to select the coronavirus family.
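For use inside a Python ingest step, the same filter could be sketched roughly as follows. This is only an illustration, not part of the dataset tooling: it assumes the TSV has a header row containing the column names listed above (in particular sourceTaxonPath), and the function names are hypothetical. Unlike grep, which matches anywhere on a line, this checks only the virus taxon hierarchy.

```python
import csv
import gzip


def iter_coronavirus_rows(lines):
    """Yield rows (as dicts) whose sourceTaxonPath contains the family Coronaviridae.

    `lines` is any iterable of TSV lines whose first line is a header row
    (assumption: the header uses the field names from the table above).
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        path = row.get("sourceTaxonPath") or ""
        # The hierarchy is pipe-delimited, e.g. "Coronaviridae | Betacoronavirus | ..."
        ranks = [part.strip() for part in path.split("|")]
        if "Coronaviridae" in ranks:
            yield row


def read_coronavirus_rows(path="indexed-interactions.tsv.gz"):
    """Read a locally downloaded, gzipped copy of the indexed interactions."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(iter_coronavirus_rows(handle))
```

Calling read_coronavirus_rows() on a downloaded copy of indexed-interactions.tsv.gz would then return only the coronavirus records as dictionaries keyed by field name.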

If possible, highlight which fields map to nodes and which fields map to edges. Refer to Data Preparation for guidelines on how the final transformed data should be represented.
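One possible reading of the node / edge split, offered only as a starting point for discussion: the virus and host taxa become nodes, and the interaction type together with the sequence reference, locality, and date become a single edge between them. The output keys below are illustrative assumptions and are not taken from the Data Preparation guidelines; taxon names stand in for identifiers because the fields listed above carry no taxon IDs.

```python
def row_to_nodes_and_edge(row):
    """Map one indexed-interaction row (a dict) to ([virus_node, host_node], edge).

    Returns None when the virus or host name is missing, since the host name
    is only present "if available". All output key names are hypothetical.
    """
    virus = (row.get("sourceTaxonName") or "").strip()
    host = (row.get("targetTaxonName") or "").strip()
    if not virus or not host:
        return None

    nodes = [
        # Names double as placeholder identifiers in this sketch.
        {"id": virus, "name": virus, "taxon_path": row.get("sourceTaxonPath")},
        {"id": host, "name": host},
    ]
    edge = {
        "subject": virus,
        "predicate": row.get("interactionTypeId"),
        "predicate_label": row.get("interactionTypeName"),
        "object": host,
        # Context of the observation is kept on the edge.
        "body_part": row.get("targetBodyPartName"),
        "reference_url": row.get("referenceUrl"),
        "reference_citation": row.get("referenceCitation"),
        "locality": row.get("locality"),
        "event_date": row.get("eventDate"),
    }
    return nodes, edge
```

Keeping locality and eventDate on the edge rather than on the nodes reflects that they describe a particular sampled interaction, not the taxa themselves.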


jhpoelen commented 4 years ago

cc @cmungall and related to https://github.com/globalbioticinteractions/globalbioticinteractions/issues/472

jhpoelen commented 4 years ago

Note that the raw data fields, along with the indexed / annotated fields, can be found at:

https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/review.tsv.gz

where the 15th field is a JSON blob, which can be accessed using jq and cut. For example, to show just the first record:

curl -L "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/review.tsv.gz" | gunzip | tail -n+2 | cut -f15 | head -n1 | jq .

{
  "reviewId": "2e36a38e-c96c-4c44-b4d0-79fc737f89f4",
  "reviewDate": "2020-04-08T15:38:03Z",
  "reviewerName": "GloBI automated reviewer (elton-0.9.1)",
  "reviewCommentType": "note",
  "reviewComment": "target taxon name missing",
  "namespace": "local",
  "context": {
    "interactionTypeNameVerbatim": "has host",
    "interactionTypeName": "hasHost",
    "Geo_Location": "Portugal",
    "Authors": "Chapman,D.A., Tcherepanov,V., Upton,C., Dixon,L.K.",
    "referenceCitation": "African swine fever virus OURT 88/3 (avirulent field isolate), complete genome",
    "sourceTaxonPath": "Asfarviridae | Asfivirus | Asfivirus African swine fever virus",
    "referenceAuthors": "Chapman,D.A., Tcherepanov,V., Upton,C., Dixon,L.K.",
    "tableSchema": "schema.json",
    "Segment": null,
    "headerRowCount": "1",
    "Isolation_Source": null,
    "_doi": "10.1093/nar/gkw1065",
    "localityName": "Portugal",
    "Genus": "Asfivirus",
    "interactionTypeId": "http://purl.obolibrary.org/obo/RO_0002454",
    "Genome_Region": null,
    "Species": "African swine fever virus",
    "Collection_Date": null,
    "Publications": "https://www.ncbi.nlm.nih.gov/pubmed/18198370",
    "Accession": "NC_044957",
    "sourceTaxonName": "Asfivirus African swine fever virus",
    "sourceTaxonFamily": "Asfarviridae",
    "Genotype": null,
    "Host": null,
    "studyTitle": "Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 . Data downloaded via https://www.ncbi.nlm.nih.gov/labs/virus/vssi on 2020-03-14. Accessed at <file:///home/travis/build/globalbioticinteractions/ncbi-virus/> on 08 Apr 2020.African swine fever virus OURT 88/3 (avirulent field isolate), complete genome",
    "targetBodyPartName": null,
    "url": "sequences.csv.gz",
    "dcterms:bibliographicCitation": "Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 . Data downloaded via https://www.ncbi.nlm.nih.gov/labs/virus/vssi on 2020-03-14",
    "referenceUrl": "https://www.ncbi.nlm.nih.gov/nuccore/NC_044957",
    "studySourceCitation": "Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 . Data downloaded via https://www.ncbi.nlm.nih.gov/labs/virus/vssi on 2020-03-14. Accessed at <file:///home/travis/build/globalbioticinteractions/ncbi-virus/> on 08 Apr 2020.",
    "http://rs.tdwg.org/dwc/terms/eventDate": null,
    "BioSample": null,
    "sourceTaxonPathNames": "family | genus | species",
    "Length": "171719",
    "Release_Date": "2019-10-02T00:00:00Z",
    "Nuc_Completeness": "refseq, complete",
    "Family": "Asfarviridae",
    "targetTaxonName": null,
    "interactionTypeIdVerbatim": "http://purl.obolibrary.org/obo/RO_0002454",
    "sourceTaxonGenus": "Asfivirus",
    "GenBank_Title": "African swine fever virus OURT 88/3 (avirulent field isolate), complete genome",
    "sourceTaxonSpecificEpithet": "African swine fever virus"
  }
}
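
The same extraction can be approximated in Python, which may be handier inside an ingest script. This is a minimal sketch with hypothetical function names; like the cut command, it assumes a header row, tab-separated columns, and a JSON blob in the 15th column that contains no raw tab characters.

```python
import json


def parse_review_field(line, field_index=14):
    """Return the JSON blob from the 15th tab-separated field (index 14) as a dict."""
    fields = line.rstrip("\n").split("\t")
    return json.loads(fields[field_index])


def first_review(lines):
    """Skip the header row (like tail -n+2) and parse the first data record."""
    iterator = iter(lines)
    next(iterator, None)  # drop the header line
    for line in iterator:
        return parse_review_field(line)
    return None  # no data rows present
```

With a locally downloaded review.tsv.gz, passing gzip.open(path, "rt", encoding="utf-8") to first_review would yield a dictionary with the same keys as the record shown above, for example record["context"]["Accession"].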