jhpoelen opened 4 years ago
cc @cmungall. Related to https://github.com/globalbioticinteractions/globalbioticinteractions/issues/472
Note that the raw data fields, along with the indexed/annotated fields, can be found at:
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/review.tsv.gz
The 15th tab-separated field of each row is a JSON blob, which can be extracted with `cut` and pretty-printed with `jq`. For example, skipping the header row and showing just the first record:
```shell
curl -L "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/review.tsv.gz" | gunzip | tail -n +2 | cut -f15 | head -n 1 | jq .
```
```json
{
  "reviewId": "2e36a38e-c96c-4c44-b4d0-79fc737f89f4",
  "reviewDate": "2020-04-08T15:38:03Z",
  "reviewerName": "GloBI automated reviewer (elton-0.9.1)",
  "reviewCommentType": "note",
  "reviewComment": "target taxon name missing",
  "namespace": "local",
  "context": {
    "interactionTypeNameVerbatim": "has host",
    "interactionTypeName": "hasHost",
    "Geo_Location": "Portugal",
    "Authors": "Chapman,D.A., Tcherepanov,V., Upton,C., Dixon,L.K.",
    "referenceCitation": "African swine fever virus OURT 88/3 (avirulent field isolate), complete genome",
    "sourceTaxonPath": "Asfarviridae | Asfivirus | Asfivirus African swine fever virus",
    "referenceAuthors": "Chapman,D.A., Tcherepanov,V., Upton,C., Dixon,L.K.",
    "tableSchema": "schema.json",
    "Segment": null,
    "headerRowCount": "1",
    "Isolation_Source": null,
    "_doi": "10.1093/nar/gkw1065",
    "localityName": "Portugal",
    "Genus": "Asfivirus",
    "interactionTypeId": "http://purl.obolibrary.org/obo/RO_0002454",
    "Genome_Region": null,
    "Species": "African swine fever virus",
    "Collection_Date": null,
    "Publications": "https://www.ncbi.nlm.nih.gov/pubmed/18198370",
    "Accession": "NC_044957",
    "sourceTaxonName": "Asfivirus African swine fever virus",
    "sourceTaxonFamily": "Asfarviridae",
    "Genotype": null,
    "Host": null,
    "studyTitle": "Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 . Data downloaded via https://www.ncbi.nlm.nih.gov/labs/virus/vssi on 2020-03-14. Accessed at <file:///home/travis/build/globalbioticinteractions/ncbi-virus/> on 08 Apr 2020.African swine fever virus OURT 88/3 (avirulent field isolate), complete genome",
    "targetBodyPartName": null,
    "url": "sequences.csv.gz",
    "dcterms:bibliographicCitation": "Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 . Data downloaded via https://www.ncbi.nlm.nih.gov/labs/virus/vssi on 2020-03-14",
    "referenceUrl": "https://www.ncbi.nlm.nih.gov/nuccore/NC_044957",
    "studySourceCitation": "Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 . Data downloaded via https://www.ncbi.nlm.nih.gov/labs/virus/vssi on 2020-03-14. Accessed at <file:///home/travis/build/globalbioticinteractions/ncbi-virus/> on 08 Apr 2020.",
    "http://rs.tdwg.org/dwc/terms/eventDate": null,
    "BioSample": null,
    "sourceTaxonPathNames": "family | genus | species",
    "Length": "171719",
    "Release_Date": "2019-10-02T00:00:00Z",
    "Nuc_Completeness": "refseq, complete",
    "Family": "Asfarviridae",
    "targetTaxonName": null,
    "interactionTypeIdVerbatim": "http://purl.obolibrary.org/obo/RO_0002454",
    "sourceTaxonGenus": "Asfivirus",
    "GenBank_Title": "African swine fever virus OURT 88/3 (avirulent field isolate), complete genome",
    "sourceTaxonSpecificEpithet": "African swine fever virus"
  }
}
```
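The same `cut -f15` pipeline can be extended to summarize the review notes across all records instead of inspecting one. A minimal sketch, assuming `gunzip`, `cut`, and `jq` are installed and that every data row's 15th tab-separated field is a JSON object with a top-level `reviewComment` key, as in the record above. The function name `summarize_review_comments` is hypothetical.

```shell
# Hypothetical helper: count how often each reviewComment
# (e.g. "target taxon name missing") occurs in a review.tsv stream on stdin.
# Assumes a header row first and the JSON blob in the 15th tab-separated field.
summarize_review_comments() {
  tail -n +2 |                           # skip the header row
    cut -f15 |                           # keep only the JSON blob
    jq -r '.reviewComment // "(none)"' | # one comment per record
    sort | uniq -c | sort -rn            # most frequent comments first
}

# Usage (downloads the full review file):
# curl -sL "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/review.tsv.gz" \
#   | gunzip | summarize_review_comments
```

Counting comments this way gives a quick view of how many records lack, for instance, a target (host) taxon name before deciding how to map them.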
Name of the dataset
GloBI-indexed geospatial-temporal virus <> host interactions from virus sequences in the NCBI Virus project. Includes: sequence reference, PMID reference (if available), locality, host name (if available), virus (species/genus/family), and collection date.
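In the sample review record shown earlier, the kinds of information listed above appear to correspond to keys in the `context` object: `Accession` (sequence reference), `Publications` (PubMed reference), `Geo_Location` (locality), `Host`, `Family`/`Genus`/`Species`, and `Collection_Date`. That mapping is read off a single record and is an assumption, not a documented schema. A minimal `jq` sketch that flattens those keys into one tab-separated row per record; the function name `extract_virus_host_fields` is hypothetical.

```shell
# Hypothetical helper: turn JSON review blobs (one per line on stdin) into TSV rows.
# Key names are assumptions taken from the sample record; null/missing values become "".
extract_virus_host_fields() {
  jq -r '.context
    | [ .Accession, .Publications, .Geo_Location, .Host,
        .Family, .Genus, .Species, .Collection_Date ]
    | map(. // "")
    | @tsv'
}

# Usage: first 5 records of the review file.
# curl -sL "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/review.tsv.gz" \
#   | gunzip | tail -n +2 | cut -f15 | head -n 5 | extract_virus_host_fields
```

Replacing `null` with empty cells keeps the column count stable, so later `cut` or `awk` steps can address columns by position.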
You can find an annotated/indexed dataset at:
https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/indexed-interactions.tsv.gz
For source and citation information, see https://github.com/globalbioticinteractions/ncbi-virus
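Before deciding which columns map to nodes and which to edges, it helps to list the column names of `indexed-interactions.tsv.gz` together with their positions. A minimal sketch, assuming the file's first line is a tab-separated header row; the function name `list_columns` is hypothetical, and the actual column names are not reproduced here.

```shell
# Hypothetical helper: print "<position>\t<column name>" for each field
# of the tab-separated header row read from stdin.
list_columns() {
  head -n 1 | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Usage:
# curl -sL "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/indexed-interactions.tsv.gz" \
#   | gunzip | list_columns
```

The positions printed this way can be passed directly to `cut -f`.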
Mapping or relevant fields
Coronaviridae | Betacoronavirus | Severe acute respiratory syndrome-related coronavirus
Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/MHKN-1/human/2020/IRN ORF1ab polyprotein (orf1ab) gene, partial cds
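The taxon path above packs family, genus, and species into one pipe-delimited string, which appears to have the same shape as `sourceTaxonPath` in the sample review record, where `sourceTaxonPathNames` (`family | genus | species`) labels each position. A minimal sketch that pairs rank names with taxon names, assuming both strings use ` | ` as the separator and have the same number of elements; the function name `pair_ranks` is hypothetical.

```shell
# Hypothetical helper: pair rank names with taxon names from two
# " | "-separated strings, printing "<rank>\t<name>" per level.
pair_ranks() {
  # $1: rank names, e.g. "family | genus | species"
  # $2: taxon path with the same number of elements
  awk -v ranks="$1" -v path="$2" 'BEGIN {
    n = split(ranks, r, " [|] ")  # regex separator: space, literal pipe, space
    split(path, p, " [|] ")
    for (i = 1; i <= n; i++) printf "%s\t%s\n", r[i], p[i]
  }'
}

pair_ranks "family | genus | species" \
  "Coronaviridae | Betacoronavirus | Severe acute respiratory syndrome-related coronavirus"
```

Splitting on the full ` | ` separator rather than on `|` alone avoids leaving stray spaces around each name.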
Please note that the dataset includes more than just coronaviruses. To select the coronavirus family (Coronaviridae), filter with:
```shell
curl -L "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/indexed-interactions.tsv.gz" | gunzip | grep Coronaviridae
```
If possible, highlight which fields map to nodes and which fields map to edges. Refer to Data Preparation for guidelines on how the final transformed data should be represented.
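The `grep Coronaviridae` filter above is quick, but it drops the header row and matches the word anywhere on a line, not only in a taxonomy column. A minimal sketch that keeps the header and compares one named column exactly, assuming a tab-separated file with a header row. The function name `filter_by_column` is hypothetical, and the column name in the usage line is an assumption (it is the key used in the review JSON above) that should be checked against the actual header of `indexed-interactions.tsv.gz`.

```shell
# Hypothetical helper: keep the header row plus rows whose column named $1
# equals $2 exactly. Reads tab-separated text with a header row from stdin.
filter_by_column() {
  awk -v col="$1" -v want="$2" '
    BEGIN { FS = "\t" }
    NR == 1 {
      # locate the requested column by name in the header
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    $idx == want
  '
}

# Usage sketch (column name is an assumption, verify it first):
# curl -sL "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/ncbi-virus/indexed-interactions.tsv.gz" \
#   | gunzip | filter_by_column sourceTaxonFamily Coronaviridae
```

Failing loudly when the column is missing avoids silently producing a header-only file.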
Additional context
Add any other context, requests, concerns.