PathwayCommons / factoid

A project to capture biological pathway data from academic papers
https://biofactoid.org
MIT License

SARS-Cov-2 support #699

Closed by jvwong 3 years ago

jvwong commented 4 years ago

Approach

  1. Provide article information
  2. Provide expected pathway
  3. Provide Factoid manually-created document
  4. Issues

1. Article information

SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Hoffmann et al. Cell. 2020 Mar 4. pii: S0092-8674(20)30229-4. [Epub ahead of print]. Hoffmann M, Kleine-Weber H, Schroeder S, Krüger N, Herrler T, Erichsen S, Schiergens TS, Herrler G, Wu NH, Nitsche A, Müller MA, Drosten C, Pöhlmann S.

2. Factoid manually-created document

3. Issues

jvwong commented 4 years ago

Similar story for Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Walls et al., 2020, Cell 180, 1–12

jvwong commented 4 years ago

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Wrapp et al. Science 13 Mar 2020: Vol. 367, Issue 6483, pp. 1260-1263

jvwong commented 4 years ago

Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 Yan et al. Science 27 Mar 2020: Vol. 367, Issue 6485, pp. 1444-1448

jvwong commented 4 years ago

Issues related to Biofactoid support:

maxkfranz commented 4 years ago

Need to support genes from many viruses

These viruses have distinct taxon IDs, and we decided to only support a predefined set of organisms. To support this, a new root organism must be added, with the strains specified in a way similar to E. coli.
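A minimal sketch of what "a new root organism, with the strains specified in a way similar to E. coli" could look like, assuming a hypothetical in-memory registry. `RootOrganism`, `SUPPORTED_ORGANISMS`, and `find_root` are illustrative names, not Biofactoid's actual code; the taxon IDs 694009 and 2697049 are the ones discussed in this thread, and the display name is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RootOrganism:
    """A supported root organism plus the strain taxon IDs grouped under it (hypothetical model)."""
    taxon_id: int
    display_name: str
    strain_ids: frozenset = field(default_factory=frozenset)


# Hypothetical predefined set; an E. coli root would be modeled the same way.
SUPPORTED_ORGANISMS = [
    RootOrganism(694009, "SARS-related coronavirus", frozenset({2697049})),
]


def find_root(taxon_id: int) -> Optional[RootOrganism]:
    """Return the supported root covering taxon_id (as root or strain), else None."""
    for org in SUPPORTED_ORGANISMS:
        if taxon_id == org.taxon_id or taxon_id in org.strain_ids:
            return org
    return None
```

With this shape, a gene annotated with a strain-level taxon still resolves to the single predefined root, so the "predefined set of organisms" stays small.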

jvwong commented 4 years ago

Taxonomy:

Genes:

Proteins:

maxkfranz commented 4 years ago

This query gives a good sense of what a "spike" or "S" search returns (though it doesn't take any ranking into account): https://www.ncbi.nlm.nih.gov/gene/?term=spike+glycoprotein
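The same search can be issued programmatically through NCBI's E-utilities ESearch endpoint. The sketch below only builds the request URL; `esearch_url` and its optional `taxon_id` filter (which appends Entrez's `txid<ID>[Organism]` search field) are illustrative assumptions, and fetching and ranking results are left out.

```python
from typing import Optional
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, taxon_id: Optional[int] = None,
                db: str = "gene", retmax: int = 20) -> str:
    """Build an ESearch URL; optionally restrict hits to one NCBI taxon."""
    if taxon_id is not None:
        # Entrez query syntax: restrict results to an organism by taxonomy ID.
        term = f"{term} AND txid{taxon_id}[Organism]"
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode turns spaces into '+' and percent-encodes the brackets.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("spike glycoprotein"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=spike+glycoprotein&retmax=20
```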

jvwong commented 4 years ago

Sorry, I edited the comment above. The SARS-CoV-2 genes are there, but most people will want to draw parallels with earlier and related coronaviruses since, well, we know squat. The issue with grounding to 'genes' is that viruses are rather tricky: they use one 'gene' to encode a bunch of proteins, hence the UniProt stuff.

maxkfranz commented 4 years ago

Using taxon ID 694009 instead of 2697049 as the root organism for the virus would be more consistent with how other organisms are specified.
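One way to pick the root consistently across organisms is to walk up the taxonomy lineage to the nearest node of a fixed rank, assuming (as this comment implies) that 694009 is the species-level node above 2697049. The lineage table and `nearest_rank` are hypothetical; in practice the parent/rank data would come from NCBI Taxonomy.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lineage table: taxon ID -> (parent taxon ID, rank).
Lineage = Dict[int, Tuple[int, str]]


def nearest_rank(taxon_id: int, lineage: Lineage, rank: str = "species") -> Optional[int]:
    """Walk up parent links until a taxon with the requested rank is found."""
    current = taxon_id
    seen = set()  # guards against cycles, e.g. a root that is its own parent
    while current in lineage and current not in seen:
        seen.add(current)
        parent, current_rank = lineage[current]
        if current_rank == rank:
            return current
        current = parent
    return None
```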

maxkfranz commented 4 years ago

Does NCBI have a list of proteins associated with each gene entry in its data? I imagine you could have more specificity in the UI if the gene identifier maps to multiple proteins and the user wants to specify a particular one.
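A minimal sketch of the UI idea, assuming a hypothetical model where a gene-level hit carries the IDs of the proteins it encodes. `GeneEntry`, `ground`, and the `db` labels are illustrative, not Biofactoid's data model: the entity is grounded to the gene by default and to a single protein only when the user picks one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneEntry:
    """A gene-level search hit that may encode several proteins (hypothetical model)."""
    gene_id: str
    symbol: str
    protein_ids: List[str] = field(default_factory=list)


def ground(gene: GeneEntry, protein_id: Optional[str] = None) -> dict:
    """Ground to the gene by default, or to one encoded protein if the user picks one."""
    if protein_id is None:
        return {"db": "ncbigene", "id": gene.gene_id}
    if protein_id not in gene.protein_ids:
        raise ValueError(f"{protein_id!r} is not encoded by {gene.symbol}")
    # Keep the gene ID alongside the protein so the coarser grounding is not lost.
    return {"db": "protein", "id": protein_id, "gene_id": gene.gene_id}
```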

jvwong commented 4 years ago

Let's take this from the start.

  1. The SARS-CoV-2 genome, from Gordon et al., bioRxiv: DNA (black); genes/mRNAs (orange); proteins (blue, red, green)

  2. 'Taxonomy'

  3. 'Genes' (i.e. expressed DNA regions) for SARS-CoV-2

  4. Proteins associated with the gene/mRNA 'ORF1ab' (orange region in figure)

Links: NCBI E-utilities (ELink)
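As a sketch of how ELink could answer the gene-to-protein question: the request asks for records in `db=protein` linked from a Gene ID, and the response lists linked IDs under `LinkSetDb` elements. The helper names are hypothetical, and the `gene_protein` link name and the XML element names are assumptions about the ELink response layout worth checking against NCBI's documentation.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(gene_id: str) -> str:
    """Build an ELink URL asking for protein records linked to one NCBI Gene ID."""
    params = {"dbfrom": "gene", "db": "protein", "id": gene_id}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


def linked_ids(elink_xml: str, link_name: str = "gene_protein") -> List[str]:
    """Collect the linked IDs from every LinkSetDb whose LinkName matches."""
    root = ET.fromstring(elink_xml)
    ids = []
    for linkset_db in root.iter("LinkSetDb"):
        if linkset_db.findtext("LinkName") == link_name:
            ids.extend(link.findtext("Id") for link in linkset_db.findall("Link"))
    return ids
```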

maxkfranz commented 4 years ago

Let's spec out the details in the grounding service repo in an issue. We need to decide on things like

jvwong commented 4 years ago

In test-driving, I think there's a strong argument for adding back organism filtering: show an organism selector when the search results list is available. I found it frustrating to look for, e.g., a Drosophila gene and have to scroll past a whole bunch of unrelated species, even though their names are closer matches.
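A minimal sketch of the proposed selector, assuming each search hit is a dict with a hypothetical `organism` taxon ID field. `organism_options` lists the organisms present (most frequent first) to populate the selector, and `filter_by_organism` applies the choice while preserving the original ranking.

```python
from collections import Counter
from typing import Iterable, List, Optional


def organism_options(results: Iterable[dict]) -> List[int]:
    """Taxon IDs present in the results, most frequent first, for the selector."""
    counts = Counter(r["organism"] for r in results)
    return [taxon for taxon, _ in counts.most_common()]


def filter_by_organism(results: Iterable[dict], selected: Optional[int] = None) -> List[dict]:
    """Keep only hits for the selected organism; with no selection, keep all hits in order."""
    if selected is None:
        return list(results)
    return [r for r in results if r["organism"] == selected]
```

Filtering (rather than re-sorting) matches the complaint above: unrelated species disappear entirely once an organism is chosen.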

jvwong commented 4 years ago

Let's spec out the details in the grounding service repo in an issue. We need to decide on things like

* what the root organism should be,

* what the display name of the organism family should be,

* what filters may be needed, and

* edge cases (e.g. does "S" work well when the organism is indexed, even though it's only one character).

I'm moving this issue to that repo...
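One possible policy for the single-character edge case, sketched under assumptions: `matches`, the entry fields (`symbol`, `synonyms`, `organism`), and the `MIN_PREFIX_LENGTH` threshold are hypothetical and not taken from the grounding service. Very short queries such as "S" only match a symbol or synonym exactly, and an organism filter, when set, narrows the candidates first.

```python
from typing import Optional

MIN_PREFIX_LENGTH = 3  # hypothetical threshold; the thread does not fix a value


def matches(query: str, entry: dict, organism: Optional[int] = None) -> bool:
    """Match a query against an entry's names, treating very short queries strictly."""
    q = query.strip().lower()
    if organism is not None and entry["organism"] != organism:
        return False
    names = [entry["symbol"].lower()] + [s.lower() for s in entry.get("synonyms", [])]
    if len(q) < MIN_PREFIX_LENGTH:
        # One- or two-character queries only match a name exactly.
        return q in names
    # Longer queries may also match any name by prefix.
    return any(name.startswith(q) for name in names)
```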

jvwong commented 3 years ago

Issues related to Biofactoid support:

* Need to support genes from many viruses

  * Most of what is known relates to other, previously studied coronaviruses (MERS, SARS-CoV) and often to other related and unrelated viruses; these findings may or may not extrapolate to SARS-CoV-2