jvwong closed this issue 3 years ago
Similar story for Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Walls et al., 2020, Cell 180, 1–12
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Wrapp et al Science 13 Mar 2020: Vol. 367, Issue 6483, pp. 1260-1263
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 Yan et al. Science 27 Mar 2020: Vol. 367, Issue 6485, pp. 1444-1448
Issues related to Biofactoid support:
Need to support genes from many viruses
These viruses have distinct taxon IDs, and we decided to only support a predefined set of organisms. To support this, a new root organism must be added, with the strains specified in a way similar to E. coli.
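One way to picture the "root organism plus strains" idea is a lookup that maps any supported taxon ID to its root. This is only a sketch: `RootOrganism`, `SUPPORTED`, and `resolve_root` are hypothetical names, not part of the grounding service. The choice of root for the coronaviruses is an assumption as well, since that decision is still open in this thread. The taxon IDs are illustrative and should be checked against NCBI Taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootOrganism:
    """A supported organism plus the strain/descendant taxon IDs grouped under it."""
    taxon_id: int
    name: str
    descendant_ids: frozenset = frozenset()

    def contains(self, taxon_id: int) -> bool:
        # A taxon belongs to this root if it is the root itself or a listed descendant.
        return taxon_id == self.taxon_id or taxon_id in self.descendant_ids


# Illustrative entries only; verify every ID against NCBI Taxonomy before use.
SUPPORTED = [
    # E. coli with the K-12 strain listed as a descendant.
    RootOrganism(562, "Escherichia coli", frozenset({83333})),
    # Assumed root for SARS-like viruses, with SARS-CoV-2 as a descendant.
    RootOrganism(694009, "SARS-related coronavirus", frozenset({2697049})),
]


def resolve_root(taxon_id: int):
    """Return the supported root organism for a taxon ID, or None if unsupported."""
    for root in SUPPORTED:
        if root.contains(taxon_id):
            return root
    return None
```

With a mapping like this, an entry indexed under a strain-level taxon ID can still be displayed and filtered under a single organism name.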
Taxonomy:
Genes:
Proteins:
This query gives a good sense of a "spike" or "S" result (but not taking into account any ranking): https://www.ncbi.nlm.nih.gov/gene/?term=spike+glycoprotein
Sorry, I edited the above. The SARS-CoV-2 genes are there, but most people will want to draw parallels with earlier coronaviruses and related viruses, since, well, we know squat. The issue with grounding to 'genes' is that viruses are rather tricky in that a single 'gene' can encode a bunch of proteins, hence the UniProt stuff.
Does NCBI have a list of proteins associated with each gene entry in its data? I imagine you could have more specificity in the UI if the gene identifier maps to multiple proteins and the user wants to specify a particular one.
Let's take this from the start.
Below is the SARS-CoV-2 genome (Gordon et al., bioRxiv): DNA (black); genes/mRNAs (orange); proteins (blue, red, green)
'Taxonomy'
'Genes' (i.e. expressed DNA regions) for SARS-CoV-2
Proteins associated with the gene/mRNA 'ORF1ab' (orange region in figure)
Links: NCBI EUTILS 'ELINK'
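As a rough illustration of the gene-to-protein lookup, the sketch below builds an ELink request URL that asks the `gene` database for linked `protein` records. The helper name `gene_to_protein_elink_url` is hypothetical, and the gene IDs in the usage line are placeholders rather than real SARS-CoV-2 identifiers. Whether the returned links give the per-protein granularity wanted for 'ORF1ab' would still need to be checked against real responses.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_to_protein_elink_url(gene_ids, retmode="json"):
    """Build an ELink URL linking NCBI Gene IDs to Protein records.

    Each gene ID is sent as its own `id` parameter so that ELink reports
    a separate link set per gene instead of one merged set.
    """
    params = [("dbfrom", "gene"), ("db", "protein"), ("retmode", retmode)]
    params += [("id", str(gene_id)) for gene_id in gene_ids]
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


# Placeholder IDs for illustration only.
print(gene_to_protein_elink_url([111, 222]))
```

The function only constructs the URL; fetching it and parsing the link sets is left out so the sketch stays independent of network access.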
In test driving, I think there may be a strong argument to add back organism filtering: show a selector when the search list is available. I found it frustrating to look for, e.g., a Drosophila gene and have to zing past a whole bunch of unrelated species, even though their names are closer.
Let's spec out the details in the grounding service repo in an issue. We need to decide on things like:
* what the root organism should be,
* what the display name of the organism family should be,
* what filters may be needed, and
* edge cases (e.g. does "S" work well when the organism is indexed, even though it's only one character).
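To make the filtering and single-character edge case concrete, here is a minimal sketch of a search that optionally restricts results to selected organisms and ranks exact name matches ahead of prefix matches. The `search` function and the entry shape (`name`, `synonyms`, `taxon_id`) are assumptions for illustration, not the grounding service's actual API or ranking.

```python
def search(entries, query, organism_ids=None):
    """Return entries matching `query`, exact matches first, then prefix matches.

    If `organism_ids` is given, entries from other organisms are skipped
    before matching, so a short query like "S" is not swamped by other species.
    """
    q = query.strip().lower()
    hits = []
    for entry in entries:
        if organism_ids is not None and entry["taxon_id"] not in organism_ids:
            continue
        names = [entry["name"], *entry.get("synonyms", [])]
        lowered = [name.lower() for name in names]
        if q in lowered:
            rank = 0  # exact match on the name or a synonym
        elif any(name.startswith(q) for name in lowered):
            rank = 1  # prefix match
        else:
            continue
        hits.append((rank, entry["name"], entry))
    # Sort by rank, then by name for a stable, predictable order.
    hits.sort(key=lambda hit: (hit[0], hit[1]))
    return [hit[2] for hit in hits]
```

Without a filter, the query "S" returns the exact match first but still lists every prefix match from every organism. With `organism_ids` set to the selected organism, only that organism's entries remain.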
I'm moving this issue to that remote....
Issues related to Biofactoid support:
* Need to support genes from many viruses
* Most of what is known relates to other, previously studied coronaviruses (MERS, SARS-CoV) and often other related and unrelated viruses, which may or may not extrapolate
Approach
1. Article information
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Hoffmann et al. Cell. 2020 Mar 4. pii: S0092-8674(20)30229-4. [Epub ahead of print]. Hoffmann M, Kleine-Weber H, Schroeder S, Krüger N, Herrler T, Erichsen S, Schiergens TS, Herrler G, Wu NH, Nitsche A, Müller MA, Drosten C, Pöhlmann S.
2. Factoid manually-created document
3. Issues