gtauriello opened this issue 4 years ago
I'm here!
Just a small, political correction: there are 10 members of the IMEx consortium that curate into IntAct; MINT and IntAct itself are just two of them. For example, DIP has also contributed many SARS publications over the last month.
And yes, we have also decided to annotate against the longer polyprotein in SARS-CoV and SARS-CoV-2 (e.g. R1AB, P0DTD1), except for the small protein nsp11, which is only translated from the short polyprotein of SARS-CoV-2 (R1A, P0DTC1). At the ribosomal slippage site, the long polyprotein continues with nsp12 instead of nsp11.
For the Complex Portal, you can find the data via our organism page: https://www.ebi.ac.uk/complexportal/complex/organisms. It also has a JSON web-service endpoint, but only for individual AC queries. Alternatively, download the whole species file in XML from: ftp://ftp.ebi.ac.uk/pub/databases/intact/complex/current/psi30/
Any questions, please ask! My Slack ID is the same as my GitHub handle.
@all-contributors please add @bmeldal for ideas, content
@gtauriello I've put up a pull request to add @bmeldal! :tada:
Thank you!
@gtauriello so the annotations are only for the virus proteins, right?
IntAct & ComplexPortal have both, virus and human proteins. Not sure if that was your question, though ;-)
@D-Barradas also unsure about the question.
Personally, I would start by looking at all interactions returned by the query above (or in the download) and extract any positional data you can find. The query should restrict the results to coronavirus-relevant interactions. The annotation system works for any UniProtKB AC, not just the virus proteins, so you can safely map annotations, e.g., onto structures of the human proteins involved in those interactions...
Hi @gtauriello @bmeldal: sorry for the cryptic question; you have basically answered it. I have already uploaded my annotations, and in the process I found that the server does not accept the PRO_ chain identifiers. This was the warning: "Couldn't find P0DTD1-PRO_0000449623 by UniProt AC or MD5."
Yes, for the polyproteins you will need to do some extra mapping. Assuming you have a position within P0DTD1-PRO... (or P0DTC1-PRO...), you need to proceed as follows:
As an example, say you have position 10 in P0DTD1-PRO_0000449623. From UniProt, you see that PRO_0000449623 covers positions 3264-3569. That means position 10 in P0DTD1-PRO_0000449623 corresponds to position 3273 in P0DTD1 (3264 + 10 - 1).
Also, any position you find in P0DTC1 should be mapped to P0DTD1, as long as it is not in "Non-structural protein 11" (i.e. position >= 4393 of P0DTC1). Technically, you could also duplicate all those annotations, but it's easier to have them just once...
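The offset arithmetic in this example can be written as a tiny helper. This is a sketch, assuming 1-indexed positions on both sides and that the chain boundaries have already been looked up in UniProt by hand; the function name `chain_to_full_position` is hypothetical.

```python
def chain_to_full_position(pos_in_chain: int, chain_start: int, chain_end: int) -> int:
    """Map a 1-indexed position inside a UniProt PRO_ chain onto the full entry.

    chain_start and chain_end are the 1-indexed, inclusive boundaries of the
    chain in the full-length entry (e.g. 3264-3569 for PRO_0000449623 in P0DTD1).
    """
    chain_length = chain_end - chain_start + 1
    if not 1 <= pos_in_chain <= chain_length:
        raise ValueError(
            f"position {pos_in_chain} is outside a chain of length {chain_length}"
        )
    # Position 1 of the chain sits at chain_start, hence the "- 1".
    return chain_start + pos_in_chain - 1


print(chain_to_full_position(10, 3264, 3569))  # 3273
```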
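The P0DTC1 to P0DTD1 rule can be sketched as follows. It assumes 1-indexed positions and uses the nsp11 start (4393) given above; the helper name is hypothetical. Positions before nsp11 carry over unchanged because both polyproteins are translated from the same start and only diverge at the ribosomal slippage site.

```python
from typing import Optional

# First position of "Non-structural protein 11" in P0DTC1 (from the thread).
NSP11_START_IN_P0DTC1 = 4393


def p0dtc1_to_p0dtd1(pos: int) -> Optional[int]:
    """Map a 1-indexed P0DTC1 position to P0DTD1, or return None for nsp11."""
    if pos < 1:
        raise ValueError("positions are 1-indexed")
    if pos >= NSP11_START_IN_P0DTC1:
        # nsp11 only exists in P0DTC1, so that annotation stays on P0DTC1.
        return None
    # Shared N-terminal sequence: same position in both polyproteins.
    return pos


print(p0dtc1_to_p0dtd1(4392))  # 4392
print(p0dtc1_to_p0dtd1(4393))  # None
```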
@bmeldal I am assuming above that your positions are 1-indexed: i.e. that the first AA of a protein is at position "1" and not "0". Is that correct?
Morning,
Yes, that is all correct! It's a shame that UniProt doesn't allow searching by PRO chain by default, but @gtauriello's workaround is correct. And yes, chain positions are 1-indexed. We should only have used P0DTD1, except for nsp11.
A nice example is here (thanks @D-Barradas for pointing me to it). I quickly converted it into an annotation by hand (see project link here):
P0DTC2,481,487,#FF0000,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,mutation disrupting strength (p.Asn481_Asn487delinsThrProProAlaLeuAsn)
P0DTC2,493,493,#00FF00,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,mutation decreasing strength (p.Gln493Asn)
P0DTC2,493,493,#00FF00,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,mutation decreasing strength (p.Gln493Tyr)
P0DTC2,501,501,#FF0000,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,mutation disrupting strength (p.Asn501Thr)
Q9BYF1,18,633,#0000FF,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,sufficient to bind (ecd)
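The column layout of these lines (UniProt AC, start, end, hex color, reference URL, free-text description) is inferred here from the example itself, not from a documented specification. Under that assumption, a minimal parser could look like this; the `Annotation` class and `parse_annotations` function are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    uniprot_ac: str
    start: int        # 1-indexed, inclusive
    end: int          # 1-indexed, inclusive
    color: str        # hex RGB such as "#FF0000"
    reference: str    # e.g. an IntAct interaction URL
    description: str


def parse_annotations(text: str) -> List[Annotation]:
    """Parse lines of the assumed form AC,start,end,color,reference,description."""
    annotations = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        if len(row) < 6:
            raise ValueError(f"expected 6 columns, got {len(row)}: {row}")
        ac, start, end, color, reference = row[:5]
        # Tolerate unquoted commas in the free-text description.
        description = ",".join(row[5:])
        ann = Annotation(ac, int(start), int(end), color, reference, description)
        if ann.start > ann.end:
            raise ValueError(f"start after end: {row}")
        annotations.append(ann)
    return annotations


example = """P0DTC2,493,493,#00FF00,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,mutation decreasing strength (p.Gln493Asn)
Q9BYF1,18,633,#0000FF,https://www.ebi.ac.uk/intact/interaction/EBI-25496287,sufficient to bind (ecd)
"""
for a in parse_annotations(example):
    print(a.uniprot_ac, a.start, a.end, a.description)
```

Validating start <= end at parse time catches swapped coordinates before they reach the structure viewer.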
I will make sure that on our side we can nicely display annotations on both subunits of heteromers (currently you can see either ACE2 or spike annotations but not both at the same time).
Having a script that scans IntAct to extract a csv like above automatically (with some clever coloring logic) would be a really useful addition.
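A starting point for such a script might look like the sketch below. It assumes the IntAct export is PSI-MI TAB 2.7, where (1-based) columns 1/2 hold the interactor IDs, column 14 the interaction identifiers and columns 37/38 the features of A/B written as `type:start-end(text)`. The keyword-to-color rules only mirror the manual example above and, like all function names here, are hypothetical. Features with undetermined or multiple ranges are simply skipped.

```python
import csv
import re
from typing import List, Optional

# Assumed 0-based MITAB 2.7 column positions.
COL_ID_A, COL_ID_B = 0, 1
COL_INTERACTION_IDS = 13
COL_FEATURES_A, COL_FEATURES_B = 36, 37

# Hypothetical coloring rules mirroring the manual example above.
COLOR_RULES = [
    ("disrupting", "#FF0000"),
    ("decreasing", "#00FF00"),
    ("sufficient to bind", "#0000FF"),
]
DEFAULT_COLOR = "#808080"

# Assumed feature syntax "type:start-end(text)"; other range forms are skipped.
FEATURE_RE = re.compile(
    r"^(?P<type>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?:\((?P<text>.*)\))?$"
)
INTACT_URL = "https://www.ebi.ac.uk/intact/interaction/"


def pick_color(feature_type: str) -> str:
    for keyword, color in COLOR_RULES:
        if keyword in feature_type:
            return color
    return DEFAULT_COLOR


def accession(field: str, db: str) -> Optional[str]:
    """Return the first 'db:accession' entry of a '|'-separated MITAB field."""
    for entry in field.split("|"):
        prefix, _, acc = entry.partition(":")
        if prefix == db:
            return acc
    return None


def mitab_row_to_annotations(line: str) -> List[List[str]]:
    """Turn one MITAB 2.7 line into annotation rows (AC,start,end,color,url,text)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) <= COL_FEATURES_B:
        return []  # no feature columns, e.g. an older MITAB flavour
    interaction = accession(cols[COL_INTERACTION_IDS], "intact")
    if interaction is None:
        return []
    rows = []
    for id_col, feat_col in ((COL_ID_A, COL_FEATURES_A), (COL_ID_B, COL_FEATURES_B)):
        ac = accession(cols[id_col], "uniprotkb")
        if ac is None or cols[feat_col] == "-":
            continue
        for feature in cols[feat_col].split("|"):
            m = FEATURE_RE.match(feature)
            if m is None:
                continue  # e.g. undetermined ("?-?") or multi-range features
            text = m["type"] + (f" ({m['text']})" if m["text"] else "")
            rows.append([ac, m["start"], m["end"], pick_color(m["type"]),
                         INTACT_URL + interaction, text])
    return rows


def mitab_to_csv(lines, out) -> None:
    """Write annotation rows for every non-header MITAB line to a text stream."""
    writer = csv.writer(out)
    for line in lines:
        if not line.startswith("#"):
            writer.writerows(mitab_row_to_annotations(line))
```

For example, `mitab_to_csv(open(path), sys.stdout)` on a locally downloaded MITAB file. Interactor IDs such as `P0DTD1-PRO_0000449623` are passed through unchanged and would still need the chain-to-polyprotein mapping described earlier in the thread.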
As a starting point, here are some files (thanks @D-Barradas): Archive.zip
It contains:
Still TODO:
So we ended up writing another script to extract PPIs between SARS-CoV-2 and human proteins from IntAct. The script is loosely based on the one above and is attached here: PPI-IntAct.zip
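The attached script is not reproduced here. As an illustration only, the core virus-host filtering step could be sketched like this, assuming PSI-MI TAB input where (1-based) columns 10 and 11 hold the taxids of A and B in the usual `taxid:NNNN(name)` form. 2697049 is the SARS-CoV-2 taxid used in the SWISS-MODEL URL below and 9606 is human; the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# Assumed 0-based MITAB columns: 0/1 = interactor IDs, 9/10 = taxids of A/B.
COL_ID_A, COL_ID_B = 0, 1
COL_TAXID_A, COL_TAXID_B = 9, 10
SARS_COV_2, HUMAN = 2697049, 9606
TAXID_RE = re.compile(r"taxid:(-?\d+)")


def taxids(field: str) -> Set[int]:
    """Collect every taxid in a field like 'taxid:9606(human)|taxid:9606(Homo sapiens)'."""
    return {int(t) for t in TAXID_RE.findall(field)}


def virus_host_pairs(lines: Iterable[str], virus: int = SARS_COV_2,
                     host: int = HUMAN) -> Iterator[Tuple[str, str]]:
    """Yield (id_a, id_b) for MITAB lines linking a virus protein to a host protein."""
    for line in lines:
        if line.startswith("#"):
            continue  # header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= COL_TAXID_B:
            continue  # malformed or truncated line
        a, b = taxids(cols[COL_TAXID_A]), taxids(cols[COL_TAXID_B])
        # Accept either orientation: virus-host or host-virus.
        if (virus in a and host in b) or (host in a and virus in b):
            yield cols[COL_ID_A], cols[COL_ID_B]
```

Virus-virus and host-host pairs are dropped, so only cross-species interactions remain.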
The result of it is a dedicated page on our server listing the structural coverage for all those interaction partners: https://swissmodel.expasy.org/repository/species/2697049/interactions
There's a typo on https://swissmodel.expasy.org/repository/species/2697049
"IntAct lists interactions derived from literature curation or direct user submissions. We extracted those interactions and list the ones between SARS-CoV-2 and human host proteins with their structural coverage in a decicated interaction page.": "decicated" should read "dedicated".
Freudian slip??? I know the data is not yet saturated... ;-)
Great work!
Please remember to cite IntAct in any resulting manuscripts.
Feature suggestion:
On the interactions page: https://swissmodel.expasy.org/repository/species/2697049/interactions
Allow the user to collapse the list for a given protein again without having to open another one. When the list is long (e.g. spike), it becomes difficult to navigate the page.
Oops, good point about the typo. I must have been thirsty when I wrote that... ;-) The list collapses as soon as you choose another one, but we can add the feature. It doesn't hurt...
The two EBI resources IntAct and ComplexPortal contain curated data on experimentally observed interactions between proteins.
From the EBI webpage, you can find links to query IntAct or download the IntAct data in PSI-MI TAB format here: ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psi25/datasets/Coronavirus.zip
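Once the archive has been downloaded locally, its rows can be streamed without unpacking it. This sketch assumes, as stated above, that every file in the archive is tab-separated PSI-MI TAB text with `#` header lines; the function name `iter_mitab_rows` is hypothetical.

```python
import io
import zipfile
from typing import Iterator, List


def iter_mitab_rows(archive_path) -> Iterator[List[str]]:
    """Yield the tab-separated columns of every data line in a zipped dataset.

    Assumes every non-directory member is a PSI-MI TAB text file; header
    lines starting with '#' and blank lines are skipped.
    """
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if not line.strip() or line.startswith("#"):
                        continue
                    yield line.rstrip("\n").split("\t")
```

For example, `for cols in iter_mitab_rows("Coronavirus.zip"): ...` after fetching the file with any FTP client. `zipfile.ZipFile` also accepts an open binary file object instead of a path.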
Notes:
Also: Birgit Meldal from the IntAct / ComplexPortal team is available in the Slack channel for questions, and I will update this comment if we get new input and links that can be of general use.