Open davidroberson opened 7 years ago
Dave,
I don't believe it will work with the current state of HotSpot3D. There may be some issues, since HotSpot3D uses information from UniProt and other databases to help with structure mapping. If the model is not in UniProt for your gene/protein, then you should expect errors in the uppro/calpro step. HotSpot3D looks to the chain information contained in UniProt for DBREF/PDB structures.
However, if your structure file is in the same format as a .pdb file, then there may be a way that we can work with non-RCSB/non-UniProt listed structures.
-Adam
On 11/16/16 2:46 PM, Dave Roberson wrote:
Is it possible to read in a homology modeled structure that is not in RCSB pdb using the
-pdb-file-dir
argument?
Thanks
Hi @AdamDS
The model is in pdb format and was homology modeled off of 2ZPA in Swiss-Model.
http://www.rcsb.org/pdb/explore.do?structureId=2ZPA
Thanks for your help!
@sabrodie
@sabrodie,
I think that there is a way to get this to work then. You'll need to be sure of a couple of details:
1) Name your model file 2ZPA.pdb and store it in the local pdb-dir that HotSpot3D will use.
2) Make sure that the protein chains are the same - that your homologous protein is labeled for the same chains as the original protein given in 2ZPA.
There may be some other necessary details, but I think that these two are the most critical.
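A rough way to check point 2 before running anything is to compare the chain IDs in the two files. The snippet below is only a sketch under assumptions: it reads the chain ID from column 22 of ATOM/HETATM records (per the PDB file format), and the function names and the `install_model` helper are made up for illustration. They are not part of HotSpot3D.

```python
import shutil
from pathlib import Path


def chain_ids(pdb_text):
    """Return the set of chain IDs found in ATOM/HETATM records."""
    chains = set()
    for line in pdb_text.splitlines():
        # In the PDB format the chain identifier is column 22 (index 21).
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains.add(line[21])
    return chains


def missing_chains(model_text, reference_text):
    """Chains present in the reference structure but absent from the model."""
    return chain_ids(reference_text) - chain_ids(model_text)


def install_model(model_path, pdb_dir, pdb_id="2ZPA"):
    """Copy the model into the local pdb dir under the reference PDB ID.

    Hypothetical helper: the <pdb_id>.pdb naming follows the advice above.
    """
    target = Path(pdb_dir) / f"{pdb_id}.pdb"
    shutil.copyfile(model_path, target)
    return target
```

If `missing_chains` returns an empty set for your model against the original 2ZPA file, the chain labels at least line up. It says nothing about residue numbering, which may also need to match.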
-Adam
I just noticed that your protein is non-human. In the transcript annotation step there will be errors, because HotSpot3D expects transcripts from Ensembl. There is a line that will not know how to deal with EnsemblBacteria transcripts. From what I can tell, the necessary files and lookups should largely be identical, so it could be possible to make some small tweaks to allow non-human proteins to be used. However, I am less familiar with processing them, so I cannot be sure how many other changes would be needed.
Thank you @AdamDS. I will talk to @sabrodie who is the functional scientist leading this project and get back to you. He did have one more question which I will paraphrase here:
...is it possible that the variants in our gene of interest are not in solved (crystallized) regions of the protein?
See http://www.uniprot.org/uniprot/O43683
3D structure databases
Entry  Method  Resolution (Å)  Chain    Positions
2LAH   NMR     -               A        1-150
4A1G   X-ray   2.60            A/B/C/D  1-150
4QPM   X-ray   2.20            A/B      740-1085
4R8Q   X-ray   2.31            A        724-1085
5DMZ   X-ray   2.40            A/B      726-1085
It looks like the mutations fall around amino acid ~500. Does that mean they are not represented in the crystal structures in the RCSB database?
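The coverage question can be checked mechanically against the position ranges listed for O43683 above. This is only a sketch: the ranges are copied from the UniProt listing in this comment, the function name is made up for illustration, and a residue inside a listed range can still be unresolved in the actual structure file.

```python
# Position ranges per PDB entry, copied from the UniProt listing above.
STRUCTURE_RANGES = {
    "2LAH": (1, 150),
    "4A1G": (1, 150),
    "4QPM": (740, 1085),
    "4R8Q": (724, 1085),
    "5DMZ": (726, 1085),
}


def structures_covering(position, ranges=STRUCTURE_RANGES):
    """Return the PDB entries whose listed range includes the given residue."""
    return [
        pdb_id
        for pdb_id, (start, end) in ranges.items()
        if start <= position <= end
    ]


print(structures_covering(500))  # prints []
print(structures_covering(730))  # prints ['4R8Q', '5DMZ']
```

With these ranges, nothing between residues 151 and 723 is covered, so a mutation near position 500 has no listed structure to map onto.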
Finally, is there an ideal number of genes to have present in the MAF file? We have many whole exomes' worth of data, but are just focusing on a few genes. Should we change our approach?
@dave, @adamds That was in reference to another protein in the same project, which is a very different problem.
Seth Brodie PhD Senior Scientist Functional Group Cancer Genomics Research Laboratory (CGR) Division of Cancer Epidemiology and Genetics, NCI Leidos Biomedical Research, Inc. 8717 Grovemont Circle ATC Room 225B(office) Room 109(lab) Gaithersburg, MD 20877
Some variants do end up in unsolved regions of the models, and HotSpot3D cannot do anything with these at this time. If you know that you will only need to look at a handful of genes, I strongly recommend using a subset of your original .maf that contains only mutations from your genes of interest. This will drastically reduce run time and storage use. For perspective, preprocessing the ~5k human protein pdb structures takes ~1 week to run on an LSF server, and the data will take up ~2 TB of space. We are in the process of optimizing HotSpot3D preprocessing to improve both run time and storage use, but these updates are not yet in place. For the analysis steps, with ~1M mutations in several thousand genes, run times can reach ~1 day (without the sigclus step), so even there it will be useful to reduce the .maf to the genes of interest.
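One possible way to make that subset is sketched below. It assumes a tab-separated MAF with a `Hugo_Symbol` column and optional `#` comment lines at the top; the function name and the gene names in the demo are placeholders, and this is not a HotSpot3D utility.

```python
import io


def subset_maf(src, dst, genes, gene_column="Hugo_Symbol"):
    """Copy comment lines, the header, and rows for the wanted genes.

    src is any iterable of lines, dst is any object with write().
    Returns the number of mutation rows kept.
    """
    genes = set(genes)
    gene_idx = None  # column index, known once the header row is seen
    kept = 0
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            dst.write(line)  # keep comment lines such as "#version 2.4"
            continue
        fields = line.rstrip("\n").split("\t")
        if gene_idx is None:
            gene_idx = fields.index(gene_column)  # first non-comment line is the header
            dst.write(line)
            continue
        if len(fields) > gene_idx and fields[gene_idx] in genes:
            dst.write(line)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demo with placeholder gene names.
    demo = "#version 2.4\nHugo_Symbol\tChromosome\nGENE_A\t1\nGENE_B\t2\nGENE_A\t3\n"
    out = io.StringIO()
    print(subset_maf(io.StringIO(demo), out, {"GENE_A"}))  # prints 2
```

For real files, pass open file handles instead of the `io.StringIO` objects, for example `open("all.maf")` as `src` and `open("subset.maf", "w")` as `dst` (both file names are hypothetical).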
@sabrodie With the latest updates, we can now provide a way to support alternative Ensembl releases and reference genomes. I think that there are a couple of other things that could be done in the Trans.pm & Uppro.pm modules to support bacteria and other species. If you are still interested, perhaps we can work out a solution to help support other species data.
Is it possible to read in a homology modeled structure that is not in RCSB pdb using the
-pdb-file-dir
argument?
Thanks