Chensanyu opened this issue 5 years ago.
The first number indicates which biological assembly was used for the protein structure. A PDB entry may have multiple biological assemblies, with the subunits of a complex arranged in different orientations. A/B (and potentially other letters) identify the protein chain. Different chains may be products of the same gene or of different genes.
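To make the field layout concrete, here is a minimal sketch that splits the hotspot line quoted in this thread ("1e96 HNSC 0:A:116;...") into its parts. It assumes the columns are whitespace-separated (PDB ID, tumor type, region) and that the third field of each `assembly:chain:residue` entry is the residue number; the names `HotspotResidue` and `parse_hotspot_region` are hypothetical helpers, not part of HotMAPS.

```python
from typing import NamedTuple


class HotspotResidue(NamedTuple):
    assembly: int  # biological assembly index (first field)
    chain: str     # PDB chain identifier, e.g. "A" or "B"
    residue: str   # assumed residue number; kept as a string because PDB
                   # residue numbers can carry insertion codes such as "52A"


def parse_hotspot_region(region: str) -> list[HotspotResidue]:
    """Split a semicolon-separated list of 'assembly:chain:residue' entries."""
    residues = []
    for item in region.strip().split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        assembly, chain, residue = item.split(":")
        residues.append(HotspotResidue(int(assembly), chain, residue))
    return residues


# Assumption: columns are separated by whitespace (spaces or tabs).
line = "1e96 HNSC 0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15"
pdb_id, tumor_type, region = line.split()
for r in parse_hotspot_region(region):
    print(pdb_id, tumor_type, r.assembly, r.chain, r.residue)
```

In this example every entry uses assembly 0, while the residues are spread over chains A and B.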
There are two options. If you just want to check whether your mutations overlap with hotspots previously computed from TCGA, you can upload your mutations to MuPIT (https://mupit.icm.jhu.edu/MuPIT_Interactive/) or to CRAVAT (https://www.cravat.us/CRAVAT/), which will give you links to MuPIT for viewing the protein structure. Alternatively, you could look for clusters in your own set of mutations. This requires following the "exome-scale" pipeline of HotMAPS, https://github.com/KarchinLab/HotMAPS/wiki/Tutorial-(Exome-scale).
Thank you very much; your answer was very helpful. However, I am confused about the difference between the two result files 'hotspot_regionsgene.01.txt' and 'hotspot_regionsstructure.01.txt'.
The answer depends on what you are looking for. I created the "gene" file because a particular protein may have many structures, and the clustering is not always the same across them. Essentially, the "gene" file merges the clustering results from all structures into one consensus. If you are only interested in which mutations group together, and not in the underlying protein structure, then the gene file is probably the best fit.
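The idea of combining per-structure clusters into one consensus can be illustrated with a small sketch. The merge rule here is a hypothetical one chosen for illustration (clusters are joined whenever they share a residue position, using a union-find structure); it is not necessarily how HotMAPS builds the gene file. It also assumes residue positions from every structure have already been mapped to the same protein sequence coordinates. `merge_clusters` is a hypothetical helper name.

```python
def merge_clusters(clusters_per_structure):
    """Merge clusters from several structures into consensus groups.

    Illustrative rule (an assumption): any two clusters that share at least
    one residue position end up in the same consensus group, transitively.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for clusters in clusters_per_structure:
        for cluster in clusters:
            members = list(cluster)
            for pos in members:
                parent.setdefault(pos, pos)
            # Link every residue of this cluster to its first residue.
            for pos in members[1:]:
                union(members[0], pos)

    groups = {}
    for pos in parent:
        groups.setdefault(find(pos), set()).add(pos)
    # Sorted output makes the result easy to compare and read.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


structure_1 = [{10, 11, 40}, {90}]
structure_2 = [{11, 35}, {90, 120}]
print(merge_clusters([structure_1, structure_2]))  # [[10, 11, 35, 40], [90, 120]]
```

Residue 11 links the first cluster of each structure, and residue 90 links the second, so the two structures collapse into two consensus groups.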
A hotspot region in my results reads "1e96 HNSC 0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15". What does 0 mean? What do A and B stand for?
How can I tell which gene each hotspot corresponds to? Can I upload mutations in different genes from one cancer type together in a single MAF file?
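Because each chain in a structure may come from a different gene, one way to attach genes to a structure-level hotspot is a lookup from (PDB ID, chain) to gene symbol. In the sketch below, `chain_to_gene`, the placeholder symbols `GENE_A`/`GENE_B`, and `residues_by_gene` are all hypothetical; real chain-to-gene assignments would have to come from a structure annotation source. The sketch also assumes whitespace-separated columns (PDB ID, tumor type, region).

```python
from collections import defaultdict

# Hypothetical mapping from (PDB ID, chain) to gene symbol. These values are
# placeholders for illustration, not real annotations of 1e96.
chain_to_gene = {
    ("1e96", "A"): "GENE_A",
    ("1e96", "B"): "GENE_B",
}


def residues_by_gene(line, chain_to_gene):
    """Group the residues of one hotspot line by the gene of their chain.

    Chains missing from the mapping are reported under "unknown".
    """
    pdb_id, tumor_type, region = line.split()
    grouped = defaultdict(list)
    for item in region.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        _assembly, chain, residue = item.split(":")
        gene = chain_to_gene.get((pdb_id, chain), "unknown")
        grouped[gene].append(residue)
    return tumor_type, dict(grouped)


line = "1e96 HNSC 0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15"
print(residues_by_gene(line, chain_to_gene))
```

If chains A and B really come from different genes, a single hotspot region like this one spans both of them, which is why the chain has to be taken into account when assigning genes.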