KarchinLab / HotMAPS_2016

Detects hotspot regions for somatic mutations in 3D protein structures
Apache License 2.0

a/b means #4

Open Chensanyu opened 5 years ago

Chensanyu commented 5 years ago
  1. A hotspot region in the result looks like this: 1e96 HNSC 0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15 What does 0 mean? What do A and B stand for?

  2. How can I find the correspondence between genes and hotspots? Can I upload mutations for different genes related to one cancer type in a single MAF file?

ctokheim commented 5 years ago
  1. The first number indicates which biological assembly was used for the protein structure. A PDB entry may have multiple biological assemblies, with the subunits of a complex in different orientations. A/B (and potentially other letters) identify the protein chain. Protein chains may originate either from the same gene's protein product or from different genes.

  2. There are two options. If you just want to check whether your mutations overlap with hotspots previously computed from TCGA, you can upload your mutations to MuPIT (https://mupit.icm.jhu.edu/MuPIT_Interactive/) or CRAVAT (https://www.cravat.us/CRAVAT/), which will provide links to MuPIT so you can view the protein structure. Alternatively, you could cluster mutations based on your own set of mutations. This requires following the "exome-scale" pipeline of HotMAPS: https://github.com/KarchinLab/HotMAPS/wiki/Tutorial-(Exome-scale).
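The residue identifiers described in the first answer can be split apart with a few lines of Python. This is only a minimal sketch: it assumes each entry has the form assembly:chain:position, separated by semicolons, as in the example string from the question, and that positions are plain integers. The function names parse_hotspot_residues and group_by_chain are hypothetical and are not part of HotMAPS.

```python
from collections import defaultdict


def parse_hotspot_residues(field):
    """Split 'assembly:chain:position;...' into (assembly, chain, position) tuples.

    Assumption: every entry has exactly three colon-separated parts and the
    position is an integer (no insertion codes).
    """
    residues = []
    for item in field.strip().split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        assembly, chain, position = item.split(":")
        residues.append((assembly, chain, int(position)))
    return residues


def group_by_chain(residues):
    """Collect sorted residue positions per (assembly, chain) pair."""
    by_chain = defaultdict(list)
    for assembly, chain, position in residues:
        by_chain[(assembly, chain)].append(position)
    return {key: sorted(positions) for key, positions in by_chain.items()}


# Example string taken from the question above.
example = "0:A:116;0:A:159;0:A:18;0:B:102;0:A:29;0:A:15"
residues = parse_hotspot_residues(example)
chains = group_by_chain(residues)
print(chains)
```

For the example string, this groups five residues under assembly 0, chain A and one residue under assembly 0, chain B.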

Chensanyu commented 5 years ago

Thank you very much, your answer is very helpful. However, I am confused about the difference between the two result files called 'hotspot_regionsgene.01.txt' and 'hotspot_regionsstructure.01.txt'.

  1. I guess the first file lists each gene with its mutations, and the second maps genes to structures. Is this guess correct? Are the mutations in 'hotspot_regionsgene.01.txt' in one cluster or in multiple clusters? Are there any other connections or differences between the two files? Is 'hotspot_regionsgene.01.txt' one of the final generated files?
  2. If I want to know how many clusters a gene contains and which mutations are in each cluster, should I focus on 'hotspot_regionsgene.01.txt' or 'hotspot_regionsstructure.01.txt'? Sorry to bother you; I look forward to your reply.

ctokheim commented 5 years ago

The answer depends on what you are looking for. I created the "gene" file because there may be many structures for a particular protein, and the clustering is not always the same across them. Basically, the "gene" file merges the clustering results from all structures into one consensus. If you are only interested in which mutations group together, and not in the underlying protein structure, then the gene file is probably the best fit.
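
One way to picture merging per-structure clusters into a gene-level consensus is to treat any clusters that share a residue position as the same group. This is an illustration only: the actual merging rule used by HotMAPS is not described in this thread, and the function name merge_clusters and the input data below are hypothetical.

```python
def merge_clusters(clusters_per_structure):
    """Merge clusters (sets of residue positions) that share at least one position.

    Illustrative only; not the HotMAPS implementation.
    """
    merged = []
    for clusters in clusters_per_structure:
        for cluster in clusters:
            current = set(cluster)
            remaining = []
            for group in merged:
                if group & current:
                    current |= group  # absorb every overlapping group
                else:
                    remaining.append(group)
            remaining.append(current)
            merged = remaining
    return sorted(sorted(group) for group in merged)


# Made-up clusters from two hypothetical structures of the same protein.
structure_1 = [{15, 18, 29}, {116, 159}]
structure_2 = [{18, 29, 31}, {102}]
consensus = merge_clusters([structure_1, structure_2])
print(consensus)
```

In this toy input, the clusters containing positions 18 and 29 overlap across the two structures, so they end up in a single merged group, while the other clusters stay separate.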