deepomicslab / SpecHLA

SpecHLA reconstructs entire diploid sequences of HLA genes and infers LOH events. It supports HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes. Also, it supports both short- and long-read data.
MIT License
28 stars 9 forks source link

Identify somatic HLA mutation from SpecHLA #23

Open LongpanUPC opened 5 months ago

LongpanUPC commented 5 months ago

Hi, Thank you for developing this tool which is quite easy to use! I am wondering if SpecHLA could also be used to detect somatic HLA mutations since we have paired tumor and normal WES or RNAseq data. Also I found some HLA_XXX.vcf.gz/HLA_XXX.rephase.vcf.gz/HLA_XXX.specHap.phased.vcf in the output. I don't know if SpecHLA already did this things. And maybe I just need to compare the results from tumor with that from normal to get the distinct varaints? Could you give some comments about this and guide us how to identify the somatic (or germline if possible) HLA mutations using SpecHLA if it works? thank you very much! Long

wshuai294 commented 5 months ago

Hello Long,

Thanks for using this method. SpecHLA does not support detecting somatic HLA mutations directly. However, as you said, the somatic mutations can be obtained by comparing the variants between tumor and normal samples. HLA_XXX.rephase.vcf.gz records the phased variants (i.e., haplotypes). To achieve this goal, we can first align the haplotypes between tumor and normal samples by counting the shared genotypes at all variant loci. Second, just identify the difference between the aligned haplotypes from tumor and normal samples. The loci where the difference occurs could be the somatic variants.

I hope it can help. Please let me know if there is any other issue.

best, shuai

LongpanUPC commented 5 months ago

Thank you, shuai! I tried one case. However, it seems that it doesn't work well. In the normal tissue, I find only 10 variants while there are 35 variants from paired tumor. I found the average depth in the normal is lower than that in the tumor (less than 15 in N and average >30 in tumor). So the coverage greatly affected the findings. By the way, do you have a plan to include HLA mutation detection as an additional pipeline in SpecHLA? best, Long

wshuai294 commented 5 months ago

Hi Long,

To examine the reliability of the inferred somatic mutations, it is suggested to observe the Bam file $outdir/*realign.bam using IGV or samtools tview. However, by default, SpecHLA will delete the bam files. So, to maintain the bam file, we need to delete the statement bash $dir/../clear_output.sh $outdir/ in the 353-th line of SpecHLA.sh. The reference file is $db/ref/hla.ref.extend.fa.

I believe it would be useful to include an HLA mutation detection pipeline. I plan to add a script to identify the sample-specific variants given two samples. I think this may be useful. Do you have any suggestions?

best, shuai

LongpanUPC commented 5 months ago

Hi shuai, fully agree. Then only using SpecHLA, one tools to get all the HLA infomation (genotype, mutations and LOH) is wonderfully, especially for cancer genomics. Taking into account the effect of coverage, it will be better to also report the total depth both in two samples (i.e, tumor and normal) to allow us to know if this variant is true mutation rather than a spurious mutation due to lower coverage in one of the samples. Besides, may be also include some qulity score or features if possible. By the way, I would like to try once you develope this script. thank you! Long

LongpanUPC commented 5 months ago

Hi Shuai, I kindly ask the progress of script to detect the somatic mutations between two samples. Hope it will not bother you. Before your code is released, I also did some try according to your suggestion. I directly used mutect2 to detect somatic HLA mutations. The code was as followed: gatk Mutect2 \ -R $db/ref/hla.ref.extend.fa \ -I /pathtoresults/XXXtumor.realign.sort.bam \ -I /pathtoresults/XXXnormal.realign.sort.bam \ --normal-sample XXXnormal \ -O /pathtoresults/HLA.somatic.vcf.gz And it works. We found some mutations. However, now I don't know how to annotate these variants since they use custome reference genome. My aims to map these variants to hg38 ref genome and get standard protein_change. something like that. Do you have some suggestions? or maybe waiting for your next release?

thank you! Long

wshuai294 commented 4 months ago

Hello,

Sorry for not getting back to you sooner because of the Lunar New Year holiday. Your script seems very good for somatic mutation detection. As for converting the locus position, we can use the method CrossMap. CrossMap reply on the chain file, which can be obtained using the method chainNet and axtChain based on the alignment of $db/ref/hla.ref.extend.fa and hg38.

An alternative solution is to extract the surrounding sequence (e.g. 50 bp) of each somatic mutation and map the sequence to the hg38 using Minimap2.

Please let me know if it works.

best, shuai