dpuiu / MitoHPC

MIT License
10 stars 12 forks source link

Is it necessary to extract ChrM first #2

Open Captain-Pam opened 1 year ago

Captain-Pam commented 1 year ago

Hi, Thank you for your tools. I am trying to make it work. I have encountered some problems. 1) There are many reference files in RefSeq . I installed HP_RNAME=hg38 for the first time later, and HP_RNAME=hg19 later. So do the two reference genome versions HP_RNAME=hg19 and HP_RNAME=hg38 differ only by a few files NUMT and chrM in the RefSeq? 2) My bam file is quite large, do I need to extract the telomeric bam file before running the program? It seems impossible, because the calculation of mtDNA-CN requires bam files for all autosomes.

dpuiu commented 1 year ago

Hello Captain-Pam and sorry about the delay in getting back to you.

  1. If you are planning to use hg19 instead of hg38, just copy over and edit the "init.sh" file. Comment the ~6 lines and #hs38DH(default) and uncomment the ones under #hg19

  2. Do you mean telomeric or mitochondrial? If you want to work with smaller bam files, you can prefilter the large ones, however for mtDNA-CN computation your have to use the idxstats corresponding to the original alignments ( which contain the read counts for all the chromosomes)

$ mkdir new_dir $ samtools view file.bam chrM chr1:564465-569708 chr17:22020692-22020827 -b ... > new_dir/file.bam # (hg19) $ samtools index new_dir/file.bam $ samtools idxstats file.bam > new_dir/file.idxstats

Captain-Pam commented 1 year ago

Hey, Thank your for your reply. I have got it working. I also compared software 'fastMitoCalc' to calculate mtDNA-CN, which was used in your paper. But I found the mtDNA-CN using MitoHPC was more copy numbers than that using fastMitoCalc. The biggest gap can reach 2600 and I checked the 'idxstats' file and recalculated the mtDNA-CN using the formula in the paper and the mtDNA-CN was the same as 'M' in the count.tab (M column). 1) Is this mtDNA-CN gap between 'MitoHPC' and 'fastMitoCalc' caused by the fact that fastMitoCalc is a random sampling of a certain number of regions to calculate mtDNA-CN or just using chromosome 1-22? Here are my two results from MitoHPC (mtDNA-CN: 3366) and fastMitoCalc (mtDNA-CN: 693.719) MitoHPC : chrM 16571 13458639 44004 chr1 249250621 118214482 314343 chr2 243199373 117972464 427951 chr3 198022430 85924989 269176 chr4 191154276 98803920 267515 chr5 180915260 77116047 222679 chr6 171115067 74220704 212901 chr7 159138663 76211898 219534 chr8 146364022 67936630 217216 chr9 141213431 55903272 159052 chr10 135534747 100631038 233961 chr11 135006516 60413722 174248 chr12 133851895 59082673 167006 chr13 115169878 39987143 116357 chr14 107349540 41120393 122057 chr15 102531392 36480936 102307 chr16 90354753 47841347 117311 chr17 81195210 41463621 106312 chr18 78077248 37650165 103872 chr19 59128983 34102600 82349 chr20 63025520 26468498 75683 chr21 48129895 18105490 54604 chr22 51304566 15652517 41953 chrX 155270560 71478267 188991 chrY 59373566 8816552 34535 chr1_gl000191_random 106433 35277 82 chr1_gl000192_random 547496 229270 679 chr4_ctg9_hap1 590426 119302 399 chr4_gl000193_random 189789 174745 574 chr4_gl000194_random 191469 139145 415 chr6_apd_hap1 4622290 128934 385 chr6_cox_hap2 4795371 384416 1016 chr6_dbb_hap3 4610396 285520 793 chr6_mann_hap4 4683263 257711 812 chr6_mcf_hap5 4833398 286283 745 chr6_qbl_hap6 4611984 263381 805 chr6_ssto_hap7 4928567 294652 814 chr7_gl000195_random 182896 266561 805 chr8_gl000196_random 38914 13476 33 chr8_gl000197_random 37175 5782 23 chr9_gl000198_random 90085 212405 700 chr9_gl000199_random 169874 10107120 15130 chr9_gl000200_random 187035 30469 91 chr9_gl000201_random 36148 3949 13 chr11_gl000202_random 40103 9874 26 chr17_ctg5_hap1 1680828 242877 775 chr17_gl000203_random 37498 19638 44 chr17_gl000204_random 81310 19130 85 chr17_gl000205_random 174588 231417 653 chr17_gl000206_random 41001 4928 16 chr18_gl000207_random 4262 3083 49 chr19_gl000208_random 92689 426440 1108 chr19_gl000209_random 159169 35332 87 chr21_gl000210_random 27682 3730 14 chrUn_gl000211 166566 93986 270 chrUn_gl000212 186858 196808 957 chrUn_gl000213 164239 39830 95 chrUn_gl000214 137718 592308 1312 chrUn_gl000215 172545 26726 93 chrUn_gl000216 172294 2987761 6992 chrUn_gl000217 172149 122195 384 chrUn_gl000218 161147 239626 549 chrUn_gl000219 179198 185568 1029 chrUn_gl000220 161802 5933799 13490 chrUn_gl000221 155397 186931 492 chrUn_gl000222 186861 105965 349 chrUn_gl000223 180455 49609 93 chrUn_gl000224 179693 686448 1711 chrUn_gl000225 211173 1168363 4173 chrUn_gl000226 15008 5528413 7990 chrUn_gl000227 128374 33551 101 chrUn_gl000228 129120 384732 754 chrUn_gl000229 19913 47762 148 chrUn_gl000230 43691 31270 115 chrUn_gl000231 27386 51359 149 chrUn_gl000232 40652 97229 295 chrUn_gl000233 45941 37658 88 chrUn_gl000234 40531 85785 243 chrUn_gl000235 34474 52575 159 chrUn_gl000236 41934 15104 47 chrUn_gl000237 45867 99997 231 chrUn_gl000238 39939 8884 24 chrUn_gl000239 33824 58609 128 chrUn_gl000240 41933 26847 75 chrUn_gl000241 42152 97210 294 chrUn_gl000242 43523 9432 25 chrUn_gl000243 43341 41654 106 chrUn_gl000244 39929 18093 65 chrUn_gl000245 36651 41757 123 chrUn_gl000246 38154 12727 32 chrUn_gl000247 36422 23342 42 chrUn_gl000248 39786 8494 21 chrUn_gl000249 38502 7260 14

fastMitoCalc: mt_copy_number_avg: 693.719 mt_coverage: 7916.24 autosomal_coverage: 22.8226 actual basepairs covered by reads: 3000000 chrom_used_for_autosomal_coverage: 1-22

2) I found that the chrM in the 'idxstats' file is not 16569 but 16571. I set it to HP_MT=chrM, HP_MTLEN=16569 in 'init.sh'. I don't know why it became 16571 here. On the other hand, HP_MT=chrM and HP_MT= rCRS (or RSRS) are different reference sequences, the article says rCRS is used, but the code is HP_MT=chrM.

export HP_MT=chrM # chrM, rCRS or RSRS, FASTA file available under $HP_RDIR

3) chrM reference sequence (rCRS or RSRS) were from a European individual. Can other populations (e.g., Asian populations) use the chrM reference sequence above (rCRS or RSRS) in haplogroup and mitochondrial variants analysis?

I look forward to hearing from you!