Closed alisamatisse closed 5 months ago
Hello, thanks for considering our method. We have provided the conda environment file which can be used to construct the env. We assume it is suitable for Linux systems. Is it not applicable to Guix? What error did you meet in running it?
It is not compatible with the how our cluster paths are organized, but I guess there is not much that can be done from your side. Could it be executed independently from the conda environment? Really would like to try using your tool. Maybe will run it locally..
Hello. I think it could be possible to use it without the conda environment. However, you should install the dependencies yourself. What functions do you want to use? If you only want to handle long-read data, there will be no need to install it. You can directly use the Python script. If you handle Illumina reads, it will be more complex. But I think it is also possible.
Hello Shuai, that it is very promising, thank you. I am now trying to make it work...
Terribly sorry for all the stupid questions, I tried to run the demo files
python3 script/long_read_typing.py -j 5 -n HG00733 -r HG00733_pacbio.subsample.fastq.gz -o output
but have this
[M::mm_idx_gen::0.568*0.84] collected minimizers
[M::mm_idx_gen::0.710*1.06] sorted minimizers
[M::main::0.713*1.06] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.714*1.06] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.715*1.06] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::462.895*2.00] mapped 851 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db/ref/hla_gen.format.filter.extend.DRB.no26789.fasta HG00733_pacbio.subsample.fastq.gz
[M::main] Real time: 462.901 sec; CPU: 923.485 sec; Peak RSS: 0.938 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.015*0.80] collected minimizers
[M::mm_idx_gen::0.017*0.88] sorted minimizers
[M::main::0.017*0.88] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.018*0.88] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.018*0.88] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::worker_pipeline::0.034*1.15] mapped 6 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa output/HG00733//A.long_read.fq.gz
[M::main] Real time: 0.036 sec; CPU: 0.040 sec; Peak RSS: 0.011 GB
File "/fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/mask_low_depth_region.py", line 113
print (gene, mask[0]-1, mask[1]-1, file = f)
^
SyntaxError: invalid syntax
cp: cannot stat ‘output/HG00733//low_depth.bed’: No such file or directory
Sorry again. I will understand if you decide not to reply, haha.
Hello, cause it uses python
instead of python3
in the 215-th line. I have edited this. Please update to the latest commit. I believe it will work now.
Hi, It seems to work well except for the annotation step. Is the vcf files empty? Is there a sequence in hla.allele..HLA_.fasta? If both not, I guess the reason is the database. Have you run the bash index.sh? You can comment the software installation in index.sh, and run it to construct the database only. Or you can run it on a local PC, and upload the database to the server. Please tell me if you have further issue.
Best, Shuai
On Fri, Jun 14, 2024 at 9:24 PM Alisa @.***> wrote:
Hi Shuai,
I keep trying to make the tool work! Sorry for the dumb questions again. I moved now to Ubuntu virtual machine where I can install all the dependencies myself. I thought maybe I am missing something.. but not so obvious what... I have empty output files in the end. The code output:
`(base) @.**:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n D NA01 -o /home/aiakupo/SpecHLA/output [M::mm_idx_gen::0.466 1.00] collected minimizers [M::mm_idx_gen::0.5161.38] sorted minimizers [M::main::0.516 1.38] loaded/built the index for 1591 target sequence(s) [M::mm_mapopt_update::0.5191.38] mid_occ = 936 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591 [M::mm_idx_stat::0.520 1.38] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978 [M::worker_pipeline::2764.5174.99] mapped 87302 sequences [M::worker_pipeline::5416.019 5.00] mapped 90496 sequences [M::worker_pipeline::6441.3304.98] mapped 34247 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fa stx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz [M::main] Real time: 6441.414 sec; CPU: 32064.015 sec; Peak RSS: 4.391 GB alignment done. reads-binning done. [M::mm_idx_gen::0.002 1.63] collected minimizers [M::mm_idx_gen::0.0032.80] sorted minimizers [M::main::0.003 2.79] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.70] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.66] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for A Mean depth {'HLA_A': 0}
2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.78] collected minimizers [M::mm_idx_gen::0.0033.24] sorted minimizers [M::main::0.003 3.22] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0033.11] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*3.05] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.011 sec; Peak RSS: 0.003 GB downsample ratio is 1 for B Mean depth {'HLA_B': 0}
2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.79] collected minimizers [M::mm_idx_gen::0.0033.12] sorted minimizers [M::main::0.003 3.10] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0033.00] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.94] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for C Mean depth {'HLA_C': 0}
2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.62] collected minimizers [M::mm_idx_gen::0.0032.60] sorted minimizers [M::main::0.003 2.58] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.49] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 2.44] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775 [M::worker_pipeline::0.2672.61] mapped 231 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz [M::main] Real time: 0.268 sec; CPU: 0.699 sec; Peak RSS: 0.071 GB downsample ratio is 1 for DPA1 Mean depth {'HLA_DPA1': 0}
2024-06-14 13:19:49 Min read coverage set to 2. 2024-06-14 13:19:49 Max read coverage set to 100000. 2024-06-14 13:19:49 Estimating alignment parameters... 2024-06-14 13:19:49 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-06-14 13:19:49 Calling potential SNVs using pileup... 2024-06-14 13:19:49 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.63] collected minimizers [M::mm_idx_gen::0.0032.96] sorted minimizers [M::main::0.003 2.94] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.83] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004 2.76] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468 [M::worker_pipeline::7.1402.84] mapped 5835 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz [M::main] Real time: 7.141 sec; CPU: 20.293 sec; Peak RSS: 0.147 GB downsample ratio is 1 for DPB1 Mean depth {'HLA_DPB1': 182}
2024-06-14 13:20:00 Min read coverage set to 2. 2024-06-14 13:20:00 Max read coverage set to 100000. 2024-06-14 13:20:00 Estimating alignment parameters... 2024-06-14 13:20:00 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.990 match -> insertion: 0.006 match -> deletion: 0.004 deletion -> match: 0.672 deletion -> deletion: 0.328 insertion -> match: 0.763 insertion -> insertion: 0.237 Emission Probabilities: match (equal): 0.925 match (not equal): 0.025 insertion: 1.000 deletion: 1.000
2024-06-14 13:20:00 Calling potential SNVs using pileup... 2024-06-14 13:20:00 123 potential variants identified. 2024-06-14 13:20:00 Generating haplotype fragments from reads... 2024-06-14 13:20:05 10% of variants processed... 2024-06-14 13:20:05 20% of variants processed... 2024-06-14 13:20:05 30% of variants processed... 2024-06-14 13:20:06 40% of variants processed... 2024-06-14 13:20:06 100% of variants processed. 2024-06-14 13:20:06 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:06 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:06 Round 1 of haplotype assembly... 2024-06-14 13:20:06 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57 2024-06-14 13:20:08 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24 2024-06-14 13:20:08 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51 2024-06-14 13:20:08 Round 2 of haplotype assembly... 2024-06-14 13:20:08 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-06-14 13:20:09 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35 2024-06-14 13:20:09 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-06-14 13:20:09 Printing VCF file... [M::mm_idx_gen::0.002 1.83] collected minimizers [M::mm_idx_gen::0.0033.03] sorted minimizers [M::main::0.003 3.02] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.88] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 2.82] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492 [M::worker_pipeline::6.3873.58] mapped 3343 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz [M::main] Real time: 6.387 sec; CPU: 22.863 sec; Peak RSS: 0.153 GB downsample ratio is 1 for DQA1 Mean depth {'HLA_DQA1': 88}
2024-06-14 13:20:18 Min read coverage set to 2. 2024-06-14 13:20:18 Max read coverage set to 100000. 2024-06-14 13:20:18 Estimating alignment parameters... 2024-06-14 13:20:18 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.991 match -> insertion: 0.005 match -> deletion: 0.004 deletion -> match: 0.514 deletion -> deletion: 0.486 insertion -> match: 0.556 insertion -> insertion: 0.444 Emission Probabilities: match (equal): 0.975 match (not equal): 0.008 insertion: 1.000 deletion: 1.000
2024-06-14 13:20:18 Calling potential SNVs using pileup... 2024-06-14 13:20:18 183 potential variants identified. 2024-06-14 13:20:18 Generating haplotype fragments from reads... 2024-06-14 13:20:18 10% of variants processed... 2024-06-14 13:20:18 20% of variants processed... 2024-06-14 13:20:18 30% of variants processed... 2024-06-14 13:20:18 40% of variants processed... 2024-06-14 13:20:18 50% of variants processed... 2024-06-14 13:20:18 60% of variants processed... 2024-06-14 13:20:18 70% of variants processed... 2024-06-14 13:20:18 80% of variants processed... 2024-06-14 13:20:18 90% of variants processed... 2024-06-14 13:20:18 100% of variants processed. 2024-06-14 13:20:18 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:18 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:18 Round 1 of haplotype assembly... 2024-06-14 13:20:18 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53 2024-06-14 13:20:19 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91 2024-06-14 13:20:19 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78 2024-06-14 13:20:19 Round 2 of haplotype assembly... 2024-06-14 13:20:19 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 Printing VCF file... [M::mm_idx_gen::0.002 1.73] collected minimizers [M::mm_idx_gen::0.0032.89] sorted minimizers [M::main::0.003 2.87] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.74] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 2.69] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480 [M::worker_pipeline::0.8842.89] mapped 668 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz [M::main] Real time: 0.884 sec; CPU: 2.556 sec; Peak RSS: 0.096 GB downsample ratio is 1 for DQB1 Mean depth {'HLA_DQB1': 95}
2024-06-14 13:20:21 Min read coverage set to 2. 2024-06-14 13:20:21 Max read coverage set to 100000. 2024-06-14 13:20:21 Estimating alignment parameters... 2024-06-14 13:20:21 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.993 match -> insertion: 0.003 match -> deletion: 0.004 deletion -> match: 0.406 deletion -> deletion: 0.594 insertion -> match: 0.297 insertion -> insertion: 0.703 Emission Probabilities: match (equal): 0.950 match (not equal): 0.017 insertion: 1.000 deletion: 1.000
2024-06-14 13:20:21 Calling potential SNVs using pileup... 2024-06-14 13:20:21 547 potential variants identified. 2024-06-14 13:20:21 Generating haplotype fragments from reads... 2024-06-14 13:20:21 10% of variants processed... 2024-06-14 13:20:22 20% of variants processed... 2024-06-14 13:20:22 30% of variants processed... 2024-06-14 13:20:22 40% of variants processed... 2024-06-14 13:20:22 50% of variants processed... 2024-06-14 13:20:22 60% of variants processed... 2024-06-14 13:20:22 70% of variants processed... 2024-06-14 13:20:22 80% of variants processed... 2024-06-14 13:20:22 90% of variants processed... 2024-06-14 13:20:22 100% of variants processed. 2024-06-14 13:20:22 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:22 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:22 Round 1 of haplotype assembly... 2024-06-14 13:20:22 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63 2024-06-14 13:20:22 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11 2024-06-14 13:20:22 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 Round 2 of haplotype assembly... 2024-06-14 13:20:22 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 Printing VCF file... [M::mm_idx_gen::0.002 1.00] collected minimizers [M::mm_idx_gen::0.0032.78] sorted minimizers [M::main::0.003 2.76] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.58] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004 2.53] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229 [M::worker_pipeline::5.7420.83] mapped 9477 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz [M::main] Real time: 5.742 sec; CPU: 4.793 sec; Peak RSS: 0.245 GB downsample ratio is 1 for DRB1 Mean depth {'HLA_DRB1': 45}
2024-06-14 13:20:33 Min read coverage set to 2. 2024-06-14 13:20:33 Max read coverage set to 100000. 2024-06-14 13:20:33 Estimating alignment parameters... 2024-06-14 13:20:33 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.995 match -> insertion: 0.003 match -> deletion: 0.002 deletion -> match: 0.625 deletion -> deletion: 0.375 insertion -> match: 0.577 insertion -> insertion: 0.423 Emission Probabilities: match (equal): 0.972 match (not equal): 0.009 insertion: 1.000 deletion: 1.000
2024-06-14 13:20:33 Calling potential SNVs using pileup... 2024-06-14 13:20:33 592 potential variants identified. 2024-06-14 13:20:33 Generating haplotype fragments from reads... 2024-06-14 13:20:34 10% of variants processed... 2024-06-14 13:20:34 20% of variants processed... 2024-06-14 13:20:34 30% of variants processed... 2024-06-14 13:20:34 40% of variants processed... 2024-06-14 13:20:34 50% of variants processed... 2024-06-14 13:20:34 60% of variants processed... 2024-06-14 13:20:34 70% of variants processed... 2024-06-14 13:20:34 80% of variants processed... 2024-06-14 13:20:34 90% of variants processed... 2024-06-14 13:20:34 100% of variants processed. 2024-06-14 13:20:34 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:34 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:34 Round 1 of haplotype assembly... 2024-06-14 13:20:34 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42 2024-06-14 13:20:35 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72 2024-06-14 13:20:35 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21 2024-06-14 13:20:35 Round 2 of haplotype assembly... 2024-06-14 13:20:35 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21 2024-06-14 13:20:38 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22 2024-06-14 13:20:38 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28 2024-06-14 13:20:38 Printing VCF file...
Sequence is reconstructed, start annotation... parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0 No such file or directory version: IPD-IMGT/HLA 3.38.0
Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DR B1_1 HLA_DRB1_2 optimize typing results by balancing alignment length and identity The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt Finished. `
The part where it breaks (I think) is highlighted in bold
— Reply to this email directly, view it on GitHub https://github.com/deepomicslab/SpecHLA/issues/29#issuecomment-2168037236, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALS7DWSMLQH27II4EVB46FTZHLVHFAVCNFSM6AAAAABHQGSAT6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRYGAZTOMRTGY . You are receiving this because you commented.Message ID: @.***>
Hi Shuai,
both vcf and hla.allele..HLA_.fasta contain information, so these steps seemed to work.. I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :(
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt
Locus Chromosome Allele
A 1
A 2
B 1
B 2
C 1
C 2
DPA1 1
DPA1 2
DPB1 1
DPB1 2
DQA1 1
DQA1 2
DQB1 1
DQB1 2
DRB1 1
DRB1 2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.
hla.result.details.txt hla.result.txt
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.txt
# version: IPD-IMGT/HLA 3.38.0
Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.details.txt
# version: IPD-IMGT/HLA 3.38.0
Gene G_best allele details:allele;Score;Caucasian;Black;Asian
aiakupo@sl-sci-lee1:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output
[M::mm_idx_gen::0.461*1.00] collected minimizers
[M::mm_idx_gen::0.516*1.41] sorted minimizers
[M::main::0.517*1.41] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.519*1.41] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.520*1.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::2785.920*4.99] mapped 87302 sequences
[M::worker_pipeline::5446.614*5.00] mapped 90496 sequences
[M::worker_pipeline::6477.379*4.98] mapped 34247 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz
[M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.002*2.07] collected minimizers
[M::mm_idx_gen::0.003*3.10] sorted minimizers
[M::main::0.003*3.08] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.96] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for A
Mean depth {'HLA_A': 0}
2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.333
match -> insertion: 0.333
match -> deletion: 0.333
deletion -> match: 0.500
deletion -> deletion: 0.500
insertion -> match: 0.500
insertion -> insertion: 0.500
Emission Probabilities:
match (equal): 0.500
match (not equal): 0.167
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.89] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.11] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.00] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for B
Mean depth {'HLA_B': 0}
2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.333
match -> insertion: 0.333
match -> deletion: 0.333
deletion -> match: 0.500
deletion -> deletion: 0.500
insertion -> match: 0.500
insertion -> insertion: 0.500
Emission Probabilities:
match (equal): 0.500
match (not equal): 0.167
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.77] collected minimizers
[M::mm_idx_gen::0.003*2.86] sorted minimizers
[M::main::0.003*2.84] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.71] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for C
Mean depth {'HLA_C': 0}
2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.333
match -> insertion: 0.333
match -> deletion: 0.333
deletion -> match: 0.500
deletion -> deletion: 0.500
insertion -> match: 0.500
insertion -> insertion: 0.500
Emission Probabilities:
match (equal): 0.500
match (not equal): 0.167
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.71] collected minimizers
[M::mm_idx_gen::0.003*2.73] sorted minimizers
[M::main::0.003*2.71] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.61] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775
[M::worker_pipeline::0.270*2.62] mapped 231 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz
[M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB
downsample ratio is 1 for DPA1
Mean depth {'HLA_DPA1': 0}
2024-07-08 16:05:01 Min read coverage set to 2.
2024-07-08 16:05:01 Max read coverage set to 100000.
2024-07-08 16:05:01 Estimating alignment parameters...
2024-07-08 16:05:01 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.333
match -> insertion: 0.333
match -> deletion: 0.333
deletion -> match: 0.500
deletion -> deletion: 0.500
insertion -> match: 0.500
insertion -> insertion: 0.500
Emission Probabilities:
match (equal): 0.500
match (not equal): 0.167
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:01 Calling potential SNVs using pileup...
2024-07-08 16:05:01 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.64] collected minimizers
[M::mm_idx_gen::0.003*2.93] sorted minimizers
[M::main::0.003*2.91] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.79] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468
[M::worker_pipeline::7.230*2.84] mapped 5835 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz
[M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB
downsample ratio is 1 for DPB1
Mean depth {'HLA_DPB1': 182}
2024-07-08 16:05:12 Min read coverage set to 2.
2024-07-08 16:05:12 Max read coverage set to 100000.
2024-07-08 16:05:12 Estimating alignment parameters...
2024-07-08 16:05:12 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.990
match -> insertion: 0.006
match -> deletion: 0.004
deletion -> match: 0.672
deletion -> deletion: 0.328
insertion -> match: 0.763
insertion -> insertion: 0.237
Emission Probabilities:
match (equal): 0.925
match (not equal): 0.025
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:12 Calling potential SNVs using pileup...
2024-07-08 16:05:12 123 potential variants identified.
2024-07-08 16:05:12 Generating haplotype fragments from reads...
2024-07-08 16:05:17 10% of variants processed...
2024-07-08 16:05:17 20% of variants processed...
2024-07-08 16:05:18 30% of variants processed...
2024-07-08 16:05:18 40% of variants processed...
2024-07-08 16:05:18 100% of variants processed.
2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:18 Round 1 of haplotype assembly...
2024-07-08 16:05:18 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57
2024-07-08 16:05:20 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24
2024-07-08 16:05:20 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51
2024-07-08 16:05:20 Round 2 of haplotype assembly...
2024-07-08 16:05:20 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51
2024-07-08 16:05:21 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35
2024-07-08 16:05:21 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51
2024-07-08 16:05:21 Printing VCF file...
[M::mm_idx_gen::0.002*1.75] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.12] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.99] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492
[M::worker_pipeline::6.387*3.58] mapped 3343 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz
[M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB
downsample ratio is 1 for DQA1
Mean depth {'HLA_DQA1': 88}
2024-07-08 16:05:30 Min read coverage set to 2.
2024-07-08 16:05:30 Max read coverage set to 100000.
2024-07-08 16:05:30 Estimating alignment parameters...
2024-07-08 16:05:30 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.991
match -> insertion: 0.005
match -> deletion: 0.004
deletion -> match: 0.514
deletion -> deletion: 0.486
insertion -> match: 0.556
insertion -> insertion: 0.444
Emission Probabilities:
match (equal): 0.975
match (not equal): 0.008
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:30 Calling potential SNVs using pileup...
2024-07-08 16:05:30 183 potential variants identified.
2024-07-08 16:05:30 Generating haplotype fragments from reads...
2024-07-08 16:05:30 10% of variants processed...
2024-07-08 16:05:30 20% of variants processed...
2024-07-08 16:05:30 30% of variants processed...
2024-07-08 16:05:30 40% of variants processed...
2024-07-08 16:05:30 50% of variants processed...
2024-07-08 16:05:30 60% of variants processed...
2024-07-08 16:05:30 70% of variants processed...
2024-07-08 16:05:31 80% of variants processed...
2024-07-08 16:05:31 90% of variants processed...
2024-07-08 16:05:31 100% of variants processed.
2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:31 Round 1 of haplotype assembly...
2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53
2024-07-08 16:05:31 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91
2024-07-08 16:05:31 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78
2024-07-08 16:05:31 Round 2 of haplotype assembly...
2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78
2024-07-08 16:05:32 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78
2024-07-08 16:05:32 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78
2024-07-08 16:05:32 Printing VCF file...
[M::mm_idx_gen::0.002*1.79] collected minimizers
[M::mm_idx_gen::0.003*3.00] sorted minimizers
[M::main::0.003*2.99] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.84] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480
[M::worker_pipeline::0.888*2.90] mapped 668 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz
[M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB
downsample ratio is 1 for DQB1
Mean depth {'HLA_DQB1': 95}
2024-07-08 16:05:33 Min read coverage set to 2.
2024-07-08 16:05:33 Max read coverage set to 100000.
2024-07-08 16:05:33 Estimating alignment parameters...
2024-07-08 16:05:33 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.993
match -> insertion: 0.003
match -> deletion: 0.004
deletion -> match: 0.406
deletion -> deletion: 0.594
insertion -> match: 0.297
insertion -> insertion: 0.703
Emission Probabilities:
match (equal): 0.950
match (not equal): 0.017
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:33 Calling potential SNVs using pileup...
2024-07-08 16:05:33 547 potential variants identified.
2024-07-08 16:05:33 Generating haplotype fragments from reads...
2024-07-08 16:05:34 10% of variants processed...
2024-07-08 16:05:34 20% of variants processed...
2024-07-08 16:05:34 30% of variants processed...
2024-07-08 16:05:34 40% of variants processed...
2024-07-08 16:05:34 50% of variants processed...
2024-07-08 16:05:34 60% of variants processed...
2024-07-08 16:05:34 70% of variants processed...
2024-07-08 16:05:34 80% of variants processed...
2024-07-08 16:05:34 90% of variants processed...
2024-07-08 16:05:34 100% of variants processed.
2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:34 Round 1 of haplotype assembly...
2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63
2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11
2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70
2024-07-08 16:05:34 Round 2 of haplotype assembly...
2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70
2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70
2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70
2024-07-08 16:05:34 Printing VCF file...
[M::mm_idx_gen::0.002*1.70] collected minimizers
[M::mm_idx_gen::0.003*2.97] sorted minimizers
[M::main::0.003*2.95] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.81] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229
[M::worker_pipeline::5.751*0.83] mapped 9477 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz
[M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB
downsample ratio is 1 for DRB1
Mean depth {'HLA_DRB1': 45}
2024-07-08 16:05:45 Min read coverage set to 2.
2024-07-08 16:05:45 Max read coverage set to 100000.
2024-07-08 16:05:45 Estimating alignment parameters...
2024-07-08 16:05:45 Done estimating alignment parameters.
Transition Probabilities:
match -> match: 0.995
match -> insertion: 0.003
match -> deletion: 0.002
deletion -> match: 0.625
deletion -> deletion: 0.375
insertion -> match: 0.577
insertion -> insertion: 0.423
Emission Probabilities:
match (equal): 0.972
match (not equal): 0.009
insertion: 1.000
deletion: 1.000
2024-07-08 16:05:45 Calling potential SNVs using pileup...
2024-07-08 16:05:45 592 potential variants identified.
2024-07-08 16:05:45 Generating haplotype fragments from reads...
2024-07-08 16:05:46 10% of variants processed...
2024-07-08 16:05:46 20% of variants processed...
2024-07-08 16:05:46 30% of variants processed...
2024-07-08 16:05:46 40% of variants processed...
2024-07-08 16:05:46 50% of variants processed...
2024-07-08 16:05:46 60% of variants processed...
2024-07-08 16:05:46 70% of variants processed...
2024-07-08 16:05:46 80% of variants processed...
2024-07-08 16:05:46 90% of variants processed...
2024-07-08 16:05:46 100% of variants processed.
2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:46 Round 1 of haplotype assembly...
2024-07-08 16:05:46 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42
2024-07-08 16:05:48 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72
2024-07-08 16:05:48 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21
2024-07-08 16:05:48 Round 2 of haplotype assembly...
2024-07-08 16:05:48 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21
2024-07-08 16:05:51 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22
2024-07-08 16:05:51 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28
2024-07-08 16:05:51 Printing VCF file...
Sequence is reconstructed, start annotation...
parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0
No such file or directory
# version: IPD-IMGT/HLA 3.38.0
Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2
optimize typing results by balancing alignment length and identity
The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt
Finished.
Hi Alisa,
The log shows the HLA II genes have high depth. The database likely causes the issue. Does the folder '~/SpecHLA/db/' exist? If not, please download it. If it exists, updating to the latest version of SpecHLA might solve the problem.
Please let me know if these work for you.
On Tue, Jul 9, 2024 at 4:08 AM Alisa @.***> wrote:
Hi Shuai,
both vcf and hla.allele..HLA_.fasta contain information, so these steps seemed to work.. I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :(
(base) @.:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt Locus Chromosome Allele A 1 A 2 B 1 B 2 C 1 C 2 DPA1 1 DPA1 2 DPB1 1 DPB1 2 DQA1 1 DQA1 2 DQB1 1 DQB1 2 DRB1 1 DRB1 2 (base) @.:~/SpecHLA/output/DNA01$ cat hla.result. hla.result.details.txt hla.result.txt (base) @.***:~/SpecHLA/output/DNA01$ cat hla.result.txt
version: IPD-IMGT/HLA 3.38.0
Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2 (base) @.***:~/SpecHLA/output/DNA01$ cat hla.result.details.txt
version: IPD-IMGT/HLA 3.38.0
Gene G_best allele details:allele;Score;Caucasian;Black;Asian
@.**:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output [M::mm_idx_gen::0.4611.00] collected minimizers [M::mm_idx_gen::0.5161.41] sorted minimizers [M::main::0.5171.41] loaded/built the index for 1591 target sequence(s) [M::mm_mapopt_update::0.5191.41] mid_occ = 936 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591 [M::mm_idx_stat::0.5201.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978 [M::worker_pipeline::2785.9204.99] mapped 87302 sequences [M::worker_pipeline::5446.6145.00] mapped 90496 sequences [M::worker_pipeline::6477.3794.98] mapped 34247 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz [M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB alignment done. reads-binning done. [M::mm_idx_gen::0.0022.07] collected minimizers [M::mm_idx_gen::0.0033.10] sorted minimizers [M::main::0.0033.08] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.96] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0032.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for A Mean depth {'HLA_A': 0}
2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.89] collected minimizers [M::mm_idx_gen::0.0033.13] sorted minimizers [M::main::0.0033.11] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0033.00] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for B Mean depth {'HLA_B': 0}
2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.77] collected minimizers [M::mm_idx_gen::0.0032.86] sorted minimizers [M::main::0.0032.84] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.71] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB downsample ratio is 1 for C Mean depth {'HLA_C': 0}
2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.71] collected minimizers [M::mm_idx_gen::0.0032.73] sorted minimizers [M::main::0.0032.71] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.61] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775 [M::worker_pipeline::0.2702.62] mapped 231 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz [M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB downsample ratio is 1 for DPA1 Mean depth {'HLA_DPA1': 0}
2024-07-08 16:05:01 Min read coverage set to 2. 2024-07-08 16:05:01 Max read coverage set to 100000. 2024-07-08 16:05:01 Estimating alignment parameters... 2024-07-08 16:05:01 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.333 match -> insertion: 0.333 match -> deletion: 0.333 deletion -> match: 0.500 deletion -> deletion: 0.500 insertion -> match: 0.500 insertion -> insertion: 0.500 Emission Probabilities: match (equal): 0.500 match (not equal): 0.167 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:01 Calling potential SNVs using pileup... 2024-07-08 16:05:01 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.64] collected minimizers [M::mm_idx_gen::0.0032.93] sorted minimizers [M::main::0.0032.91] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.79] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468 [M::worker_pipeline::7.2302.84] mapped 5835 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz [M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB downsample ratio is 1 for DPB1 Mean depth {'HLA_DPB1': 182}
2024-07-08 16:05:12 Min read coverage set to 2. 2024-07-08 16:05:12 Max read coverage set to 100000. 2024-07-08 16:05:12 Estimating alignment parameters... 2024-07-08 16:05:12 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.990 match -> insertion: 0.006 match -> deletion: 0.004 deletion -> match: 0.672 deletion -> deletion: 0.328 insertion -> match: 0.763 insertion -> insertion: 0.237 Emission Probabilities: match (equal): 0.925 match (not equal): 0.025 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:12 Calling potential SNVs using pileup... 2024-07-08 16:05:12 123 potential variants identified. 2024-07-08 16:05:12 Generating haplotype fragments from reads... 2024-07-08 16:05:17 10% of variants processed... 2024-07-08 16:05:17 20% of variants processed... 2024-07-08 16:05:18 30% of variants processed... 2024-07-08 16:05:18 40% of variants processed... 2024-07-08 16:05:18 100% of variants processed. 2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:18 Round 1 of haplotype assembly... 2024-07-08 16:05:18 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57 2024-07-08 16:05:20 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24 2024-07-08 16:05:20 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51 2024-07-08 16:05:20 Round 2 of haplotype assembly... 2024-07-08 16:05:20 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-07-08 16:05:21 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35 2024-07-08 16:05:21 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-07-08 16:05:21 Printing VCF file... [M::mm_idx_gen::0.0021.75] collected minimizers [M::mm_idx_gen::0.0033.13] sorted minimizers [M::main::0.0033.12] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.99] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492 [M::worker_pipeline::6.3873.58] mapped 3343 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz [M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB downsample ratio is 1 for DQA1 Mean depth {'HLA_DQA1': 88}
2024-07-08 16:05:30 Min read coverage set to 2. 2024-07-08 16:05:30 Max read coverage set to 100000. 2024-07-08 16:05:30 Estimating alignment parameters... 2024-07-08 16:05:30 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.991 match -> insertion: 0.005 match -> deletion: 0.004 deletion -> match: 0.514 deletion -> deletion: 0.486 insertion -> match: 0.556 insertion -> insertion: 0.444 Emission Probabilities: match (equal): 0.975 match (not equal): 0.008 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:30 Calling potential SNVs using pileup... 2024-07-08 16:05:30 183 potential variants identified. 2024-07-08 16:05:30 Generating haplotype fragments from reads... 2024-07-08 16:05:30 10% of variants processed... 2024-07-08 16:05:30 20% of variants processed... 2024-07-08 16:05:30 30% of variants processed... 2024-07-08 16:05:30 40% of variants processed... 2024-07-08 16:05:30 50% of variants processed... 2024-07-08 16:05:30 60% of variants processed... 2024-07-08 16:05:30 70% of variants processed... 2024-07-08 16:05:31 80% of variants processed... 2024-07-08 16:05:31 90% of variants processed... 2024-07-08 16:05:31 100% of variants processed. 2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:31 Round 1 of haplotype assembly... 2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53 2024-07-08 16:05:31 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91 2024-07-08 16:05:31 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78 2024-07-08 16:05:31 Round 2 of haplotype assembly... 2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 Printing VCF file... [M::mm_idx_gen::0.0021.79] collected minimizers [M::mm_idx_gen::0.0033.00] sorted minimizers [M::main::0.0032.99] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.84] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0032.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480 [M::worker_pipeline::0.8882.90] mapped 668 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz [M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB downsample ratio is 1 for DQB1 Mean depth {'HLA_DQB1': 95}
2024-07-08 16:05:33 Min read coverage set to 2. 2024-07-08 16:05:33 Max read coverage set to 100000. 2024-07-08 16:05:33 Estimating alignment parameters... 2024-07-08 16:05:33 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.993 match -> insertion: 0.003 match -> deletion: 0.004 deletion -> match: 0.406 deletion -> deletion: 0.594 insertion -> match: 0.297 insertion -> insertion: 0.703 Emission Probabilities: match (equal): 0.950 match (not equal): 0.017 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:33 Calling potential SNVs using pileup... 2024-07-08 16:05:33 547 potential variants identified. 2024-07-08 16:05:33 Generating haplotype fragments from reads... 2024-07-08 16:05:34 10% of variants processed... 2024-07-08 16:05:34 20% of variants processed... 2024-07-08 16:05:34 30% of variants processed... 2024-07-08 16:05:34 40% of variants processed... 2024-07-08 16:05:34 50% of variants processed... 2024-07-08 16:05:34 60% of variants processed... 2024-07-08 16:05:34 70% of variants processed... 2024-07-08 16:05:34 80% of variants processed... 2024-07-08 16:05:34 90% of variants processed... 2024-07-08 16:05:34 100% of variants processed. 2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:34 Round 1 of haplotype assembly... 2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63 2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11 2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 Round 2 of haplotype assembly... 2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 Printing VCF file... [M::mm_idx_gen::0.0021.70] collected minimizers [M::mm_idx_gen::0.0032.97] sorted minimizers [M::main::0.0032.95] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.81] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229 [M::worker_pipeline::5.7510.83] mapped 9477 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz [M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB downsample ratio is 1 for DRB1 Mean depth {'HLA_DRB1': 45}
2024-07-08 16:05:45 Min read coverage set to 2. 2024-07-08 16:05:45 Max read coverage set to 100000. 2024-07-08 16:05:45 Estimating alignment parameters... 2024-07-08 16:05:45 Done estimating alignment parameters.
Transition Probabilities: match -> match: 0.995 match -> insertion: 0.003 match -> deletion: 0.002 deletion -> match: 0.625 deletion -> deletion: 0.375 insertion -> match: 0.577 insertion -> insertion: 0.423 Emission Probabilities: match (equal): 0.972 match (not equal): 0.009 insertion: 1.000 deletion: 1.000
2024-07-08 16:05:45 Calling potential SNVs using pileup... 2024-07-08 16:05:45 592 potential variants identified. 2024-07-08 16:05:45 Generating haplotype fragments from reads... 2024-07-08 16:05:46 10% of variants processed... 2024-07-08 16:05:46 20% of variants processed... 2024-07-08 16:05:46 30% of variants processed... 2024-07-08 16:05:46 40% of variants processed... 2024-07-08 16:05:46 50% of variants processed... 2024-07-08 16:05:46 60% of variants processed... 2024-07-08 16:05:46 70% of variants processed... 2024-07-08 16:05:46 80% of variants processed... 2024-07-08 16:05:46 90% of variants processed... 2024-07-08 16:05:46 100% of variants processed. 2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:46 Round 1 of haplotype assembly... 2024-07-08 16:05:46 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42 2024-07-08 16:05:48 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72 2024-07-08 16:05:48 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21 2024-07-08 16:05:48 Round 2 of haplotype assembly... 2024-07-08 16:05:48 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21 2024-07-08 16:05:51 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22 2024-07-08 16:05:51 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28 2024-07-08 16:05:51 Printing VCF file... Sequence is reconstructed, start annotation... parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0 No such file or directory
version: IPD-IMGT/HLA 3.38.0
Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2 optimize typing results by balancing alignment length and identity The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt Finished.
— Reply to this email directly, view it on GitHub https://github.com/deepomicslab/SpecHLA/issues/29#issuecomment-2215147221, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALS7DWX5TWCLKFWXTJFR57LZLLWSXAVCNFSM6AAAAABHQGSAT6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJVGE2DOMRSGE . You are receiving this because you commented.Message ID: @.***>
Hi Shuai,
Thanks for your patience, it worked! The problem was installing dependencies... maybe you could somehow specify what has to be installed exactly just for the "long_read_typing.py" script. At some point I had an error I did not have "bgzip" and "tabix"... it was not obvious to me as I specifically installed conda for this analysis for the first time. Sorry! Now if I have other questions, I will open a new issue :) grateful for your support!!
So nice to hear it works. Also, it is a good idea to use a specific env for long_read_typing.py
. I will add this. Thanks for the advice.
Hi, it would be great if this package could be compatible with Guix, as this can help to make the work more reproducible.. are you planning this? because we use Guix, it would take too much effort to try to contain the package and its dependencies in the conda environment. Maybe I am missing something :(