deepomicslab / SpecHLA

SpecHLA reconstructs entire diploid sequences of HLA genes and infers LOH events. It supports HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes. Also, it supports both short- and long-read data.
MIT License
36 stars 9 forks source link

complaining about dependencies / empty output files #29

Closed alisamatisse closed 5 months ago

alisamatisse commented 5 months ago

Hi, it would be great if this package could be compatible with Guix, as this can help to make the work more reproducible.. are you planning this? because we use Guix, it would take too much effort to try to contain the package and its dependencies in the conda environment. Maybe I am missing something :(

wshuai294 commented 5 months ago

Hello, thanks for considering our method. We have provided the conda environment file which can be used to construct the env. We assume it is suitable for Linux systems. Is it not applicable to Guix? What error did you meet in running it?

alisamatisse commented 5 months ago

It is not compatible with the how our cluster paths are organized, but I guess there is not much that can be done from your side. Could it be executed independently from the conda environment? Really would like to try using your tool. Maybe will run it locally..

wshuai294 commented 5 months ago

Hello. I think it could be possible to use it without the conda environment. However, you should install the dependencies yourself. What functions do you want to use? If you only want to handle long-read data, there will be no need to install it. You can directly use the Python script. If you handle Illumina reads, it will be more complex. But I think it is also possible.

alisamatisse commented 5 months ago

Hello Shuai, that it is very promising, thank you. I am now trying to make it work...

alisamatisse commented 5 months ago

Terribly sorry for all the stupid questions, I tried to run the demo files

python3 script/long_read_typing.py -j 5 -n HG00733 -r HG00733_pacbio.subsample.fastq.gz -o output

but have this

[M::mm_idx_gen::0.568*0.84] collected minimizers
[M::mm_idx_gen::0.710*1.06] sorted minimizers
[M::main::0.713*1.06] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.714*1.06] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.715*1.06] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::462.895*2.00] mapped 851 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db/ref/hla_gen.format.filter.extend.DRB.no26789.fasta HG00733_pacbio.subsample.fastq.gz
[M::main] Real time: 462.901 sec; CPU: 923.485 sec; Peak RSS: 0.938 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.015*0.80] collected minimizers
[M::mm_idx_gen::0.017*0.88] sorted minimizers
[M::main::0.017*0.88] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.018*0.88] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.018*0.88] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::worker_pipeline::0.034*1.15] mapped 6 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -t 5 -x map-pb -a /fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa output/HG00733//A.long_read.fq.gz
[M::main] Real time: 0.036 sec; CPU: 0.040 sec; Peak RSS: 0.011 GB
  File "/fast/AG_Lee/Alisa/HLA/PacBioSeq_2024/20240325mail_SEQS/SpecHLA/script/mask_low_depth_region.py", line 113
    print (gene, mask[0]-1, mask[1]-1, file = f)
                                            ^
SyntaxError: invalid syntax
cp: cannot stat ‘output/HG00733//low_depth.bed’: No such file or directory

Sorry again. I will understand if you decide not to reply, haha.

wshuai294 commented 5 months ago

Hello, cause it uses python instead of python3 in the 215-th line. I have edited this. Please update to the latest commit. I believe it will work now.

wshuai294 commented 4 months ago

Hi, It seems to work well except for the annotation step. Is the vcf files empty? Is there a sequence in hla.allele..HLA_.fasta? If both not, I guess the reason is the database. Have you run the bash index.sh? You can comment the software installation in index.sh, and run it to construct the database only. Or you can run it on a local PC, and upload the database to the server. Please tell me if you have further issue.

Best, Shuai

On Fri, Jun 14, 2024 at 9:24 PM Alisa @.***> wrote:

Hi Shuai,

I keep trying to make the tool work! Sorry for the dumb questions again. I moved now to Ubuntu virtual machine where I can install all the dependencies myself. I thought maybe I am missing something.. but not so obvious what... I have empty output files in the end. The code output:

`(base) @.**:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n D NA01 -o /home/aiakupo/SpecHLA/output [M::mm_idx_gen::0.466 1.00] collected minimizers [M::mm_idx_gen::0.5161.38] sorted minimizers [M::main::0.516 1.38] loaded/built the index for 1591 target sequence(s) [M::mm_mapopt_update::0.5191.38] mid_occ = 936 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591 [M::mm_idx_stat::0.520 1.38] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978 [M::worker_pipeline::2764.5174.99] mapped 87302 sequences [M::worker_pipeline::5416.019 5.00] mapped 90496 sequences [M::worker_pipeline::6441.3304.98] mapped 34247 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fa stx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz [M::main] Real time: 6441.414 sec; CPU: 32064.015 sec; Peak RSS: 4.391 GB alignment done. reads-binning done. [M::mm_idx_gen::0.002 1.63] collected minimizers [M::mm_idx_gen::0.0032.80] sorted minimizers [M::main::0.003 2.79] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.70] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.66] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for A Mean depth {'HLA_A': 0}

2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.333
            match -> insertion:      0.333
            match -> deletion:       0.333
            deletion -> match:       0.500
            deletion -> deletion:    0.500
            insertion -> match:      0.500
            insertion -> insertion:  0.500

            Emission Probabilities:
            match (equal):           0.500
            match (not equal):       0.167
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.78] collected minimizers [M::mm_idx_gen::0.0033.24] sorted minimizers [M::main::0.003 3.22] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0033.11] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*3.05] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.011 sec; Peak RSS: 0.003 GB downsample ratio is 1 for B Mean depth {'HLA_B': 0}

2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.333
            match -> insertion:      0.333
            match -> deletion:       0.333
            deletion -> match:       0.500
            deletion -> deletion:    0.500
            insertion -> match:      0.500
            insertion -> insertion:  0.500

            Emission Probabilities:
            match (equal):           0.500
            match (not equal):       0.167
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.79] collected minimizers [M::mm_idx_gen::0.0033.12] sorted minimizers [M::main::0.003 3.10] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0033.00] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.94] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for C Mean depth {'HLA_C': 0}

2024-06-14 13:19:48 Min read coverage set to 2. 2024-06-14 13:19:48 Max read coverage set to 100000. 2024-06-14 13:19:48 Estimating alignment parameters... 2024-06-14 13:19:48 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.333
            match -> insertion:      0.333
            match -> deletion:       0.333
            deletion -> match:       0.500
            deletion -> deletion:    0.500
            insertion -> match:      0.500
            insertion -> insertion:  0.500

            Emission Probabilities:
            match (equal):           0.500
            match (not equal):       0.167
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:19:48 Calling potential SNVs using pileup... 2024-06-14 13:19:48 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.62] collected minimizers [M::mm_idx_gen::0.0032.60] sorted minimizers [M::main::0.003 2.58] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.49] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 2.44] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775 [M::worker_pipeline::0.2672.61] mapped 231 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz [M::main] Real time: 0.268 sec; CPU: 0.699 sec; Peak RSS: 0.071 GB downsample ratio is 1 for DPA1 Mean depth {'HLA_DPA1': 0}

2024-06-14 13:19:49 Min read coverage set to 2. 2024-06-14 13:19:49 Max read coverage set to 100000. 2024-06-14 13:19:49 Estimating alignment parameters... 2024-06-14 13:19:49 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.333
            match -> insertion:      0.333
            match -> deletion:       0.333
            deletion -> match:       0.500
            deletion -> deletion:    0.500
            insertion -> match:      0.500
            insertion -> insertion:  0.500

            Emission Probabilities:
            match (equal):           0.500
            match (not equal):       0.167
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:19:49 Calling potential SNVs using pileup... 2024-06-14 13:19:49 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.002 1.63] collected minimizers [M::mm_idx_gen::0.0032.96] sorted minimizers [M::main::0.003 2.94] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.83] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004 2.76] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468 [M::worker_pipeline::7.1402.84] mapped 5835 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz [M::main] Real time: 7.141 sec; CPU: 20.293 sec; Peak RSS: 0.147 GB downsample ratio is 1 for DPB1 Mean depth {'HLA_DPB1': 182}

2024-06-14 13:20:00 Min read coverage set to 2. 2024-06-14 13:20:00 Max read coverage set to 100000. 2024-06-14 13:20:00 Estimating alignment parameters... 2024-06-14 13:20:00 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.990
            match -> insertion:      0.006
            match -> deletion:       0.004
            deletion -> match:       0.672
            deletion -> deletion:    0.328
            insertion -> match:      0.763
            insertion -> insertion:  0.237

            Emission Probabilities:
            match (equal):           0.925
            match (not equal):       0.025
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:20:00 Calling potential SNVs using pileup... 2024-06-14 13:20:00 123 potential variants identified. 2024-06-14 13:20:00 Generating haplotype fragments from reads... 2024-06-14 13:20:05 10% of variants processed... 2024-06-14 13:20:05 20% of variants processed... 2024-06-14 13:20:05 30% of variants processed... 2024-06-14 13:20:06 40% of variants processed... 2024-06-14 13:20:06 100% of variants processed. 2024-06-14 13:20:06 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:06 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:06 Round 1 of haplotype assembly... 2024-06-14 13:20:06 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57 2024-06-14 13:20:08 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24 2024-06-14 13:20:08 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51 2024-06-14 13:20:08 Round 2 of haplotype assembly... 2024-06-14 13:20:08 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-06-14 13:20:09 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35 2024-06-14 13:20:09 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-06-14 13:20:09 Printing VCF file... [M::mm_idx_gen::0.002 1.83] collected minimizers [M::mm_idx_gen::0.0033.03] sorted minimizers [M::main::0.003 3.02] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.88] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 2.82] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492 [M::worker_pipeline::6.3873.58] mapped 3343 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz [M::main] Real time: 6.387 sec; CPU: 22.863 sec; Peak RSS: 0.153 GB downsample ratio is 1 for DQA1 Mean depth {'HLA_DQA1': 88}

2024-06-14 13:20:18 Min read coverage set to 2. 2024-06-14 13:20:18 Max read coverage set to 100000. 2024-06-14 13:20:18 Estimating alignment parameters... 2024-06-14 13:20:18 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.991
            match -> insertion:      0.005
            match -> deletion:       0.004
            deletion -> match:       0.514
            deletion -> deletion:    0.486
            insertion -> match:      0.556
            insertion -> insertion:  0.444

            Emission Probabilities:
            match (equal):           0.975
            match (not equal):       0.008
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:20:18 Calling potential SNVs using pileup... 2024-06-14 13:20:18 183 potential variants identified. 2024-06-14 13:20:18 Generating haplotype fragments from reads... 2024-06-14 13:20:18 10% of variants processed... 2024-06-14 13:20:18 20% of variants processed... 2024-06-14 13:20:18 30% of variants processed... 2024-06-14 13:20:18 40% of variants processed... 2024-06-14 13:20:18 50% of variants processed... 2024-06-14 13:20:18 60% of variants processed... 2024-06-14 13:20:18 70% of variants processed... 2024-06-14 13:20:18 80% of variants processed... 2024-06-14 13:20:18 90% of variants processed... 2024-06-14 13:20:18 100% of variants processed. 2024-06-14 13:20:18 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:18 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:18 Round 1 of haplotype assembly... 2024-06-14 13:20:18 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53 2024-06-14 13:20:19 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91 2024-06-14 13:20:19 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78 2024-06-14 13:20:19 Round 2 of haplotype assembly... 2024-06-14 13:20:19 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-06-14 13:20:20 Printing VCF file... [M::mm_idx_gen::0.002 1.73] collected minimizers [M::mm_idx_gen::0.0032.89] sorted minimizers [M::main::0.003 2.87] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.74] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003 2.69] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480 [M::worker_pipeline::0.8842.89] mapped 668 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz [M::main] Real time: 0.884 sec; CPU: 2.556 sec; Peak RSS: 0.096 GB downsample ratio is 1 for DQB1 Mean depth {'HLA_DQB1': 95}

2024-06-14 13:20:21 Min read coverage set to 2. 2024-06-14 13:20:21 Max read coverage set to 100000. 2024-06-14 13:20:21 Estimating alignment parameters... 2024-06-14 13:20:21 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.993
            match -> insertion:      0.003
            match -> deletion:       0.004
            deletion -> match:       0.406
            deletion -> deletion:    0.594
            insertion -> match:      0.297
            insertion -> insertion:  0.703

            Emission Probabilities:
            match (equal):           0.950
            match (not equal):       0.017
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:20:21 Calling potential SNVs using pileup... 2024-06-14 13:20:21 547 potential variants identified. 2024-06-14 13:20:21 Generating haplotype fragments from reads... 2024-06-14 13:20:21 10% of variants processed... 2024-06-14 13:20:22 20% of variants processed... 2024-06-14 13:20:22 30% of variants processed... 2024-06-14 13:20:22 40% of variants processed... 2024-06-14 13:20:22 50% of variants processed... 2024-06-14 13:20:22 60% of variants processed... 2024-06-14 13:20:22 70% of variants processed... 2024-06-14 13:20:22 80% of variants processed... 2024-06-14 13:20:22 90% of variants processed... 2024-06-14 13:20:22 100% of variants processed. 2024-06-14 13:20:22 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:22 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:22 Round 1 of haplotype assembly... 2024-06-14 13:20:22 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63 2024-06-14 13:20:22 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11 2024-06-14 13:20:22 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 Round 2 of haplotype assembly... 2024-06-14 13:20:22 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-06-14 13:20:22 Printing VCF file... [M::mm_idx_gen::0.002 1.00] collected minimizers [M::mm_idx_gen::0.0032.78] sorted minimizers [M::main::0.003 2.76] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.58] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.004 2.53] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229 [M::worker_pipeline::5.7420.83] mapped 9477 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz [M::main] Real time: 5.742 sec; CPU: 4.793 sec; Peak RSS: 0.245 GB downsample ratio is 1 for DRB1 Mean depth {'HLA_DRB1': 45}

2024-06-14 13:20:33 Min read coverage set to 2. 2024-06-14 13:20:33 Max read coverage set to 100000. 2024-06-14 13:20:33 Estimating alignment parameters... 2024-06-14 13:20:33 Done estimating alignment parameters.

            Transition Probabilities:
            match -> match:          0.995
            match -> insertion:      0.003
            match -> deletion:       0.002
            deletion -> match:       0.625
            deletion -> deletion:    0.375
            insertion -> match:      0.577
            insertion -> insertion:  0.423

            Emission Probabilities:
            match (equal):           0.972
            match (not equal):       0.009
            insertion:               1.000
            deletion:                1.000

2024-06-14 13:20:33 Calling potential SNVs using pileup... 2024-06-14 13:20:33 592 potential variants identified. 2024-06-14 13:20:33 Generating haplotype fragments from reads... 2024-06-14 13:20:34 10% of variants processed... 2024-06-14 13:20:34 20% of variants processed... 2024-06-14 13:20:34 30% of variants processed... 2024-06-14 13:20:34 40% of variants processed... 2024-06-14 13:20:34 50% of variants processed... 2024-06-14 13:20:34 60% of variants processed... 2024-06-14 13:20:34 70% of variants processed... 2024-06-14 13:20:34 80% of variants processed... 2024-06-14 13:20:34 90% of variants processed... 2024-06-14 13:20:34 100% of variants processed. 2024-06-14 13:20:34 Calling initial genotypes using pair-HMM realignment... 2024-06-14 13:20:34 Iteratively assembling haplotypes and refining genotypes... 2024-06-14 13:20:34 Round 1 of haplotype assembly... 2024-06-14 13:20:34 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42 2024-06-14 13:20:35 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72 2024-06-14 13:20:35 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21 2024-06-14 13:20:35 Round 2 of haplotype assembly... 2024-06-14 13:20:35 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21 2024-06-14 13:20:38 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22 2024-06-14 13:20:38 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28 2024-06-14 13:20:38 Printing VCF file...

Sequence is reconstructed, start annotation... parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0 No such file or directory version: IPD-IMGT/HLA 3.38.0

Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DR B1_1 HLA_DRB1_2 optimize typing results by balancing alignment length and identity The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt Finished. `

The part where it breaks (I think) is highlighted in bold

— Reply to this email directly, view it on GitHub https://github.com/deepomicslab/SpecHLA/issues/29#issuecomment-2168037236, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALS7DWSMLQH27II4EVB46FTZHLVHFAVCNFSM6AAAAABHQGSAT6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRYGAZTOMRTGY . You are receiving this because you commented.Message ID: @.***>

alisamatisse commented 3 months ago

Hi Shuai,

both vcf and hla.allele..HLA_.fasta contain information, so these steps seemed to work.. I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :(


(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt
Locus   Chromosome      Allele
A       1
A       2
B       1
B       2
C       1
C       2
DPA1    1
DPA1    2
DPB1    1
DPB1    2
DQA1    1
DQA1    2
DQB1    1
DQB1    2
DRB1    1
DRB1    2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.
hla.result.details.txt  hla.result.txt
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.txt
# version: IPD-IMGT/HLA 3.38.0
Sample  HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1      HLA_DPA1_2      HLA_DPB1_1      HLA_DPB1_2      HLA_DQA1_1      HLA_DQA1_2      HLA_DQB1_1      HLA_DQB1_2      HLA_DRB1_1     HLA_DRB1_2
(base) aiakupo@sl-sci-lee1:~/SpecHLA/output/DNA01$ cat hla.result.details.txt
# version: IPD-IMGT/HLA 3.38.0
Gene    G_best  allele  details:allele;Score;Caucasian;Black;Asian
aiakupo@sl-sci-lee1:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output
[M::mm_idx_gen::0.461*1.00] collected minimizers
[M::mm_idx_gen::0.516*1.41] sorted minimizers
[M::main::0.517*1.41] loaded/built the index for 1591 target sequence(s)
[M::mm_mapopt_update::0.519*1.41] mid_occ = 936
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591
[M::mm_idx_stat::0.520*1.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978
[M::worker_pipeline::2785.920*4.99] mapped 87302 sequences
[M::worker_pipeline::5446.614*5.00] mapped 90496 sequences
[M::worker_pipeline::6477.379*4.98] mapped 34247 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz
[M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB
alignment done.
reads-binning done.
[M::mm_idx_gen::0.002*2.07] collected minimizers
[M::mm_idx_gen::0.003*3.10] sorted minimizers
[M::main::0.003*3.08] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.96] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for A
Mean depth {'HLA_A': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.89] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.11] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.00] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for B
Mean depth {'HLA_B': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.77] collected minimizers
[M::mm_idx_gen::0.003*2.86] sorted minimizers
[M::main::0.003*2.84] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.71] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz
[M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB
downsample ratio is 1 for C
Mean depth {'HLA_C': 0}

2024-07-08 16:05:00 Min read coverage set to 2.
2024-07-08 16:05:00 Max read coverage set to 100000.
2024-07-08 16:05:00 Estimating alignment parameters...
2024-07-08 16:05:00 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup...
2024-07-08 16:05:00 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.71] collected minimizers
[M::mm_idx_gen::0.003*2.73] sorted minimizers
[M::main::0.003*2.71] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.61] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775
[M::worker_pipeline::0.270*2.62] mapped 231 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz
[M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB
downsample ratio is 1 for DPA1
Mean depth {'HLA_DPA1': 0}

2024-07-08 16:05:01 Min read coverage set to 2.
2024-07-08 16:05:01 Max read coverage set to 100000.
2024-07-08 16:05:01 Estimating alignment parameters...
2024-07-08 16:05:01 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.333
                    match -> insertion:      0.333
                    match -> deletion:       0.333
                    deletion -> match:       0.500
                    deletion -> deletion:    0.500
                    insertion -> match:      0.500
                    insertion -> insertion:  0.500

                    Emission Probabilities:
                    match (equal):           0.500
                    match (not equal):       0.167
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:01 Calling potential SNVs using pileup...
2024-07-08 16:05:01 0 potential variants identified.
No candidate variants identified, printing empty VCF file...
[M::mm_idx_gen::0.002*1.64] collected minimizers
[M::mm_idx_gen::0.003*2.93] sorted minimizers
[M::main::0.003*2.91] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.79] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468
[M::worker_pipeline::7.230*2.84] mapped 5835 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz
[M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB
downsample ratio is 1 for DPB1
Mean depth {'HLA_DPB1': 182}

2024-07-08 16:05:12 Min read coverage set to 2.
2024-07-08 16:05:12 Max read coverage set to 100000.
2024-07-08 16:05:12 Estimating alignment parameters...
2024-07-08 16:05:12 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.990
                    match -> insertion:      0.006
                    match -> deletion:       0.004
                    deletion -> match:       0.672
                    deletion -> deletion:    0.328
                    insertion -> match:      0.763
                    insertion -> insertion:  0.237

                    Emission Probabilities:
                    match (equal):           0.925
                    match (not equal):       0.025
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:12 Calling potential SNVs using pileup...
2024-07-08 16:05:12 123 potential variants identified.
2024-07-08 16:05:12 Generating haplotype fragments from reads...
2024-07-08 16:05:17    10% of variants processed...
2024-07-08 16:05:17    20% of variants processed...
2024-07-08 16:05:18    30% of variants processed...
2024-07-08 16:05:18    40% of variants processed...
2024-07-08 16:05:18    100% of variants processed.
2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:18    Round 1 of haplotype assembly...
2024-07-08 16:05:18    (Before HapCUT2) Total phased heterozygous SNVs: 61  Total likelihood (phred): 1012374.57
2024-07-08 16:05:20    (After HapCUT2)  Total phased heterozygous SNVs: 61  Total likelihood (phred): 322425.24
2024-07-08 16:05:20    (After Greedy)   Total phased heterozygous SNVs: 61  Total likelihood (phred): 291216.51
2024-07-08 16:05:20    Round 2 of haplotype assembly...
2024-07-08 16:05:20    (Before HapCUT2) Total phased heterozygous SNVs: 46  Total likelihood (phred): 291216.51
2024-07-08 16:05:21    (After HapCUT2)  Total phased heterozygous SNVs: 46  Total likelihood (phred): 291900.35
2024-07-08 16:05:21    (After Greedy)   Total phased heterozygous SNVs: 46  Total likelihood (phred): 291216.51
2024-07-08 16:05:21 Printing VCF file...
[M::mm_idx_gen::0.002*1.75] collected minimizers
[M::mm_idx_gen::0.003*3.13] sorted minimizers
[M::main::0.003*3.12] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.99] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492
[M::worker_pipeline::6.387*3.58] mapped 3343 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz
[M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB
downsample ratio is 1 for DQA1
Mean depth {'HLA_DQA1': 88}

2024-07-08 16:05:30 Min read coverage set to 2.
2024-07-08 16:05:30 Max read coverage set to 100000.
2024-07-08 16:05:30 Estimating alignment parameters...
2024-07-08 16:05:30 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.991
                    match -> insertion:      0.005
                    match -> deletion:       0.004
                    deletion -> match:       0.514
                    deletion -> deletion:    0.486
                    insertion -> match:      0.556
                    insertion -> insertion:  0.444

                    Emission Probabilities:
                    match (equal):           0.975
                    match (not equal):       0.008
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:30 Calling potential SNVs using pileup...
2024-07-08 16:05:30 183 potential variants identified.
2024-07-08 16:05:30 Generating haplotype fragments from reads...
2024-07-08 16:05:30    10% of variants processed...
2024-07-08 16:05:30    20% of variants processed...
2024-07-08 16:05:30    30% of variants processed...
2024-07-08 16:05:30    40% of variants processed...
2024-07-08 16:05:30    50% of variants processed...
2024-07-08 16:05:30    60% of variants processed...
2024-07-08 16:05:30    70% of variants processed...
2024-07-08 16:05:31    80% of variants processed...
2024-07-08 16:05:31    90% of variants processed...
2024-07-08 16:05:31    100% of variants processed.
2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:31    Round 1 of haplotype assembly...
2024-07-08 16:05:31    (Before HapCUT2) Total phased heterozygous SNVs: 123  Total likelihood (phred): 92420.53
2024-07-08 16:05:31    (After HapCUT2)  Total phased heterozygous SNVs: 123  Total likelihood (phred): 38571.91
2024-07-08 16:05:31    (After Greedy)   Total phased heterozygous SNVs: 123  Total likelihood (phred): 34372.78
2024-07-08 16:05:31    Round 2 of haplotype assembly...
2024-07-08 16:05:31    (Before HapCUT2) Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32    (After HapCUT2)  Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32    (After Greedy)   Total phased heterozygous SNVs: 99  Total likelihood (phred): 34372.78
2024-07-08 16:05:32 Printing VCF file...
[M::mm_idx_gen::0.002*1.79] collected minimizers
[M::mm_idx_gen::0.003*3.00] sorted minimizers
[M::main::0.003*2.99] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*2.84] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.003*2.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480
[M::worker_pipeline::0.888*2.90] mapped 668 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz
[M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB
downsample ratio is 1 for DQB1
Mean depth {'HLA_DQB1': 95}

2024-07-08 16:05:33 Min read coverage set to 2.
2024-07-08 16:05:33 Max read coverage set to 100000.
2024-07-08 16:05:33 Estimating alignment parameters...
2024-07-08 16:05:33 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.993
                    match -> insertion:      0.003
                    match -> deletion:       0.004
                    deletion -> match:       0.406
                    deletion -> deletion:    0.594
                    insertion -> match:      0.297
                    insertion -> insertion:  0.703

                    Emission Probabilities:
                    match (equal):           0.950
                    match (not equal):       0.017
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:33 Calling potential SNVs using pileup...
2024-07-08 16:05:33 547 potential variants identified.
2024-07-08 16:05:33 Generating haplotype fragments from reads...
2024-07-08 16:05:34    10% of variants processed...
2024-07-08 16:05:34    20% of variants processed...
2024-07-08 16:05:34    30% of variants processed...
2024-07-08 16:05:34    40% of variants processed...
2024-07-08 16:05:34    50% of variants processed...
2024-07-08 16:05:34    60% of variants processed...
2024-07-08 16:05:34    70% of variants processed...
2024-07-08 16:05:34    80% of variants processed...
2024-07-08 16:05:34    90% of variants processed...
2024-07-08 16:05:34    100% of variants processed.
2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:34    Round 1 of haplotype assembly...
2024-07-08 16:05:34    (Before HapCUT2) Total phased heterozygous SNVs: 23  Total likelihood (phred): 757167.63
2024-07-08 16:05:34    (After HapCUT2)  Total phased heterozygous SNVs: 23  Total likelihood (phred): 36533.11
2024-07-08 16:05:34    (After Greedy)   Total phased heterozygous SNVs: 23  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    Round 2 of haplotype assembly...
2024-07-08 16:05:34    (Before HapCUT2) Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    (After HapCUT2)  Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34    (After Greedy)   Total phased heterozygous SNVs: 5  Total likelihood (phred): 35002.70
2024-07-08 16:05:34 Printing VCF file...
[M::mm_idx_gen::0.002*1.70] collected minimizers
[M::mm_idx_gen::0.003*2.97] sorted minimizers
[M::main::0.003*2.95] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*2.81] mid_occ = 10
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.004*2.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229
[M::worker_pipeline::5.751*0.83] mapped 9477 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz
[M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB
downsample ratio is 1 for DRB1
Mean depth {'HLA_DRB1': 45}

2024-07-08 16:05:45 Min read coverage set to 2.
2024-07-08 16:05:45 Max read coverage set to 100000.
2024-07-08 16:05:45 Estimating alignment parameters...
2024-07-08 16:05:45 Done estimating alignment parameters.

                    Transition Probabilities:
                    match -> match:          0.995
                    match -> insertion:      0.003
                    match -> deletion:       0.002
                    deletion -> match:       0.625
                    deletion -> deletion:    0.375
                    insertion -> match:      0.577
                    insertion -> insertion:  0.423

                    Emission Probabilities:
                    match (equal):           0.972
                    match (not equal):       0.009
                    insertion:               1.000
                    deletion:                1.000

2024-07-08 16:05:45 Calling potential SNVs using pileup...
2024-07-08 16:05:45 592 potential variants identified.
2024-07-08 16:05:45 Generating haplotype fragments from reads...
2024-07-08 16:05:46    10% of variants processed...
2024-07-08 16:05:46    20% of variants processed...
2024-07-08 16:05:46    30% of variants processed...
2024-07-08 16:05:46    40% of variants processed...
2024-07-08 16:05:46    50% of variants processed...
2024-07-08 16:05:46    60% of variants processed...
2024-07-08 16:05:46    70% of variants processed...
2024-07-08 16:05:46    80% of variants processed...
2024-07-08 16:05:46    90% of variants processed...
2024-07-08 16:05:46    100% of variants processed.
2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment...
2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes...
2024-07-08 16:05:46    Round 1 of haplotype assembly...
2024-07-08 16:05:46    (Before HapCUT2) Total phased heterozygous SNVs: 216  Total likelihood (phred): 224750.42
2024-07-08 16:05:48    (After HapCUT2)  Total phased heterozygous SNVs: 216  Total likelihood (phred): 61012.72
2024-07-08 16:05:48    (After Greedy)   Total phased heterozygous SNVs: 216  Total likelihood (phred): 44938.21
2024-07-08 16:05:48    Round 2 of haplotype assembly...
2024-07-08 16:05:48    (Before HapCUT2) Total phased heterozygous SNVs: 327  Total likelihood (phred): 44938.21
2024-07-08 16:05:51    (After HapCUT2)  Total phased heterozygous SNVs: 327  Total likelihood (phred): 47655.22
2024-07-08 16:05:51    (After Greedy)   Total phased heterozygous SNVs: 327  Total likelihood (phred): 44208.28
2024-07-08 16:05:51 Printing VCF file...
Sequence is reconstructed, start annotation...
parameter:      sample:DNA01    dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown     wxs:tgs G_nom:0
No such file or directory
# version: IPD-IMGT/HLA 3.38.0
Sample  HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1      HLA_DPA1_2      HLA_DPB1_1      HLA_DPB1_2      HLA_DQA1_1      HLA_DQA1_2      HLA_DQB1_1      HLA_DQB1_2      HLA_DRB1_1     HLA_DRB1_2
optimize typing results by balancing alignment length and identity
The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt
Finished.
wshuai294 commented 3 months ago

Hi Alisa,

The log shows the HLA II genes have high depth. The database likely causes the issue. Does the folder '~/SpecHLA/db/' exist? If not, please download it. If it exists, updating to the latest version of SpecHLA might solve the problem.

Please let me know if these work for you.

On Tue, Jul 9, 2024 at 4:08 AM Alisa @.***> wrote:

Hi Shuai,

both vcf and hla.allele..HLA_.fasta contain information, so these steps seemed to work.. I understand why it doesn't work for HLA I: we did enrichment for a certain region of MHC II region only. Well, I thought alleles could be determined there at least but still the final output file is empty somehow :(

(base) @.:~/SpecHLA/output/DNA01$ cat hlala.like.results.txt Locus Chromosome Allele A 1 A 2 B 1 B 2 C 1 C 2 DPA1 1 DPA1 2 DPB1 1 DPB1 2 DQA1 1 DQA1 2 DQB1 1 DQB1 2 DRB1 1 DRB1 2 (base) @.:~/SpecHLA/output/DNA01$ cat hla.result. hla.result.details.txt hla.result.txt (base) @.***:~/SpecHLA/output/DNA01$ cat hla.result.txt

version: IPD-IMGT/HLA 3.38.0

Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2 (base) @.***:~/SpecHLA/output/DNA01$ cat hla.result.details.txt

version: IPD-IMGT/HLA 3.38.0

Gene G_best allele details:allele;Score;Caucasian;Black;Asian

@.**:~/SpecHLA/script$ python3 long_read_typing.py -r /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz -n DNA01 -o /home/aiakupo/SpecHLA/output [M::mm_idx_gen::0.4611.00] collected minimizers [M::mm_idx_gen::0.5161.41] sorted minimizers [M::main::0.5171.41] loaded/built the index for 1591 target sequence(s) [M::mm_mapopt_update::0.5191.41] mid_occ = 936 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1591 [M::mm_idx_stat::0.5201.41] distinct minimizers: 49986 (20.90% are singletons); average occurrences: 39.684; average spacing: 7.725; total length: 15322978 [M::worker_pipeline::2785.9204.99] mapped 87302 sequences [M::worker_pipeline::5446.6145.00] mapped 90496 sequences [M::worker_pipeline::6477.3794.98] mapped 34247 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -p 0.1 -N 100000 -a /home/aiakupo/SpecHLA/script/../db//ref/hla_gen.format.filter.extend.DRB.no26789.fasta /fast/AG_Lee/Alisa/HLA/0000004436/outputs/fastx_files/m64316_240319_133624.bc2073--bc2073.hifi_reads.fastq.gz [M::main] Real time: 6477.463 sec; CPU: 32256.322 sec; Peak RSS: 4.391 GB alignment done. reads-binning done. [M::mm_idx_gen::0.0022.07] collected minimizers [M::mm_idx_gen::0.0033.10] sorted minimizers [M::main::0.0033.08] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.96] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0032.91] distinct minimizers: 706 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.795; total length: 5503 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_A/HLA_A.fa /home/aiakupo/SpecHLA/output/DNA01//A.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for A Mean depth {'HLA_A': 0}

2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.333
                match -> insertion:      0.333
                match -> deletion:       0.333
                deletion -> match:       0.500
                deletion -> deletion:    0.500
                insertion -> match:      0.500
                insertion -> insertion:  0.500

                Emission Probabilities:
                match (equal):           0.500
                match (not equal):       0.167
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.89] collected minimizers [M::mm_idx_gen::0.0033.13] sorted minimizers [M::main::0.0033.11] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0033.00] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.95] distinct minimizers: 788 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.717; total length: 6081 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_B/HLA_B.fa /home/aiakupo/SpecHLA/output/DNA01//B.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.010 sec; Peak RSS: 0.003 GB downsample ratio is 1 for B Mean depth {'HLA_B': 0}

2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.333
                match -> insertion:      0.333
                match -> deletion:       0.333
                deletion -> match:       0.500
                deletion -> deletion:    0.500
                insertion -> match:      0.500
                insertion -> insertion:  0.500

                Emission Probabilities:
                match (equal):           0.500
                match (not equal):       0.167
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.77] collected minimizers [M::mm_idx_gen::0.0032.86] sorted minimizers [M::main::0.0032.84] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.71] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.003*2.67] distinct minimizers: 826 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.632; total length: 6304 [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_C/HLA_C.fa /home/aiakupo/SpecHLA/output/DNA01//C.long_read.fq.gz [M::main] Real time: 0.004 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB downsample ratio is 1 for C Mean depth {'HLA_C': 0}

2024-07-08 16:05:00 Min read coverage set to 2. 2024-07-08 16:05:00 Max read coverage set to 100000. 2024-07-08 16:05:00 Estimating alignment parameters... 2024-07-08 16:05:00 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.333
                match -> insertion:      0.333
                match -> deletion:       0.333
                deletion -> match:       0.500
                deletion -> deletion:    0.500
                insertion -> match:      0.500
                insertion -> insertion:  0.500

                Emission Probabilities:
                match (equal):           0.500
                match (not equal):       0.167
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:00 Calling potential SNVs using pileup... 2024-07-08 16:05:00 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.71] collected minimizers [M::mm_idx_gen::0.0032.73] sorted minimizers [M::main::0.0032.71] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.61] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.55] distinct minimizers: 1488 (100.00% are singletons); average occurrences: 1.000; average spacing: 7.913; total length: 11775 [M::worker_pipeline::0.2702.62] mapped 231 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPA1/HLA_DPA1.fa /home/aiakupo/SpecHLA/output/DNA01//DPA1.long_read.fq.gz [M::main] Real time: 0.271 sec; CPU: 0.707 sec; Peak RSS: 0.071 GB downsample ratio is 1 for DPA1 Mean depth {'HLA_DPA1': 0}

2024-07-08 16:05:01 Min read coverage set to 2. 2024-07-08 16:05:01 Max read coverage set to 100000. 2024-07-08 16:05:01 Estimating alignment parameters... 2024-07-08 16:05:01 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.333
                match -> insertion:      0.333
                match -> deletion:       0.333
                deletion -> match:       0.500
                deletion -> deletion:    0.500
                insertion -> match:      0.500
                insertion -> insertion:  0.500

                Emission Probabilities:
                match (equal):           0.500
                match (not equal):       0.167
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:01 Calling potential SNVs using pileup... 2024-07-08 16:05:01 0 potential variants identified. No candidate variants identified, printing empty VCF file... [M::mm_idx_gen::0.0021.64] collected minimizers [M::mm_idx_gen::0.0032.93] sorted minimizers [M::main::0.0032.91] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.79] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.73] distinct minimizers: 1730 (99.94% are singletons); average occurrences: 1.001; average spacing: 7.780; total length: 13468 [M::worker_pipeline::7.2302.84] mapped 5835 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DPB1/HLA_DPB1.fa /home/aiakupo/SpecHLA/output/DNA01//DPB1.long_read.fq.gz [M::main] Real time: 7.231 sec; CPU: 20.564 sec; Peak RSS: 0.147 GB downsample ratio is 1 for DPB1 Mean depth {'HLA_DPB1': 182}

2024-07-08 16:05:12 Min read coverage set to 2. 2024-07-08 16:05:12 Max read coverage set to 100000. 2024-07-08 16:05:12 Estimating alignment parameters... 2024-07-08 16:05:12 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.990
                match -> insertion:      0.006
                match -> deletion:       0.004
                deletion -> match:       0.672
                deletion -> deletion:    0.328
                insertion -> match:      0.763
                insertion -> insertion:  0.237

                Emission Probabilities:
                match (equal):           0.925
                match (not equal):       0.025
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:12 Calling potential SNVs using pileup... 2024-07-08 16:05:12 123 potential variants identified. 2024-07-08 16:05:12 Generating haplotype fragments from reads... 2024-07-08 16:05:17 10% of variants processed... 2024-07-08 16:05:17 20% of variants processed... 2024-07-08 16:05:18 30% of variants processed... 2024-07-08 16:05:18 40% of variants processed... 2024-07-08 16:05:18 100% of variants processed. 2024-07-08 16:05:18 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:18 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:18 Round 1 of haplotype assembly... 2024-07-08 16:05:18 (Before HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 1012374.57 2024-07-08 16:05:20 (After HapCUT2) Total phased heterozygous SNVs: 61 Total likelihood (phred): 322425.24 2024-07-08 16:05:20 (After Greedy) Total phased heterozygous SNVs: 61 Total likelihood (phred): 291216.51 2024-07-08 16:05:20 Round 2 of haplotype assembly... 2024-07-08 16:05:20 (Before HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-07-08 16:05:21 (After HapCUT2) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291900.35 2024-07-08 16:05:21 (After Greedy) Total phased heterozygous SNVs: 46 Total likelihood (phred): 291216.51 2024-07-08 16:05:21 Printing VCF file... [M::mm_idx_gen::0.0021.75] collected minimizers [M::mm_idx_gen::0.0033.13] sorted minimizers [M::main::0.0033.12] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.99] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.94] distinct minimizers: 1059 (99.91% are singletons); average occurrences: 1.008; average spacing: 7.959; total length: 8492 [M::worker_pipeline::6.3873.58] mapped 3343 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQA1/HLA_DQA1.fa /home/aiakupo/SpecHLA/output/DNA01//DQA1.long_read.fq.gz [M::main] Real time: 6.388 sec; CPU: 22.850 sec; Peak RSS: 0.153 GB downsample ratio is 1 for DQA1 Mean depth {'HLA_DQA1': 88}

2024-07-08 16:05:30 Min read coverage set to 2. 2024-07-08 16:05:30 Max read coverage set to 100000. 2024-07-08 16:05:30 Estimating alignment parameters... 2024-07-08 16:05:30 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.991
                match -> insertion:      0.005
                match -> deletion:       0.004
                deletion -> match:       0.514
                deletion -> deletion:    0.486
                insertion -> match:      0.556
                insertion -> insertion:  0.444

                Emission Probabilities:
                match (equal):           0.975
                match (not equal):       0.008
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:30 Calling potential SNVs using pileup... 2024-07-08 16:05:30 183 potential variants identified. 2024-07-08 16:05:30 Generating haplotype fragments from reads... 2024-07-08 16:05:30 10% of variants processed... 2024-07-08 16:05:30 20% of variants processed... 2024-07-08 16:05:30 30% of variants processed... 2024-07-08 16:05:30 40% of variants processed... 2024-07-08 16:05:30 50% of variants processed... 2024-07-08 16:05:30 60% of variants processed... 2024-07-08 16:05:30 70% of variants processed... 2024-07-08 16:05:31 80% of variants processed... 2024-07-08 16:05:31 90% of variants processed... 2024-07-08 16:05:31 100% of variants processed. 2024-07-08 16:05:31 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:31 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:31 Round 1 of haplotype assembly... 2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 92420.53 2024-07-08 16:05:31 (After HapCUT2) Total phased heterozygous SNVs: 123 Total likelihood (phred): 38571.91 2024-07-08 16:05:31 (After Greedy) Total phased heterozygous SNVs: 123 Total likelihood (phred): 34372.78 2024-07-08 16:05:31 Round 2 of haplotype assembly... 2024-07-08 16:05:31 (Before HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 (After HapCUT2) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 (After Greedy) Total phased heterozygous SNVs: 99 Total likelihood (phred): 34372.78 2024-07-08 16:05:32 Printing VCF file... [M::mm_idx_gen::0.0021.79] collected minimizers [M::mm_idx_gen::0.0033.00] sorted minimizers [M::main::0.0032.99] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0032.84] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0032.78] distinct minimizers: 1216 (99.84% are singletons); average occurrences: 1.004; average spacing: 7.764; total length: 9480 [M::worker_pipeline::0.8882.90] mapped 668 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DQB1/HLA_DQB1.fa /home/aiakupo/SpecHLA/output/DNA01//DQB1.long_read.fq.gz [M::main] Real time: 0.889 sec; CPU: 2.572 sec; Peak RSS: 0.096 GB downsample ratio is 1 for DQB1 Mean depth {'HLA_DQB1': 95}

2024-07-08 16:05:33 Min read coverage set to 2. 2024-07-08 16:05:33 Max read coverage set to 100000. 2024-07-08 16:05:33 Estimating alignment parameters... 2024-07-08 16:05:33 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.993
                match -> insertion:      0.003
                match -> deletion:       0.004
                deletion -> match:       0.406
                deletion -> deletion:    0.594
                insertion -> match:      0.297
                insertion -> insertion:  0.703

                Emission Probabilities:
                match (equal):           0.950
                match (not equal):       0.017
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:33 Calling potential SNVs using pileup... 2024-07-08 16:05:33 547 potential variants identified. 2024-07-08 16:05:33 Generating haplotype fragments from reads... 2024-07-08 16:05:34 10% of variants processed... 2024-07-08 16:05:34 20% of variants processed... 2024-07-08 16:05:34 30% of variants processed... 2024-07-08 16:05:34 40% of variants processed... 2024-07-08 16:05:34 50% of variants processed... 2024-07-08 16:05:34 60% of variants processed... 2024-07-08 16:05:34 70% of variants processed... 2024-07-08 16:05:34 80% of variants processed... 2024-07-08 16:05:34 90% of variants processed... 2024-07-08 16:05:34 100% of variants processed. 2024-07-08 16:05:34 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:34 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:34 Round 1 of haplotype assembly... 2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 757167.63 2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 23 Total likelihood (phred): 36533.11 2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 23 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 Round 2 of haplotype assembly... 2024-07-08 16:05:34 (Before HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 (After HapCUT2) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 (After Greedy) Total phased heterozygous SNVs: 5 Total likelihood (phred): 35002.70 2024-07-08 16:05:34 Printing VCF file... [M::mm_idx_gen::0.0021.70] collected minimizers [M::mm_idx_gen::0.0032.97] sorted minimizers [M::main::0.0032.95] loaded/built the index for 1 target sequence(s) [M::mm_mapopt_update::0.0042.81] mid_occ = 10 [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 1 [M::mm_idx_stat::0.0042.75] distinct minimizers: 1692 (99.82% are singletons); average occurrences: 1.004; average spacing: 7.786; total length: 13229 [M::worker_pipeline::5.7510.83] mapped 9477 sequences [M::main] Version: 2.22-r1101 [M::main] CMD: minimap2 -t 5 -x map-pb -a /home/aiakupo/SpecHLA/script/../db//HLA/HLA_DRB1/HLA_DRB1.fa /home/aiakupo/SpecHLA/output/DNA01//DRB1.long_read.fq.gz [M::main] Real time: 5.752 sec; CPU: 4.776 sec; Peak RSS: 0.245 GB downsample ratio is 1 for DRB1 Mean depth {'HLA_DRB1': 45}

2024-07-08 16:05:45 Min read coverage set to 2. 2024-07-08 16:05:45 Max read coverage set to 100000. 2024-07-08 16:05:45 Estimating alignment parameters... 2024-07-08 16:05:45 Done estimating alignment parameters.

                Transition Probabilities:
                match -> match:          0.995
                match -> insertion:      0.003
                match -> deletion:       0.002
                deletion -> match:       0.625
                deletion -> deletion:    0.375
                insertion -> match:      0.577
                insertion -> insertion:  0.423

                Emission Probabilities:
                match (equal):           0.972
                match (not equal):       0.009
                insertion:               1.000
                deletion:                1.000

2024-07-08 16:05:45 Calling potential SNVs using pileup... 2024-07-08 16:05:45 592 potential variants identified. 2024-07-08 16:05:45 Generating haplotype fragments from reads... 2024-07-08 16:05:46 10% of variants processed... 2024-07-08 16:05:46 20% of variants processed... 2024-07-08 16:05:46 30% of variants processed... 2024-07-08 16:05:46 40% of variants processed... 2024-07-08 16:05:46 50% of variants processed... 2024-07-08 16:05:46 60% of variants processed... 2024-07-08 16:05:46 70% of variants processed... 2024-07-08 16:05:46 80% of variants processed... 2024-07-08 16:05:46 90% of variants processed... 2024-07-08 16:05:46 100% of variants processed. 2024-07-08 16:05:46 Calling initial genotypes using pair-HMM realignment... 2024-07-08 16:05:46 Iteratively assembling haplotypes and refining genotypes... 2024-07-08 16:05:46 Round 1 of haplotype assembly... 2024-07-08 16:05:46 (Before HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 224750.42 2024-07-08 16:05:48 (After HapCUT2) Total phased heterozygous SNVs: 216 Total likelihood (phred): 61012.72 2024-07-08 16:05:48 (After Greedy) Total phased heterozygous SNVs: 216 Total likelihood (phred): 44938.21 2024-07-08 16:05:48 Round 2 of haplotype assembly... 2024-07-08 16:05:48 (Before HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44938.21 2024-07-08 16:05:51 (After HapCUT2) Total phased heterozygous SNVs: 327 Total likelihood (phred): 47655.22 2024-07-08 16:05:51 (After Greedy) Total phased heterozygous SNVs: 327 Total likelihood (phred): 44208.28 2024-07-08 16:05:51 Printing VCF file... Sequence is reconstructed, start annotation... parameter: sample:DNA01 dir:/home/aiakupo/SpecHLA/output/DNA01/ pop:Unknown wxs:tgs G_nom:0 No such file or directory

version: IPD-IMGT/HLA 3.38.0

Sample HLA_A_1 HLA_A_2 HLA_B_1 HLA_B_2 HLA_C_1 HLA_C_2 HLA_DPA1_1 HLA_DPA1_2 HLA_DPB1_1 HLA_DPB1_2 HLA_DQA1_1 HLA_DQA1_2 HLA_DQB1_1 HLA_DQB1_2 HLA_DRB1_1 HLA_DRB1_2 optimize typing results by balancing alignment length and identity The refined typing results is in /home/aiakupo/SpecHLA/output/DNA01//hlala.like.results.txt Finished.

— Reply to this email directly, view it on GitHub https://github.com/deepomicslab/SpecHLA/issues/29#issuecomment-2215147221, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALS7DWX5TWCLKFWXTJFR57LZLLWSXAVCNFSM6AAAAABHQGSAT6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJVGE2DOMRSGE . You are receiving this because you commented.Message ID: @.***>

alisamatisse commented 3 months ago

Hi Shuai,

Thanks for your patience, it worked! The problem was installing dependencies... maybe you could somehow specify what has to be installed exactly just for the "long_read_typing.py" script. At some point I had an error I did not have "bgzip" and "tabix"... it was not obvious to me as I specifically installed conda for this analysis for the first time. Sorry! Now if I have other questions, I will open a new issue :) grateful for your support!!

wshuai294 commented 3 months ago

So nice to hear it works. Also, it is a good idea to use a specific env for long_read_typing.py. I will add this. Thanks for the advice.