Closed srx3001 closed 6 years ago
Thank you for your interest in HLAminer. You may have to lower the score (and confidence) threshold for displaying HLA alleles/results. Personally, I have never been fond of the direct read alignment method, which is why I implemented the targeted assembly methodology. Minds you that when I did so, Illumina reads were very short (50 to 75 bp) and the direct alignment approach made less sense than it does now (now longer reads are able to capture phased variants).
I am not sure why bwa aln and mem are yielding different results out of the box, but in principle, you should be able to tune bwa mem (and minimap2 for that matter) to obtain similar results as bwa aln. Keep in mind that the latter two tools (mem and minimap) were originally designed to align longer sequences -- Tuning these tools to achieve a similar performance to bwa aln is really a question for Heng Li, however. fyi - We have never fully tested these tools with the HLAminer pipeline, only bwa aln.
Hello I have ran the scripts as noted in the test folder and got identical results with bwa aln. Then I tried to use bwa mem, instead of bwa aln. I used: bwa mem rd1.fq rd2.fq > aln_2.sam
then ran the rest of the script the same as for bwa aln. I have noticed the results vary a lot, for HLA-II, the summary is identical, but I am getting a lot less HLA-I alleles with bwa mem.
Ill paste the results (from the legend part) here:
Alignments to HLA
legend +:Coding variant seen by another contig previously RBH:Reciprocal Best Hit (if applicable)
Contig/Read pair,HLA [% HSP/% CONTIG/READ PAIR] A26:33_145_318_1:0:0_1:0:0_70size200|cov1.00,A26:33, [99.00 %H/99.00 %C]
A26:33_130_343_0:0:0_1:0:0_91size200|cov1.00,A26:33,+ [99.50 %H/99.50 %C]
A33:24_139_308_0:0:0_0:0:0_14size200|cov1.00,A68:20, [100.00 %H/100.00 %C]
B55:29_308_467_1:0:0_1:0:0_1fsize200|cov1.00,B55:29, [99.00 %H/99.00 %C]
A26:33_145_316_0:0:0_0:0:0_7asize200|cov1.00,A26:33,+ [100.00 %H/100.00 %C]
Gene,HLA Supertype/Group,Protein coding allele,Score,Validated by N Contigs,x=contig seen,Chance (%),Expect (Eval) value,Confidence Score (-10 log10(P)) Score = sum(Sz cov seq.id. 2 [RBH]) P HLA = (P tig1 is HLA P HLA is tig 1) (P tig2 is HLA P HLA is tig 2) .. (P tig(n) is HLA P HLA is tig n) where P tig x is HLA = sum(P HLA for tig x)
HLA-A A,A26,A26:33,597,3,,100.0000,1.22e-11,109.1 A,A68,A68:20,200,1,,99.9770,2.30e-04,36.4
HLA-B B,B55,B55:29,198,1,,99.9808,1.92e-04,37.2
HLA-C
HLA-E
HLA-F
HLA-G
HLA-DPA1
HLA-DPB1
HLA-DQA1
HLA-DQB1
HLA-DRA
HLA-DRB1
HLA-DRB2
HLA-DRB3
HLA-DRB4
HLA-DRB5
HLA-DRB6
HLA-DRB7
HLA-DRB8
HLA-DRB9
SUMMARY MOST LIKELY HLA-I ALLELES (Confidence (-10 log10(Eval)) >= 30 Score >= 500) Gene,Prediction rank,Group,Allele,Score,Expect (Eval) value,Confidence (-10 log10(Eval))
HLA-A Prediction #1 - A26 A,1,A26,A*26:33,597,1.22e-11,109.1
HLA-B
HLA-C
HLA-E
HLA-F
HLA-G
SUMMARY MOST LIKELY HLA-II ALLELES (Confidence (-10 log10(Eval)) >= 30 Score >= 500) Gene,Prediction rank,Group,Allele,Score,Expect (Eval) value,Confidence (-10 log10(Eval))
HLA-DPA1
HLA-DPB1
HLA-DQA1
HLA-DQB1
HLA-DRA
HLA-DRB1
HLA-DRB2
HLA-DRB3
HLA-DRB4
HLA-DRB5
HLA-DRB6
HLA-DRB7
HLA-DRB8
HLA-DRB9
Why is the difference this big when using bwa mem? Should we always stick to bwa aln when using this tool?