full-length or exon typing

deepomicslab / SpecHLA

SpecHLA reconstructs entire diploid sequences of HLA genes and infers LOH events. It supports HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes. Also, it supports both short- and long-read data.

MIT License


full-length or exon typing #16

Closed bsb2014 closed 11 months ago

bsb2014 commented 11 months ago

SpecHLA publication suggests that full-length typing outperforms exon typing. I am wondering if the reconstructed gene sequences/full-length (-u 0) are better than the reconstructed exon sequences (-u 1). Do the reads from noncoding regions (introns) improve phasing? Thanks

bsb2014 commented 11 months ago

Do I need to care about the message below that popped up during the full-length typing (-u 0)? Thanks

Use of uninitialized value $hash{"HLA_DRB1_1"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318.
Use of uninitialized value $hash{"HLA_DRB1_2"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318.

wshuai294 commented 11 months ago

Hi, the reads from noncoding regions (introns) provide linkage information between exons, which improves typing performance. Don't worry about the warning message; it has no impact.

bsb2014 commented 11 months ago

The warning message "Use of uninitialized value $hash{"HLA_DRB1_1"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318. Use of uninitialized value $hash{"HLA_DRB1_2"} in split at /home/src/SpecHLA/script/whole/annoHLA.pl line 318." often occurs together with a failure of DRB1 typing. Could you please let me know what the message means? Thanks

bsb2014 commented 11 months ago

Could you also explain what the "Bowtie," "Exon," "Whole.norealign," "Whole," and "Whole.SV" modes mean? Thanks

I found the answer, but it is not clear to me whether Exon = Novoalign + exon. (It would be better if a free aligner could replace Novoalign, which is not free.)

bsb2014 commented 11 months ago

With read binning by Bowtie2 + exon typing + 15-20x read coverage + 150 bp reads, what accuracy can be expected for 2-field HLA typing? Thanks

wshuai294 commented 11 months ago

Hi,

The warning is caused by Perl's strict requirements; we have removed it in the latest commit.
The default parameters are Novoalign + whole + realign + no SV, so each mode name indicates how it differs from the defaults. For example, Exon means Novoalign + exon + realign + no SV. Realign means using the database to link the unphased blocks.
We have not tested Bowtie2 + exon typing, but the accuracy of Bowtie2 + whole typing at 20x is roughly 0.8 on simulated data.
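
The naming rule described above (each mode name states its difference from the defaults) can be sketched in Python. This is only an illustration and not part of SpecHLA: the `Settings` class, the `MODES` table, and `describe` are hypothetical names, and every entry other than Whole and Exon is inferred from the stated rule rather than confirmed in this thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four options named in the reply."""
    aligner: str = "Novoalign"   # default aligner
    region: str = "whole"        # "whole" (full-length) or "exon"
    realign: bool = True         # use the database to link unphased blocks
    sv: bool = False             # structural-variant handling off by default


DEFAULT = Settings()

# Each mode overrides exactly one default. Only "Whole" and "Exon" are
# spelled out in the thread; the other three are assumptions derived
# from the "name = difference from defaults" rule.
MODES = {
    "Whole": DEFAULT,
    "Exon": replace(DEFAULT, region="exon"),
    "Bowtie": replace(DEFAULT, aligner="Bowtie2"),
    "Whole.norealign": replace(DEFAULT, realign=False),
    "Whole.SV": replace(DEFAULT, sv=True),
}


def describe(mode: str) -> str:
    """Render a mode in the 'aligner + region + realign + SV' notation."""
    s = MODES[mode]
    return " + ".join([
        s.aligner,
        s.region,
        "realign" if s.realign else "no realign",
        "SV" if s.sv else "no SV",
    ])


if __name__ == "__main__":
    for name in MODES:
        print(f"{name}: {describe(name)}")
```

With this table, `describe("Exon")` gives "Novoalign + exon + realign + no SV", matching the example in the reply.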

wshuai294 commented 11 months ago

I have not tested Novoalign 3, but I think it could work, perhaps with some minor changes to the parameter settings.

On November 13, 2023, at 22:43, bsb2014 @.***> wrote:

Do you happen to know if Novoalign 3 works with SpecHLA? Thanks


bsb2014 commented 11 months ago

Many thanks for your helpful replies. I tested SpecHLA with Novoalign 4. Novoalign seems to treat the Illumina reads as Sanger (see below). Is that normal? Thanks.

"# Interpreting input files as Sanger FASTQ."

wshuai294 commented 11 months ago

Don't worry. It's normal.
