DaehwanKimLab / hisat-genotype

GNU General Public License v3.0

HLA typing on Nanopore or PacBio data #55

Open kokyriakidis opened 3 years ago

kokyriakidis commented 3 years ago

Hi!

Is there a chance it will work with long reads such as Nanopore or PacBio?

chbe-helix commented 3 years ago

Hi Kokyriakidis,

Great question! It might. We'd have to work with you to figure out a custom pipeline to get it to work. My largest concern would be the error rate of common long-read technologies. Here is a rough idea of the changes and customizations we'd need to consider:

1) Get HISAT-genotype's custom genotype genome built with a long-read aligner.
2) Possibly apply error correction to the long reads.
3) Feed the alignments into HISAT-genotype (this is already possible).

It would be a little work to get it done but it may be possible. Let me know your thoughts.

Thanks, Chris

kokyriakidis commented 3 years ago

Hi Chris,

The current best practice is to align Nanopore reads with minimap2

minimap2 -ax map-ont -z 600,200 --MD -t {threads}  \
            -R "@RG\\tID:{sample}\\tSM:{sample}"  \
            {reference} {query} | \
samtools sort -@ {threads} -o {output} -

and PacBio reads with pbmm2

pbmm2 align --num-threads {threads} \
            --preset CCS \
            --rg "@RG\\tID:{sample}\\tSM:{sample}" \
            --log-level INFO \
            {extra} \
            {reference} \
            {query} \
            {bam}
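
The `{threads}`, `{sample}`, `{reference}` and `{query}` fields in the two commands above are template placeholders. A minimal sketch of how they could be filled in from Python follows. The helper names and the file names in the usage example are hypothetical, and only flags that already appear in the commands above are used. The read-group string keeps a literal backslash followed by `t`, which the aligners expand into a tab when writing the `@RG` header line.

```python
import shlex
from typing import List, Optional


def read_group(sample: str) -> str:
    # Literal backslash + "t" (not a real tab); the aligner expands it itself.
    return f"@RG\\tID:{sample}\\tSM:{sample}"


def minimap2_cmd(reference: str, query: str, sample: str, threads: int = 4) -> List[str]:
    # Same flags as the Nanopore command quoted above.
    return ["minimap2", "-ax", "map-ont", "-z", "600,200", "--MD",
            "-t", str(threads), "-R", read_group(sample), reference, query]


def samtools_sort_cmd(output: str, threads: int = 4) -> List[str]:
    # Reads SAM from stdin ("-") and writes a coordinate-sorted file.
    return ["samtools", "sort", "-@", str(threads), "-o", output, "-"]


def pbmm2_cmd(reference: str, query: str, bam: str, sample: str,
              threads: int = 4, extra: Optional[List[str]] = None) -> List[str]:
    # Same flags as the PacBio command quoted above; `extra` stands in for {extra}.
    return (["pbmm2", "align", "--num-threads", str(threads), "--preset", "CCS",
             "--rg", read_group(sample), "--log-level", "INFO"]
            + list(extra or []) + [reference, query, bam])


def as_shell(*cmds: List[str]) -> str:
    # Quote every argument and join the commands into one pipeline string.
    return " | ".join(shlex.join(c) for c in cmds)


if __name__ == "__main__":
    # Hypothetical file and sample names, for illustration only.
    print(as_shell(minimap2_cmd("genotype_genome.fa", "ont_reads.fastq", "sample1", 8),
                   samtools_sort_cmd("ont.sorted.bam", 8)))
    print(as_shell(pbmm2_cmd("genotype_genome.fa", "hifi_reads.bam",
                             "hifi.aligned.bam", "sample1", 8)))
```

Building the commands as argument lists and quoting them only at the end avoids hand-escaping the read-group string in each template.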

PacBio HiFi reads do not need any error correction. Most recent PacBio data are HiFi nowadays.

Nanopore data may require error correction but I am not sure if this is gonna mess with the alleles.

If I provide a phased, haplotagged BAM file, will HISAT-genotype be able to perform HLA typing using the phasing information it carries?
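
For context, the phasing information in a haplotagged BAM is usually stored per read in optional SAM fields. The sketch below assumes the `HP:i` tag convention written by tools such as WhatsHap's `haplotag` (which tool produced the file is an assumption), and it parses SAM text lines, for example the output of `samtools view`, rather than the binary BAM. It is not part of HISAT-genotype, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def haplotype_of(sam_line: str) -> Optional[int]:
    """Return the HP:i value of one SAM alignment line, or None if untagged."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for opt in fields[11:]:
        parts = opt.split(":", 2)
        if len(parts) == 3 and parts[0] == "HP" and parts[1] == "i":
            return int(parts[2])
    return None


def group_by_haplotype(sam_lines: Iterable[str]) -> Dict[Optional[int], List[str]]:
    """Map haplotype (1, 2, ... or None for untagged reads) to read names."""
    groups: Dict[Optional[int], List[str]] = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        read_name = line.split("\t", 1)[0]
        groups[haplotype_of(line)].append(read_name)
    return dict(groups)
```

Reads grouped this way could then be typed per haplotype, although whether HISAT-genotype can make use of such a split is exactly the open question in this thread.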

(My end goal is to create a Pharmacogenomics workflow that can handle Illumina, Nanopore and PacBio data. I wanted to incorporate HISAT-genotype in this workflow as a tool that can handle all types of data)

Thanks, Konstantinos

chbe-helix commented 3 years ago

Hi Konstantinos,

If the BAM files are generated with coordinates that match the genotype genome reference, it could work. HISAT-genotype uses a custom genotype genome reference whose coordinates are shifted relative to GRCh38. So, if you can generate haplotagged BAM files with genotype genome coordinates, that would be ideal. We're working towards a GRCh38-to-genotype-genome (and vice versa) coordinate mapping to integrate with HISAT-genotype. That will likely happen in a later release, though, and may fall outside the timeframe in which you're looking to develop your workflow.
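
To illustrate what a mapping between the two coordinate systems involves, here is a minimal sketch of a piecewise-offset lookup. It is not HISAT-genotype's implementation or file format: the `Segment` layout, the function names and the example numbers are all hypothetical, and a single chromosome with 0-based positions is assumed. Each segment describes a block that is identical in both references and only starts at a different position.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    gg_start: int   # 0-based start of the block in genotype genome coordinates
    ref_start: int  # 0-based start of the same block in GRCh38 coordinates
    length: int     # block length in bases


def _lift(blocks: List[Tuple[int, int, int]], pos: int) -> Optional[int]:
    # blocks: (source_start, target_start, length), sorted by source_start.
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src, dst, length = blocks[i]
    offset = pos - src
    # Positions past the end of the block have no counterpart in the target.
    return dst + offset if offset < length else None


def genotype_to_grch38(segments: List[Segment], pos: int) -> Optional[int]:
    return _lift(sorted((s.gg_start, s.ref_start, s.length) for s in segments), pos)


def grch38_to_genotype(segments: List[Segment], pos: int) -> Optional[int]:
    return _lift(sorted((s.ref_start, s.gg_start, s.length) for s in segments), pos)


if __name__ == "__main__":
    # Hypothetical layout: the genotype genome carries 200 extra bases
    # (positions 1000-1199) that have no counterpart in GRCh38.
    segs = [Segment(0, 0, 1000), Segment(1200, 1000, 500)]
    print(genotype_to_grch38(segs, 1250))
    print(grch38_to_genotype(segs, 1050))
```

Returning `None` for positions that exist in only one reference makes it explicit that reads falling in such regions cannot be lifted over and would need separate handling.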

If you provide a phased haplotagged BAM file with genotype genome coordinates, I'd be happy to see if we can get HISAT-genotype working for your purposes.

Thanks, Chris

kokyriakidis commented 3 years ago

Hmm I need GRCh38 as input for later stages like variant calling. I will try to map with minimap2 using this custom genotype genome reference and make an evaluation of the variants produced compared to using GRCh38.

I will try to get you a phased haplotagged BAM with genotype genome coordinates so we can see how useful the information it carries is for HISAT-genotype.

If I understand correctly, working directly with Nanopore or PacBio fastq reads would need a custom model from your end.

So, I will keep this issue open for research purposes.

Thanks, Konstantinos

chbe-helix commented 3 years ago

Hi Konstantinos,

Sounds good! I understand needing to use GRCh38 coordinates and, yes, I will likely need to make modifications to HISAT-genotype and its models to get it working with long reads. I'm happy to work with you on this endeavor. Let me know if you can get useful BAM files in genotype genome coordinates so I can edit the models, and I will let you know when I get the GRCh38-to-genotype-genome map working.

Thanks, Chris

adbeggs commented 1 year ago

Hi both, any progress on this? I am happy to help @chbe-helix as I also have an interest in this area and am working with Oxford Nanopore on long-read HLA typing.

Best wishes

Andrew

alisamatisse commented 4 weeks ago

I would also be happy to learn more about how to run HISAT on long-read sequencing (PacBio HiFi) data...