chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

ERROR6 and others probably no warning #580

Open schellt opened 9 months ago

schellt commented 9 months ago

Dear all, after using hifiasm for a long time, we recently observed strange behavior for one assembly in particular.

When running hifiasm 0.19.8, the tool returns ERROR6 and other messages:

$ grep "ERROR" hifiasm-0.19.8.err
ERROR6
ERROR6
ERROR6
ERROR6
ERROR-r-break
ERROR-read
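To see at a glance how often each message type occurs, the matches can be tallied. This is a minimal sketch, not part of hifiasm; the function name `tally_error_messages` is made up for illustration, and the sample lines simply repeat the grep output above.

```python
import re
from collections import Counter


def tally_error_messages(lines):
    """Count each distinct ERROR token (e.g. ERROR6, ERROR-r-break) in log lines."""
    # Match "ERROR" followed by any run of word characters or hyphens.
    pattern = re.compile(r"ERROR[\w-]*")
    counts = Counter()
    for line in lines:
        counts.update(pattern.findall(line))
    return counts


# Sample lines copied from the grep output above.
sample = ["ERROR6", "ERROR6", "ERROR6", "ERROR6", "ERROR-r-break", "ERROR-read"]
counts = tally_error_messages(sample)
for message, n in counts.most_common():
    print(f"{message}\t{n}")
```

On a real log, the open file handle for `hifiasm-0.19.8.err` could be passed in place of `sample`, since the function only needs an iterable of lines.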

Of course we saw the other issues, where you state not to worry about these messages, but we are concerned that some errors are indeed occurring, at least in our case.

We are running hifiasm 0.19.8 with Hi-C data and HiFi reads from two PacBio Revio SMRT cells with default parameters. The job is submitted via Slurm to a compute node. At first we thought this might be related to RAM, but on investigation the maximum RAM used is around 233 GB, whereas 900 GB is allocated. [Screenshot: hifiasm-error6]

Interestingly, when we run a separate assembly for the data from each of the two SMRT cells, neither ERROR6 nor the other messages appear.

Unfortunately, the problems don't end here. When running a reference-based annotation with TOGA (https://github.com/hillerlab/TOGA), the first haplotype looks reasonably good, but the second haplotype lacks around 1,500 genes that we expect. Judging from whole-genome alignments, sequence actually seems to be missing from haplotype 2.
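The comparison behind the missing-gene count comes down to a set difference between the genes annotated in each haplotype. The sketch below assumes the gene identifiers have already been extracted into one list per haplotype; how that is done from the TOGA output is not shown, and the identifiers and the function name `genes_missing_from` are hypothetical.

```python
def genes_missing_from(reference_hap, other_hap):
    """Return genes present in reference_hap but absent from other_hap, sorted."""
    return sorted(set(reference_hap) - set(other_hap))


# Hypothetical gene identifiers, for illustration only.
hap1_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
hap2_genes = ["GENE_A", "GENE_C"]

missing = genes_missing_from(hap1_genes, hap2_genes)
print(len(missing), missing)
```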

I would be very happy if you could have a look at this. We are also happy to share the data with you for further investigation.

Thank you very much in advance. Best, Tilman

chhylp123 commented 9 months ago

In general, I feel like these are all warnings so it doesn't matter too much. I guess there might be some other issues. If you could share the bin files with me, that would be very helpful. Thanks so much!

schellt commented 9 months ago

Thanks for offering to have a look at this. I sent an email to hcheng@jimmy.harvard.edu with details on how to access the files.

chhylp123 commented 9 months ago

Thanks so much!

schellt commented 8 months ago

Dear @chhylp123, did you have a chance to look at the bin files? Thank you. Best, Tilman

schellt commented 6 months ago

Dear @chhylp123, it would be great if you could have a look at this. Below are screenshots of example locations in the de novo assembled haplotype 1 that contain genes missing in haplotype 2. For each location, the top figure is a screenshot of an alignment to hg38 and the figure below it is an IGV screenshot of the corresponding region in the assembled haplotype 1. Unfortunately, it is not possible to display the whole range of the aligned regions in IGV. To me the coverage does not look suspicious here, which might point towards an assembly issue.

hg38 chr19:5,257,814-6,757,813, roughly corresponding to HLeliQue1A coordinates HAP1_SUPER_16:353,892-1,039,785 [image: hap1_super16]
IGV screenshot for HAP1_SUPER_16:674,008-719,670 [image]

hg38 chr19:54,345,842-54,560,522, corresponding to HLeliQue1A coordinates HAP1_SUPER_19:64,995,898-65,223,886 [image: hap1_super_19]
IGV screenshot for HAP1_SUPER_19:65,089,716-65,130,068 [image]
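For anyone comparing the sizes of the regions listed above, the coordinate strings can be parsed into lengths. This is only a sketch: it assumes the coordinates are 1-based and inclusive, and the helper names `parse_region` and `region_length` are made up for illustration.

```python
def parse_region(region):
    """Split 'name:start-end' (commas allowed in the numbers) into (name, start, end)."""
    name, _, span = region.rpartition(":")
    start_text, end_text = span.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return name, start, end


def region_length(region):
    """Length of a region, assuming 1-based inclusive coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1


# Regions copied from the listing above.
regions = [
    "chr19:5,257,814-6,757,813",
    "HAP1_SUPER_16:353,892-1,039,785",
    "chr19:54,345,842-54,560,522",
    "HAP1_SUPER_19:64,995,898-65,223,886",
]
lengths = {region: region_length(region) for region in regions}
for region, length in lengths.items():
    print(f"{region}\t{length:,} bp")
```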

Thank you very much in advance. Best, Tilman

MichaelHiller commented 6 months ago

Dear Heng @lh3 and developers,

We think this is really a hifiasm bug that may have gone unnoticed because people always expect the second haplotype to be less complete, as it lacks the sex chromosomes. However, in our case, the affected regions are autosomal, contain many genes, and have normal read coverage. That is, while the reads suggest that these autosomal regions should be present in both haplotypes, haplotype 2 lacks them entirely.

It would be great if somebody could look into this. We are happy to share all data.

Thx a lot Michael