Open conchoecia opened 1 year ago
Hello @conchoecia,
Was your BAM file sorted by read names? The difference is expected if the BAM file was unsorted or sorted by coordinates, in which case the alignment quality scores were not considered. The default value 10 (-q option) is applied for BED input and BAM input sorted by read names. See Note 1.
It seems the majority (20X) of your HiC reads was not mapped. You might double-check to see if this is the case and probably good to figure out the reason.
The error seems to be a bug triggered by a corner case. Unfortunately, I do not have enough information to locate it.
Best, Chenxi
Hi @c-zhou, the bam was sorted by coordinates. I will change my pipeline to sort by reads next time (I used samtools sort
without -n
).
These aren't my reads or my assembly, so there isn't much I can do. I expect that it is due to the assembly quality being poor, and based on CLR rather than HiFi reads. I consistently notice that Hi-C mapping is poorer in CLR-based assemblies of the highly heterozygous species I work on. The Hi-C map looks copacetic though, so the Hi-C library construction seems fine.
Please let me know if you would like any files to suss out the bug. Thank you for maintaining this software! :)
Hi conchoecia: I'm not sure if this helps, but I had the same error when I used a BAM file that was made using a slightly different reference genome that the genome used in Yahs. My error went away once I used the correct reference genome in YAHS as the genome the BAM file was originally made from. Andreas
Hi @andreaschavez - Thanks for the suggestion. This wasn't the case, unfortunately, as I was using a pipeline that referenced the same fasta reference for both mapping and YAHS correction.
Hello @conchoecia,
Sorry for the delayed response. I was away in the past weeks.
I would appreciate it if you could share the data. The BIN file and the contig FASTA index file would be enough. I am not sure what the best way is for you to share them. My email address is cz370@cam.ac.uk if you want to discuss this by email.
Best, Chenxi
yahs version: v1.2a.2 chromap version: v0.2.4-r467
hello, if I use chromap to generate both a
.sam
and the.bed
file from the same Hi-C dataset, thenyahs
only works with the .bam
file.yahs
calculates the coverage of the.bam
file as 6x, and it calculates the coverage of the.bed
file is 2x. The actual coverage of the reads is 25x (Genome is ~500Mbp, there are ~50 million 2x150 read pairs).The crash in
yahs
with the.bed
file occurs at this coverage-calculating stage, and the log is pasted below. The command wasyahs -o test --no-contig-ec --no-scaffold-ec --no-mem-check myassembly.fasta my.bed
. Just to reiterate, when I ran this on a.bam
dataset everything worked and the run resulted in a scaffolded.fasta
.Please let me know if you would like more information!