Closed (alexvasilikop closed this 1 year ago)
I will look into this and try to understand it before Monday.
I actually think I had that issue earlier today (or at least a similar one)! I don't remember the exact steps I took to fix it, but I do remember that I ended up deleting the output directory (i.e. where you told Hairsplitter to send the output files). I also navigated to where my input .fasta assembly was located and deleted a few files that had appeared there while Hairsplitter was running: I think there was a .fasta.fai file and a file named "core.####", and I deleted both of them (the .fai file might have come from something else, though; I'm unsure).
I ran it again after that, and it seemed to progress past the "call_variants" step just fine. Sorry my potential "fix" isn't very specific; it was something I did in the middle of troubleshooting something else, but it did seem to work (at least for me, or maybe it was a coincidence).
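In case it helps anyone retrace this cleanup, here is a minimal Python sketch that only lists the leftovers described above; it deletes nothing. The function name `find_stale_files` is hypothetical. The file patterns (`<assembly>.fai` and `core.<digits>` next to the assembly) are assumptions taken from this comment, not from HairSplitter's documentation.

```python
import re
from pathlib import Path


def find_stale_files(assembly, output_dir):
    """List leftovers from an earlier run that may be worth removing by hand.

    Assumption: the leftovers are the output directory, an index named
    <assembly>.fai, and core dumps named core.<digits> next to the assembly.
    """
    assembly = Path(assembly)
    candidates = [Path(output_dir), Path(str(assembly) + ".fai")]
    # Keep only names like core.1234, not unrelated files such as core.txt
    candidates += sorted(
        p for p in assembly.parent.glob("core.*")
        if re.fullmatch(r"core\.\d+", p.name)
    )
    return [p for p in candidates if p.exists()]
```

Review the returned paths before removing anything; as noted above, the .fai file may belong to another tool.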
Just so it's here, this is what I saw that led me to try the fix I proposed above:
*** Error in `/users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants': double free or corruption (!prev): 0x0000000070de7800 ***
- Loading all reads from ../1_demul_adtrim/BC15.fastq in memory
- Loading all contigs from ../8_hairsplitter/tmp/cleaned_assembly.gfa in memory
- Loading alignments of the reads on the contigs from ../8_hairsplitter/tmp/reads_on_asm.sam
- Calling variants on each contig using basic pileup
/users/PAS1802/woodruff207/Hairsplitter/hairsplitter.py -f ../1_demul_adtrim/BC15.fastq -i 1376-haploid.fasta -x ont -o ../8_hairsplitter -t 28
HairSplitter v1.3.2 (github.com/RolandFaure/HairSplitter). Last update: 2023-08-11
******************
* *
* Hairsplitter *
* Welcome! *
* *
******************
===== STAGE 1: Cleaning graph of small contigs that are unconnected parts of haplotypes [ 2023-08-11 11:59:47.451158 ]
When the assemblers manage to locally phase the haplotypes, they sometimes assemble the alternative haplotype as a separate contig, unconnected in the gfa graph. This affects negatively the performance of Hairsplitter. Let's delete these contigs
- Mapping the assembly against itself
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/clean_graph ../8_hairsplitter/tmp/assembly.gfa ../8_hairsplitter/tmp/cleaned_assembly.gfa ../8_hairsplitter ../8_hairsplitter/hairsplitter.log 28 minimap2
- Eliminated small unconnected contigs that align on other contigs
===== STAGE 2: Aligning reads on the reference [ 2023-08-11 11:59:49.468419 ]
- Converting the assembly in fasta format
- Aligning the reads on the assembly
- Running minimap with command line:
minimap2 ../8_hairsplitter/tmp/cleaned_assembly.fasta ../1_demul_adtrim/BC15.fastq -x map-ont -a --secondary=no -t 28 > ../8_hairsplitter/tmp/reads_on_asm.sam 2> ../8_hairsplitter/tmp/logminimap.txt
The log of minimap2 can be found at ../8_hairsplitter/tmp/logminimap.txt
===== STAGE 3: Calling variants [ 2023-08-11 12:02:45.425662 ]
Running: /users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants ../8_hairsplitter/tmp/cleaned_assembly.gfa ../1_demul_adtrim/BC15.fastq ../8_hairsplitter/tmp/reads_on_asm.sam 28 ../8_hairsplitter/tmp ../8_hairsplitter/tmp/error_rate.txt 0 ../8_hairsplitter/tmp/variants.col ../8_hairsplitter/tmp/variants.vcf
ERROR: call_variants failed. Was trying to run: /users/PAS1802/woodruff207/Hairsplitter/src/build/call_variants ../8_hairsplitter/tmp/cleaned_assembly.gfa ../1_demul_adtrim/BC15.fastq ../8_hairsplitter/tmp/reads_on_asm.sam 28 ../8_hairsplitter/tmp ../8_hairsplitter/tmp/error_rate.txt 0 ../8_hairsplitter/tmp/variants.col ../8_hairsplitter/tmp/variants.vcf
I tried using the -F option (which overwrites the output directory), and for now the call_variants command is running. Deleting the output directory did not work for me.
I will let you know if it runs successfully.
Update: The run failed even with the -F flag:
corrupted size vs. prev_size
Aborted (core dumped)
/mnt/sda1/Alex/software/Hairsplitter/hairsplitter.py -i /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.chrom.interleaved.fasta -f /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq -x ont -t 12 -o /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom -F
HairSplitter v1.3.2 (github.com/RolandFaure/HairSplitter). Last update: 2023-08-11
******************
* *
* Hairsplitter *
* Welcome! *
* *
******************
===== STAGE 1: Cleaning graph of small contigs that are unconnected parts of haplotypes [ 2023-08-14 11:04:13.549246 ]
When the assemblers manage to locally phase the haplotypes, they sometimes assemble the alternative haplotype as a separate contig, unconnected in the gfa graph. This affects negatively the performance of Hairsplitter. Let's delete these contigs
- Mapping the assembly against itself
Running: /mnt/sda1/Alex/software/Hairsplitter/src/build/clean_graph /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/hairsplitter.log 12 minimap2
- Eliminated small unconnected contigs that align on other contigs
===== STAGE 2: Aligning reads on the reference [ 2023-08-14 11:04:32.704460 ]
- Converting the assembly in fasta format
- Aligning the reads on the assembly
- Running minimap with command line:
minimap2 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.fasta /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq -x map-ont -a --secondary=no -t 12 > /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam 2> /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/logminimap.txt
The log of minimap2 can be found at /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/logminimap.txt
===== STAGE 3: Calling variants [ 2023-08-14 11:16:26.772307 ]
Running: /mnt/sda1/Alex/software/Hairsplitter/src/build/call_variants /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam 12 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/error_rate.txt 0 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.col /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.vcf
ERROR: call_variants failed. Was trying to run: /mnt/sda1/Alex/software/Hairsplitter/src/build/call_variants /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam 12 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/error_rate.txt 0 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.col /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.vcf
What happens when you run the following command directly?
/mnt/sda1/Alex/software/Hairsplitter/src/build/call_variants /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam 12 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/error_rate.txt 0 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.col /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.vcf
Here it is:
/mnt/sda1/Alex/software/Hairsplitter/src/build/call_variants /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam 12 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/error_rate.txt 0 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.col /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.vcf
- Loading all reads from /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq in memory
- Loading all contigs from /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa in memory
- Loading alignments of the reads on the contigs from /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam
- Calling variants on each contig using basic pileup
double free or corruption (!prev)
[1] 27204 abort (core dumped) /mnt/sda1/Alex/software/Hairsplitter/src/build/call_variants 12 0
Does it fail when you disable multithreading? Try running the same command with the thread count set to 1:
/mnt/sda1/Alex/software/Hairsplitter/src/build/call_variants /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/cleaned_assembly.gfa /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/Adineta_ricciae.ONT.BXQ_G.merged.filt.40000.90.1000.fastq /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/reads_on_asm.sam 1 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/error_rate.txt 0 /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.col /mnt/sda1/Alex/16.PHASED_ASSEMBLIES_HETEROZYGOSITY/Adineta_ricciae/hairsplitter_aricciae_chrom/tmp/variants.vcf
OK, I'm getting close to the problem. It is a multithreading issue, so it should not happen if you launch HairSplitter with a single thread. I will keep you updated once I have actually fixed it.
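For anyone stuck on 1.3.2 who wants to retry only the failed step, here is a rough Python sketch that rewrites a logged call_variants command to use one thread. It assumes the thread count is the fourth argument after the executable. That position is inferred from the commands pasted in this thread (where it matches the -t value), not from any documented interface. The helper name `with_single_thread` is hypothetical.

```python
import shlex

# Assumption from the logs in this thread: argv[4] of call_variants is the thread count
THREADS_INDEX = 4


def with_single_thread(command):
    """Return the same call_variants command with the thread count set to 1."""
    args = shlex.split(command)
    if len(args) <= THREADS_INDEX or not args[THREADS_INDEX].isdigit():
        raise ValueError("unexpected call_variants command layout")
    args[THREADS_INDEX] = "1"
    return shlex.join(args)
```

Paste the command from the "ERROR: call_variants failed. Was trying to run:" line as the argument, then run the returned string in a shell.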
The problem has been corrected in version 1.3.3. Thank you, and don't hesitate to report any other bugs you come across!
Hi Roland,
I am using version 1.3.4 and I am still getting the warning about some reads in the SAM file not being found in the reads file:
WARNING: read in the sam file not found in reads file, ignoring: ch219_read44339_template_pass_FAK89779_1-47182
WARNING: read in the sam file not found in reads file, ignoring: ch135_read28305_template_pass_FAK89779_1-100239
WARNING: read in the sam file not found in reads file, ignoring: ch219_read43952_template_pass_FAK89779_10-42516
WARNING: read in the sam file not found in reads file, ignoring: ch393_read23811_template_pass_FAK89779_3-50636
WARNING: read in the sam file not found in reads file, ignoring: ch86_read40397_template_fail_FAK89779_17-72693
WARNING: read in the sam file not found in reads file, ignoring: ch412_read47033_template_pass_FAK89779_7-83775
WARNING: read in the sam file not found in reads file, ignoring: ch50_read44397_template_pass_FAK89779_9-51208
WARNING: read in the sam file not found in reads file, ignoring: ch248_read30930_template_pass_FAK89779_53-44748
WARNING: read in the sam file not found in reads file, ignoring: ch248_read31320_template_pass_FAK89779
WARNING: read in the sam file not found in reads file, ignoring: ch16_read26613_template_pass_FAK89779_17-72851
WARNING: read in the sam file not found in reads file, ignoring: ch118_read33696_template_pass_FAK89779_4-69317
WARNING: read in the sam file not found in reads file, ignoring: ch393_read24189_template_pass_FAK89779_1-44962
WARNING: read in the sam file not found in reads file, ignoring: ch412_read48333_template_pass_FAK89779_6-46862
WARNING: read in the sam file not found in reads file, ignoring: ch412_read47329_template_pass_FAK89779_15-76190
WARNING: read in the sam file not found in reads file, ignoring: ch322_read47295_template_fail_FAK89779_7-54714
WARNING: read in the sam file not found in reads file, ignoring: ch294_read28016_template_pass_FAK89779_2-62700
WARNING: read in the sam file not found in reads file, ignoring: ch79_read40570_template_fail_FAK89779_1-44400
WARNING: read in the sam file not found in reads file, ignoring: ch235_read17025_template_pass_FAK89779_3-54048
WARNING: read in the sam file not found in reads file, ignoring: ch451_read25548_template_pass_FAK89779_15-67538
WARNING: read in the sam file not found in reads file, ignoring: ch248_read32169_template_pass_FAK89779
Was this corrected in v.1.3.4? Thanks
Hmm, I do not remember. Have you looked at these reads, and are they found in the fastq file? If so, you can send me the files. You can probably reproduce the issue with smaller files containing only one read, for example ch219_read44339_template_pass_FAK89779_1-47182.
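One way to run that check is sketched below in Python: it lists the read names that appear in the SAM file but not in the FASTQ file. It assumes an uncompressed FASTQ with four lines per record, and it compares names up to the first whitespace, which is how minimap2 writes the SAM QNAME. How HairSplitter itself matches names is not shown in this thread. The function names are hypothetical.

```python
def fastq_read_names(fastq_path):
    """Collect read names from an uncompressed, 4-line-per-record FASTQ."""
    names = set()
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.startswith("@"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])  # name = header up to first whitespace
    return names


def sam_read_names(sam_path):
    """Collect query names (first column) from a SAM file, skipping headers."""
    names = set()
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def reads_missing_from_fastq(sam_path, fastq_path):
    """Return the sorted read names present in the SAM but absent from the FASTQ."""
    return sorted(sam_read_names(sam_path) - fastq_read_names(fastq_path))
```

If this returns an empty list while the warnings persist, the mismatch is probably in how the reads file is being read rather than in the names themselves.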
The problem was that I used a zipped fastq file as input. When I unzipped it and reran HairSplitter, the problem disappeared.
Thanks
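To guard against this before launching a run, here is a small Python sketch that detects a compressed reads file and writes an uncompressed copy. It assumes "zipped" means gzip (files starting with the bytes 1f 8b); a .zip archive would need different handling. The function names are hypothetical.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Check the file's magic bytes rather than trusting its extension."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_plain_fastq(path, plain_path):
    """Return a path to uncompressed reads, decompressing to plain_path if needed."""
    if not is_gzipped(path):
        return path
    with gzip.open(path, "rb") as src, open(plain_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain_path
```

Pass the returned path to HairSplitter's -f option in place of the original file.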
Hi Roland,
I still haven't been able to run the pipeline successfully. I am getting an error saying that the variant calling step failed, and the warnings about the read headers have come back. Please have a look: