dfguan / purge_dups

haplotypic duplication identification tool
MIT License
202 stars, 19 forks

Segmentation fault (core dumped) with get_seqs #85

Open AFrolicOfFerns opened 3 years ago

AFrolicOfFerns commented 3 years ago

Hello, I am trying to use purge_dups on my Nanopore-based genome assembly. Heterozygosity in this organism is estimated at ~3% (using GenomeScope and Illumina data), and its 1C genome size is 1.27 Gb. The commands I used are included below. Everything ran fine except get_seqs, which failed with this error message:

Segmentation fault (core dumped) get_seqs dups.bed $pri_asm > purged.fa 2> hap.fa

Commands used:

module load minimap2/2.17.Py3
module load PurgeDups/1.2.5.Py3

pri_asm=/data/run/ejennings/pilon1_pd_purged/pd_purged_pilon_edit.fasta
ont_reads=/data/run/ejennings/gt10Kb_clean_porechop_filtlong.fastq.gz

minimap2 -t 12 -I 100G -x map-ont $pri_asm $ont_reads | gzip -c - > pd_pilon_purged_dups.paf.gz # Re-trying with the -I flag for indexing, as the genome is large

pbcstat pd_pilon_purged_dups.paf.gz

calcuts PB.stat > cutoffs 2> calcults.log

split_fa $pri_asm > pd_purged_pilon.split

minimap2 -t 12 -I 100G -x asm5 -DP pd_purged_pilon.split pd_purged_pilon.split | gzip -c - > pd_purged_pilon.split.self.paf.gz

purge_dups -2 -T cutoffs -c PB.base.cov pd_purged_pilon.split.self.paf.gz > dups.bed 2> purge_dups.log

get_seqs -e dups.bed $pri_asm > purged.fa 2> hap.fa
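One thing worth ruling out when get_seqs crashes is a mismatch between dups.bed and the assembly it is applied to, for example a contig name in dups.bed that is missing from $pri_asm, or coordinates past the end of a contig. Whether such a mismatch causes this particular segfault is not established in this thread. The Python sketch below is a hypothetical check, not part of purge_dups; it assumes dups.bed is tab-separated with the contig name, start and end in the first three columns, and that each contig name is the first word of its FASTA header.

```python
import gzip
import sys


def read_fasta_lengths(path):
    """Return {sequence name: length} for a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the name is the first whitespace-delimited token of the header
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def check_bed(bed_path, lengths):
    """List lines of a dups.bed-style file that do not fit the assembly.

    Assumption: tab-separated, with contig name, start and end in columns 1-3.
    """
    problems = []
    with open(bed_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                problems.append(f"line {lineno}: fewer than 3 columns")
                continue
            name = fields[0]
            try:
                start, end = int(fields[1]), int(fields[2])
            except ValueError:
                problems.append(f"line {lineno}: non-integer coordinates")
                continue
            if name not in lengths:
                problems.append(f"line {lineno}: {name} not found in the assembly")
            elif not 0 <= start <= end <= lengths[name]:
                problems.append(
                    f"line {lineno}: {name}:{start}-{end} outside 0-{lengths[name]}"
                )
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name is hypothetical): python check_dups_bed.py dups.bed assembly.fasta
    found = check_bed(sys.argv[1], read_fasta_lengths(sys.argv[2]))
    print("\n".join(found) if found else "no problems found")
```

Run against the same dups.bed and $pri_asm passed to get_seqs; an empty report means only that these two simple checks passed.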

And here is the coverage plot, if that helps: [image: PB covV2]

It looks like hap.fa and purged.fa are created correctly at first, but they are never completed and end up with very small file sizes. Any idea what's going on here? Thanks!
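To put a number on "very small", one can compare how much sequence ended up in purged.fa and hap.fa with the input assembly. This Python sketch assumes (not confirmed in this thread) that get_seqs writes every region it removes to hap.fa, so the two outputs together should hold close to the full input length; a fraction far below 1, or a file whose last line lacks a trailing newline, would suggest the writes were cut off by the crash. The function names are hypothetical.

```python
import gzip


def fasta_stats(path):
    """Return (records, bases, ends_cleanly) for a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    records = bases = 0
    last_line = ""
    with opener(path, "rt") as handle:
        for line in handle:
            last_line = line
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line.strip())
    # An empty file has no last line; otherwise a complete file should end in "\n"
    ends_cleanly = last_line == "" or last_line.endswith("\n")
    return records, bases, ends_cleanly


def compare_outputs(original, purged, hap):
    """Summarize purged.fa and hap.fa and the fraction of input bases they contain."""
    _, original_bases, _ = fasta_stats(original)
    report = {}
    output_bases = 0
    for path in (purged, hap):
        report[path] = fasta_stats(path)
        output_bases += report[path][1]
    fraction = output_bases / original_bases if original_bases else 0.0
    return report, fraction
```

For example, compare_outputs(pri_asm_path, "purged.fa", "hap.fa"), where pri_asm_path holds the same path as $pri_asm.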

fishercera commented 3 years ago

I am also repeatedly getting a segmentation fault, but for me it occurs during purge_dups rather than get_seqs.

[image: CKR_both_canu v1 purge_dups]

FWIW, my genome size is about the same as @AFrolicOfFerns's (1.27 Gb; Aedes aegypti in this case), and heterozygosity might be quite high. I do not have an estimate yet, but I'm working on getting one.

I have tried running this on two different compute systems and got a segmentation fault on both. Could it be running out of memory, or something similar?
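To tell these cases apart, it helps to check how the process died. A segmentation fault is signal SIGSEGV (bash reports exit status 139, that is 128 + 11), whereas the Linux out-of-memory killer terminates a process with SIGKILL, which bash reports as "Killed". Running out of memory can still surface as a segfault if a program does not check for a failed allocation, so peak memory is worth recording too. The Python wrapper below is a hypothetical sketch, not part of purge_dups: it runs a command, reports the terminating signal, and reads peak resident memory through resource.getrusage (ru_maxrss is in KiB on Linux).

```python
import resource
import signal
import subprocess
import sys


def run_and_diagnose(cmd, stdout_path=None, stderr_path=None):
    """Run cmd (a list of arguments) and describe how it ended.

    Returns (description, peak_rss). stdout/stderr are inherited unless file
    paths are given. On Linux, peak_rss is in KiB.
    """
    out = open(stdout_path, "wb") if stdout_path else None
    err = open(stderr_path, "wb") if stderr_path else None
    try:
        proc = subprocess.run(cmd, stdout=out, stderr=err)
    finally:
        for handle in (out, err):
            if handle is not None:
                handle.close()
    # Largest peak resident set size among all children waited for so far
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    rc = proc.returncode
    if rc < 0:  # subprocess reports "killed by signal N" as -N
        sig = signal.Signals(-rc)
        if sig == signal.SIGSEGV:
            desc = "segmentation fault (SIGSEGV); bash would show exit status 139"
        elif sig == signal.SIGKILL:
            desc = "killed by SIGKILL, as the Linux OOM killer does"
        else:
            desc = f"terminated by {sig.name}"
    else:
        desc = f"exited with status {rc}"
    return desc, peak_rss


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is hypothetical): python run_and_diagnose.py <command> [args...]
    description, peak = run_and_diagnose(sys.argv[1:])
    print(f"{description}; peak RSS {peak} (KiB on Linux)", file=sys.stderr)
```

For example, run_and_diagnose(["purge_dups", "-2", "-T", "cutoffs", "-c", "PB.base.cov", "pd_purged_pilon.split.self.paf.gz"], stdout_path="dups.bed", stderr_path="purge_dups.log") mirrors the purge_dups command from the original post.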

fishercera commented 3 years ago

Hi there, this is still happening to me on a different assembly. I get "Segmentation fault (core dumped)" during purge_dups, and the dups.bed file is not written.

Heterozygosity is estimated by kmer analysis to be 1.6% in this species. Repetitive DNA is 65% of the genome.

Here is the coverage graph from the assembly this happened on: [image: RK_F3_canu purge_dups]

And here is the error log from purge_dups:

[M::main] finish parsing params
[M::main] finish reading hits
[M::main] finish reading cutoffs
[M::main] finish reading coverages

System: Dell OptiPlex 5070, 32 GB RAM, 8-core CPU (9th-gen Intel Core i7-9700 @ 3.00 GHz)

Operating system: Ubuntu 20.04 on Windows Subsystem for Linux 2 (WSL 2), Windows 10
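Under WSL 2, Linux runs inside a lightweight virtual machine whose memory is capped by default at a fraction of the Windows host's RAM (the cap can be changed with the memory setting in the [wsl2] section of .wslconfig), so not all of the 32 GB on this machine is necessarily available to purge_dups. One quick way to see what the Linux side actually has is to read /proc/meminfo; the helper below is a hypothetical Python sketch that reports MemTotal, MemAvailable and SwapTotal in GiB.

```python
import os


def meminfo_gib(path="/proc/meminfo"):
    """Return MemTotal, MemAvailable and SwapTotal from /proc/meminfo, in GiB."""
    wanted = {"MemTotal", "MemAvailable", "SwapTotal"}
    values = {}
    with open(path) as handle:
        for line in handle:
            key, _, rest = line.partition(":")
            if key in wanted:
                # The kernel labels these values "kB" but means units of 1024 bytes
                values[key] = int(rest.split()[0]) / 1024**2
    return values


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    for key, gib in meminfo_gib().items():
        print(f"{key}: {gib:.1f} GiB")
```

If MemTotal inside WSL is well below the host's 32 GB, that limit, rather than the physical RAM, is what the assembly tools see.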

AFrolicOfFerns commented 2 years ago

@fishercera I switched to Pseudohaploid, and it ran quickly with no issues. I never figured out what was going on here!

jesssicasolis commented 1 month ago

Hi @fishercera, did you end up switching to Pseudohaploid as well, or were you able to solve the issue?