Parsoa / SVDSS

Improved structural variant discovery in accurate long reads using sample-specific strings (SFS)
MIT License

Segmentation fault while searching SFS between two assemblies #21

Open LYC-vio opened 1 year ago

LYC-vio commented 1 year ago

Hi,

Thank you for developing this amazing tool. Recently I tried to use SVDSS to find SFSs between two assemblies (or between an assembly and the reference genome). However, SVDSS failed in the search step with a segmentation fault.

The reference I used was hg19-2.1.0, and the assembly was downloaded from here, HG02080.paternal.f1_assembly_v2.fa.gz.

The commands I used were:

${SVDSS} index --fasta ${ref} --index ${idx_file} -b --threads 10
${SVDSS} search --index ${idx_file} --fastq ${HG02080_assembly} --workdir ./work_dir --threads 10

What could be the reason for this issue? By the way, I'm using the v1.0.5 binary.

Thank you

LYC-vio commented 1 year ago

After removing -b, SVDSS successfully extracted the SFSs. It is a bit strange that it doesn't work with -b, though.

ldenti commented 1 year ago

Hi, yes, you found the issue! With -b/--binary, the index is stored in binary format, but this type of index is not queryable (I don't remember whether this is a requirement of the ropebwt2 implementation we build on). The binary format is still useful when you need to build the index incrementally (e.g., you want to index more FASTA/FASTQ files without concatenating them).

LYC-vio commented 1 year ago

Hi @ldenti ,

Thank you! What should I do if I want to search an index that was built in binary format?

For example:

${SVDSS} index --fasta ${ref} --index ${idx_file} -b --threads 10

for i in ${asm1} ${asm2} ${asm3} ${asm4}
do
    ${SVDSS} index --fasta ${i} --append ${idx_file} --threads 10
done

${SVDSS} search --index ${idx_file} --fastq ${HG02080_assembly} --workdir ./work_dir --threads 10

Is this the right way to do it?

Thanks again

ldenti commented 1 year ago

You need to create the binary index at each iteration except the last one, where you store the index in FMD format:

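# Start a binary index from the reference.
${SVDSS} index --fasta ${ref} --index ${idx_file} --binary
# Append every assembly but the last, keeping the binary format.
for i in ${asm1} ${asm2} ${asm3}
do
    ${SVDSS} index --fasta ${i} --append ${idx_file} --binary --index ${idx_file}.tmp
    mv ${idx_file}.tmp ${idx_file}
done
# Append the last assembly and store the result in FMD format.
${SVDSS} index --fasta ${asm4} --append ${idx_file} --index ${idx_file}.fmd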

Then you can search against the index stored in FMD format (not the binary one):

${SVDSS} search --index ${idx_file}.fmd --fastq ${HG02080_assembly} --workdir ./work_dir --threads 10
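
The same pattern (binary for every step except the last, FMD for the last) can be written for any number of inputs. Below is a minimal sketch of a dry-run helper that only prints the commands so they can be checked before running them. The function name plan_incremental_index and the file names in the example call are hypothetical; the SVDSS flags are only the ones already quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) the SVDSS commands for incremental indexing.
# plan_incremental_index is a hypothetical helper name, not part of SVDSS.
# Usage: plan_incremental_index <svdss-binary> <index-path> <fasta> <fasta> [...]
plan_incremental_index() {
    local svdss=$1 idx=$2
    shift 2
    if [ "$#" -lt 2 ]; then
        echo "need at least two FASTA files" >&2
        return 1
    fi
    # First input: start a new index in binary format.
    echo "$svdss index --fasta $1 --index $idx --binary"
    shift
    # Middle inputs: append to the binary index and keep it binary.
    while [ "$#" -gt 1 ]; do
        echo "$svdss index --fasta $1 --append $idx --binary --index $idx.tmp"
        echo "mv $idx.tmp $idx"
        shift
    done
    # Last input: append and write the searchable FMD index.
    echo "$svdss index --fasta $1 --append $idx --index $idx.fmd"
}

# Example call with placeholder file names.
plan_incremental_index SVDSS idx.bin ref.fa asm1.fa asm2.fa
```

Once the printed commands look right, they can be piped to a shell to run them, and the search step then takes the .fmd file as its --index.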

Let me know if this works.