Open mnshgl0110 opened 2 years ago
Hi,
have you sorted and indexed also the smoothed.selective.bam
? If not, this can create some issues. But I'm not sure if sorting and indexing the .bam will solve yours.. Try it and let me know,
Moreover, have you run the example successfully?
Best, Luca
Hi Luca,
The test data finished without errors.
I re-sorted and re-indexed the smoothed.selective.bam
file, but the SVDSS search
command is still not finding any SFS and output the log message:
Processed batch 351. Reads so far 4110758. Reads per second: 17950. SFS extracted so far: 0. Batches exported: 0.
Also, as I mentioned in the original post, the SVDSS smooth
command is producing empty *_reads.txt
files. Maybe that provides some hint on what exactly is happening?
Best Manish
I see. Can you please provide the full log of the smoothing and searching steps?
Moreover, can you please check and report here the file size of the original bam and the one produced by the smoothing procedure?
Finally, in case I cannot figure out the issue, would it be possible to you sharing the data somehow?
Best, Luca
Hi Luca,
Here is the output from the smoothing and searching step:
(mgpy3.8) 18:18 goel@dell-node-12 netscratch:svdss$ SVDSS smooth --bam sample.bam --workdir . --reference GRCh38_latest_genomic.fna.gz --threads 60
Ping-pong, comparative genome analysis using sample-specific string detection in accurate long reads.
[I] .
[I] Loading reference genome from GRCh38_latest_genomic.fna.gz..
[I] Loading first batch..
[I] Processed batch 11. Reads so far 86747. Reads per second: 14457. Time: 6
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0
[I] Done.
[I] Dumping smoothed read ids..
[I] Loaded 86747 reads.
[I] Wrote 86747 reads.
[I] Complete. Runtime: 35 seconds.
(mgpy3.8) 18:27 goel@dell-node-12 netscratch:svdss$ SVDSS search --threads 60 --assemble --index refgen.bwt --bam smoothed.selective.sorted.bam --workdir .
Ping-pong, comparative genome analysis using sample-specific string detection in accurate long reads.
[I] .
[I] Restoring index..
[I] Done.
[I] BAM input: smoothed.selective.sorted.bam
[I] Putative SFS extraction enabled.
[I] Loading smoothed read ids..
[I] Loaded 0 ignored read ids.
[I] Loaded 0 smoothed read ids.
[I] Loading first batch..
[I] Extracting SFS strings on 60 threads..
[I] Processed batch 9. Reads so far 86747. Reads per second: 21686. SFS extracted so far: 0. Batches exported: 1. Time: 4
[I] Done.
0 processed.
[I] Complete. Runtime: 8 seconds.
BAM size:
(mgpy3.8) 18:22 goel@dell-node-12 netscratch:svdss$ ll *bam
-rwxrwx--- 1 goel grp_schneeberger 639M Jun 9 12:20 sample.bam
-rwxrwx--- 1 goel grp_schneeberger 632M Jun 13 18:19 smoothed.selective.bam
Link to download the sample bam file: https://websafe.mpipz.mpg.de/d/0O5BZFBg6X/sample.bam
The reads are for human HG002.
Thanks Manish
Dear Manish, I fixed the issue. You can find the updated code in the new branch chrnamefix. Please try it and let me know.
Just a note: we had a check to filter unwanted contigs and retain only chromosomes sequences. But it works only with chr1/1 naming convention and not with RefSeq naming convention.. Therefore, without that filter all input contigs (i.e., alt or random contigs) are considered and may interfere with the analysis (in case some reads are mapped to them).
Best, Luca
Branch merged to master (check v1.0.4).
Closing since it works (at least on data I downloaded from ncbi following RefSeq naming conventions). Reopen please if error persists.
Hi, I'm kinda having the same problem. Testdata works fine. But not my PacBio dataset. Followed the snakefile script. Smoothed BAM is sorted and indexed. As I understand, this is most likely a naming mismatch? Thanks
Hi, could be :)
Is the smoothed bam non-empty? In the working directory you should find several files named solution_batch_{i}.assembled.sfs
, do you have those? Moreover, there should also be a file named smoothed_reads.txt
, is it empty? Can you please send the log of the snakemake execution (it should contain the stderr from SVDSS) - you should find it in the .snakemake
directory.
Just to be sure, can you also please check the (smoothed) .bam
header and the reference .fai
first column and send them here?
samtools view -H [.bam]
cut -f1 [.fai]
Hi, sorry for the late reply. This was a problem with my data. However, now it runs through but there is no .vcf file generated. What am I missing? Thanks for your help
SVDSS, Structural Variant Discovery from Sample-specific Strings. Mode: call [I] /projects/results/ [I] PingPong SV Caller running on 30 threads.. [I] Loading reference genome from /GRCh38_chr1-4.fa.. [I] Extracted chr1 with 230481012 bases. [I] Extracted chr2 with 240548228 bases. [I] Extracted chr3 with 198100135 bases. [I] Extracted chr4 with 189752667 bases. [I] Loaded all chromosomes. [I] Loading assmbled SFS.. [I] Loaded 1212 SFS strings on 124 reads. [I] Extending superstrings on 30 threads.. [I] Processed batch 1. Alignments so far: 9990. Alignments per second: 9990. Time: 1 [I] Done. [I] Extension complete. 249 clipped SFS.
It's strange since the log is incomplete.. This is an example of what you should see:
SVDSS, Structural Variant Discovery from Sample-specific Strings.
Mode: call
[I] svdss-output
[I] PingPong SV Caller running on 4 threads..
[I] Loading reference genome from input/22.fa..
[I] Extracted chr22 with 50818468 bases.
[I] Loaded all chromosomes.
[I] Loading assmbled SFS..
[I] Loaded 20884 SFS strings on 5402 reads.
[I] Extending superstrings on 4 threads..
[I] Processed batch 1. Alignments so far: 10000. Alignments per second: 10000. Time: 1
[I] Done.
[I] Extension complete. 2194 clipped SFS.
[I] Maximum extended-SFS length: 8800 bp. Using separation distance 9680.
[I] Retrieved 850 intervals.
[I] Flattening interval clusters..
[I] Analyzing 1567 clusters..
[I] Calling SVs from 1044 clusters..
[I] Extracted 957 SVs.
[I] 957 SVs before chain filtering.
[I] 956 SVs after chain filtering.
[I] Predicted 956 SVs from extended SFS.
[I] Outputting POA alignments to svdss-output/poa.sam..
[I] Exporting 956 SV calls to svdss-output/svs_poa.vcf..
[I] Complete. Runtime: 4 seconds.
Did it crash? If so, I'd expect some sort of error message.. But without that, it's not easy to troubleshoot the problem.. There is no .vcf
in the /projects/results/
folder?
I see that your log says
Processed batch 1. Alignments so far: 9990. Alignments per second: 9990. Time: 1
and not 10000.. but I'm not sure if that caused the problem.
I tried with a very small input and I still got 10000:
...
[I] Loaded all chromosomes.
[I] Loading assmbled SFS..
[I] Loaded 193 SFS strings on 28 reads.
[I] Extending superstrings on 4 threads..
[I] Processed batch 1. Alignments so far: 10000. Alignments per second: 10000. Time: 1
[I] Done.
...
Would it be possible to share the data you are using? This will help me a lot.
Hi,
I am trying to use SVDSS, but it seems that it is not working for me.
Here,
../../reads10kb.wm2.merged.bam
is sorted and indexed.The SVDSS smooth command finishes without any obvious error message and I get a
smoothed.selective.bam
file but thesmoothed_reads.txt
and theignored_reads.txt
files are empty. Running :produces an empty
solution_batch_0.assembled.sfs
file. Finally, running:results in segmentation fault. Could you please let me know how to fix this?
Best Manish