Closed bbimber closed 1 week ago
Hi @bbimber
Thanks for the report and for testing the new version! I'll have a look and try to fix it asap.
Hermann
@hermannromanek, yes, you are right in your PR that that check already exists.
I forked your repo and was going to add some debugging. Do you have any suggestion on either checks or additional logging? I can easily re-run this on the problematic dataset: https://github.com/fritzsedlazeck/Sniffles/blob/debb998dc759fc76b2001d2c31c2bfe449e9a3c8/src/sniffles/result.py#L159
Would this be encountered if len(svcalls) == 1? In this case, sorting is irrelevant anyway, right?
Yes, I'm also thinking the problem is the offset running out of bounds. Can you try running the version in branch https://github.com/fritzsedlazeck/Sniffles/tree/issue520 that I just pushed?
Although I'm still trying to construct test cases to reproduce it, this should fix it.
Thanks, Hermann
@hermannromanek: thank you for the fast fix - that did work.
I noticed one thing: the job left a huge number of files with names like "result-59198-6730-unsorted.part.vcf". Should these be deleted?
Here is the tail of the log, and it seems like sniffles2 finished normally. I don't see the words 'error' or 'exception' anywhere in the log, and nothing else that looked like an error:
2024-11-10 11:34:28,926 INFO sniffles.worker (3596699): Worker 9 done (code 0).
2024-11-10 11:34:28,926 INFO sniffles.worker (3596699): Worker 10 done (code 0).
2024-11-10 11:34:28,926 INFO sniffles.worker (3596699): Worker 11 done (code 0).
2024-11-10 11:34:28,926 INFO sniffles.main (3596699): Took 5199.70s.
2024-11-10 11:34:28,926 INFO sniffles.main (3596699):
2024-11-10 11:36:09,818 INFO sniffles.main (3596699): Wrote 656835 called SVs to ./merge/PacBio.59.sniffles2.vcf (multi-sample, sorted)
Merging big input datasets doesn't yet support sorting, so SVs that are far out of order are written to those extra files; they contain actual merged variants. We currently run this pipeline with the --no-sort option and sort afterwards using bcftools, something we'll have to add to the release notes.
"Big input dataset" is controlled by the argument --combine-max-inmemory-results, which defaults to 20, so any merge of more than this number of files will exhibit this behaviour.
I will also add a warning explaining this when sorting is enabled on more input files than in-memory sorting supports.
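For anyone hitting this, the workaround described above can be sketched as a small shell pipeline. Only --no-sort is confirmed in this thread; the input/output flag names and file names here are assumptions, so check sniffles --help for your version:

```shell
# Merge per-sample .snf files without in-memory sorting (avoids the
# unsorted .part.vcf behaviour on merges of more than 20 inputs).
# Flag names other than --no-sort are assumptions; verify with sniffles --help.
sniffles --input fileList.tsv --vcf merged.unsorted.vcf --no-sort

# Sort and compress the merged VCF with bcftools, then index it.
bcftools sort -O z -o merged.sorted.vcf.gz merged.unsorted.vcf
bcftools index -t merged.sorted.vcf.gz
```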
@hermannromanek: ok, I can understand that. Thanks for the investigation. A couple comments:
Hi @bbimber, we will implement sorting for large datasets sometime in the future.
OK, honestly it's really not that big a problem as long as the tool is clear about what it does or does not do. Low priority from our perspective.
Hello,
I installed the develop branch (sniffles 2.5) and am trying to call/merge 59 PacBio CCS and CLR datasets. I understand the dev branch is unstable, but I was told you are about to release 2.5, which has some enhanced calling of large deletions. If my error is related to being on the unstable branch, I can wait.
I first called each CRAM individually to generate snf files. I then ran sniffles as follows to merge them, where fileList.tsv has one line per snf file:
This gives the following error at the end of the log:
When I looked earlier in the log, I see this, which seems like it might be the actual problem (although maybe sniffles should die earlier if this happens):
Is this a known issue? Thanks for any help or ideas. The full log is attached:
snifflesMerge.txt