Illumina / Pisces

Somatic and germline variant caller for amplicon data. Recommended caller for tumor-only workflows.
GNU General Public License v3.0

Scylla taking a very long time to process WES data / getting stuck #72

Open: fmazzarotto opened this issue 2 years ago

fmazzarotto commented 2 years ago

Dear Tamsen, I am using the Pisces suite to process tumor-only WES data. However, I am facing an issue with Scylla: it often appears to get stuck in a region of chromosome 3 that seems to be particularly complex. This happens in approximately 10-20% of the samples I process. The attached screenshot shows a sample I have been processing for days: Scylla appears to have been stuck since yesterday morning (26 hours ago) trying to resolve a 192-variant MNV on chr3. Another sample is in the same situation (stuck in the same region of chr3 on a 175-variant MNV for the past 2 days). I am currently using .NET v5.0.408 and Pisces v5.3.0.0, and I am not sure whether updating .NET would help. I just wanted to check:

[Attached image: Screenshot from 2022-07-08 10-51-28]

tamsen commented 2 years ago

Hi,

Thanks for your interest, and for switching to the latest version.

I think the combinatorics of your 192-variant MNV are probably just causing a slow clustering problem. That's a bummer that it's hitting so many of your samples. How confident are you in all those variants? If it's just noise, I would skip calling in that region, or pre-filter the variants before feeding them to Scylla, so you only spend time clustering true variants.
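As a rough illustration of the pre-filtering idea, the Python sketch below streams a VCF and keeps only records that pass a confidence cut, optionally dropping a whole region you decide not to call. It assumes Scylla is fed a plain-text (uncompressed) VCF and that "confident" means the FILTER column is PASS and QUAL is at least a threshold you choose. The function name `prefilter_vcf`, the `min_qual` default of 30, and the `skip_region` parameter are all hypothetical; none of them come from Pisces or Scylla.

```python
# Hypothetical pre-filter to run before Scylla (not part of the Pisces suite).
# Assumptions: plain-text VCF input; "confident" = FILTER is PASS and QUAL >= min_qual.
from typing import Iterable, Iterator, Optional, Tuple


def prefilter_vcf(
    lines: Iterable[str],
    min_qual: float = 30.0,
    skip_region: Optional[Tuple[str, int, int]] = None,
) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines worth clustering."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column-header lines pass through
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        qual, filt = fields[5], fields[6]  # VCF columns 6 (QUAL) and 7 (FILTER)
        if skip_region is not None:
            region_chrom, start, end = skip_region
            if chrom == region_chrom and start <= pos <= end:
                continue  # drop every record inside the region we chose not to call
        if filt != "PASS":
            continue  # drop records whose FILTER column is not PASS
        if qual == "." or float(qual) < min_qual:
            continue  # drop records with a missing or low QUAL score
        yield line


# Example usage (file names are placeholders):
#   with open("in.vcf") as src, open("in.filtered.vcf", "w") as dst:
#       dst.writelines(prefilter_vcf(src, min_qual=30.0))
```

Filtering on the way in keeps the clustering step from ever seeing low-confidence records, which is the point of the suggestion: fewer candidate variants in a dense region means fewer combinations to evaluate.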

Another idea is to try fiddling with the clustering settings themselves (see https://github.com/tamsen/Pisces/wiki/Scylla-5.2.10-Design-Document), or to run Scylla with no arguments to see the list of exposed parameters.

Off the top of my head, I'd suggest changing the "dist" parameter from 50 to, say, 10 or 5. With dist set to 10, variants have to be within 10 base pairs to cluster, so you'd get many small clusters instead of one big one, and therefore a much smaller combinatorial compute problem. You could also constrain the cluster size with some small code changes.
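To see why a smaller "dist" helps, here is a toy Python model of proximity clustering. It is not Scylla's algorithm (the design document linked in this thread describes that); it simply assumes variants on one chromosome are sorted by position and that a new cluster starts whenever the gap to the previous variant exceeds `dist` (gaps equal to `dist` still join). The optional `max_size` cap is a hypothetical stand-in for the "small code changes" that would limit cluster size, and `subset_cost` is a crude proxy (2^k candidate subsets for a cluster of k variants) for combinatorial work, not a measured runtime.

```python
# Toy model of distance-based variant clustering (NOT Scylla's implementation).
# Assumption: a variant joins the current cluster if it lies within `dist` bp
# of the previous variant; `max_size` is a hypothetical cap on cluster size.
from typing import List, Optional


def cluster_positions(
    positions: List[int], dist: int, max_size: Optional[int] = None
) -> List[List[int]]:
    """Group sorted variant positions into proximity clusters."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if (
            clusters
            and pos - clusters[-1][-1] <= dist
            and (max_size is None or len(clusters[-1]) < max_size)
        ):
            clusters[-1].append(pos)  # close enough and under the cap: extend
        else:
            clusters.append([pos])  # gap too large or cap reached: start a new cluster
    return clusters


def subset_cost(clusters: List[List[int]]) -> int:
    """Crude proxy for combinatorial work: 2**k candidate subsets per cluster of size k."""
    return sum(2 ** len(cluster) for cluster in clusters)
```

On eight toy positions (100, 104, 108, 140, 145, 190, 230, 233), `dist=50` produces a single cluster with a proxy cost of 256, while `dist=10` produces four clusters ([100, 104, 108], [140, 145], [190], [230, 233]) with a total proxy cost of 18. By the same proxy, one 192-variant cluster would cost 2^192, which is why splitting big clusters matters far more than raw CPU speed.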

(And yes, if you simply remove Scylla, you will still get the same variant calls, just not organized into MNVs. If you are OK with that result, go ahead and remove Scylla.)

Best,
Tamsen