Isabella136 / AmrPlusPlus_SNP

GNU General Public License v3.0

Issue with large samples #28

Closed EnriqueDoster closed 1 year ago

EnriqueDoster commented 1 year ago

Hello @Isabella136,

I'm trying to run the SNPConfirmation.py script on my samples as part of the AMR++ pipeline, but 4 of the 200 samples did not complete. I pulled those samples and tried running the SNP confirmation on them separately, but the analysis of this sample has now been running on 12 threads for almost 4 days. I checked resource usage with htop, and the process doesn't seem to be using many resources. I know this sort of thing is challenging to optimize, but do you have any tips for speeding up the analysis?

Thanks, Enrique

Isabella136 commented 1 year ago

Can you also share the count matrix for this sample with me? It might help me figure out how best to optimize the script, or whether a logic error is causing an infinite loop.

EnriqueDoster commented 1 year ago

Sure thing, here it is!

Thanks for your quick response!

AMR_analytic_matrix.csv

Isabella136 commented 1 year ago

For this sample in particular, the problem had nothing to do with its size; instead, a logic error ended up causing a key error further down the line. The newest commit fixes this, and the sample now runs without problems.

Let me know if the other three samples also cause problems.

EnriqueDoster commented 1 year ago

Awesome, that worked for this sample and the other 5 samples with this issue. Thank you!