s-heron closed this 5 years ago
This is now implemented in a separate package, https://github.com/sidbdri/Sargasso_testsuite. The actual implementation is a bit tricky.
The Sargasso package now has a test folder, which contains a test script, test_filtering_logic.py.
The Sargasso_testsuite package has a script that generates a BAM file containing specific read(s) and saves it in the Sargasso package's test folder, where the Sargasso unit tests can use it.
So, to debug step by step, first use Sargasso_testsuite to create the test data, then modify test_filtering_logic.py to load that data and start debugging.
The problem with adding code to log debug information is that this code is checked every time a read is processed, which could slow the code down.
After a bit of testing, checking a boolean flag before entering the logging code is the fastest approach we found. The check alone takes around 18 seconds for a loop of 184652815 iterations, which is the number of reads in a real-world BAM file. Thus, when not running in debug mode, this should only increase the run time by a matter of seconds.
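For illustration, the flag-guarded pattern can be sketched roughly as below. This is a minimal sketch and not Sargasso's actual code: the names `process_reads`, `time_flag_check`, and the logger name are hypothetical, and the reads are stand-in values instead of real BAM records. Note that the timing helper measures the whole loop with the flag off, so it includes the loop overhead as well as the boolean check.

```python
import logging
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("filter_debug_sketch")


def process_reads(reads, debug=False):
    """Count the reads, logging each one only when debug is True."""
    processed = 0
    for read in reads:
        if debug:  # the only per-read cost added when debug mode is off
            logger.debug("processing read %s", read)
        processed += 1
    return processed


def time_flag_check(n_iterations):
    """Return the seconds taken by the loop with debug switched off."""
    start = time.perf_counter()
    process_reads(range(n_iterations), debug=False)
    return time.perf_counter() - start
```

Calling `time_flag_check` with the number of reads in a given BAM file gives a rough upper bound on the overhead of the guard for that file; the result will vary by machine and Python version.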
However, writing the log (I/O) is expensive, so it seems impractical to run Sargasso in debug mode on a large real-world dataset. This needs to be evaluated further.
We decided to give it a try.