biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

add debug option to follow specific reads through the filter process #38

s-heron closed this issue 5 years ago

hxin commented 6 years ago

The problem with adding debug-logging code is that it is checked every time a read is processed, so it could slow Sargasso down.

After some testing, checking a boolean flag before entering the logging code turned out to be the fastest option. Checking the boolean alone takes around 18 seconds over a loop of 184652815 iterations, the number of reads in a real-world BAM file, i.e. roughly 100 ns per read. So when not running in debug mode, the flag should only add a matter of seconds to the run time.

However, writing the log (I/O) is expensive, so running Sargasso in debug mode on a large real-world dataset seems impractical. This needs further evaluation.

We decided to give it a try.

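```python
import logging
import timeit

# Truncate the log on each run; only INFO and above are written to the file.
logging.basicConfig(filename='example.log', filemode='w', level=logging.INFO)

# The flag must exist before the benchmarks that read it are run.
bar = True

def logNo():
    logging.debug('a!')

def logYes():
    logging.debug('a!')

def logIfT():
    if bar:
        logging.debug('a!')

def logIfTand1():
    if bar & 1:
        logging.debug('a!')

def logIfF():
    if bar:
        logging.debug('a!')

def logIfFInfo():
    if bar:
        logging.info('a!')

# Number of reads in a real-world BAM file.
n = 184652815
print('logNo:')
print(timeit.timeit(logNo, number=n))
print('logYes:')
print(timeit.timeit(logYes, number=n))
print('logIfT:')
print(timeit.timeit(logIfT, number=n))
print('logIfTand1:')
print(timeit.timeit(logIfTand1, number=n))

# Flag off: only the boolean check is paid.
bar = False
print('logIfF: false')
print(timeit.timeit(logIfF, number=n))

# Flag on, but DEBUG is below the configured INFO level, so nothing is written.
bar = True
print('logIfF: true no IO')
print(timeit.timeit(logIfF, number=n))

# Flag on and INFO messages are written to example.log.
print('logIfF: true IO')
print(timeit.timeit(logIfFInfo, number=n))
```
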
hxin commented 5 years ago

This has now been implemented in a separate package, https://github.com/sidbdri/Sargasso_testsuite , although the actual implementation is a bit tricky.

The Sargasso package now has a test folder which contains a test script test_filtering_logic.py.

The Sargasso_testsuite package has a script that generates a BAM file containing specific read(s) and saves it in the Sargasso test folder, where it can be used by the Sargasso unit tests.
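The Sargasso_testsuite script itself is not shown here, so the following is only a hypothetical sketch of the core idea: keep the header plus every alignment of the chosen reads. It works on SAM text (header lines start with `@`, and the first tab-separated field of an alignment line is the read name, QNAME), assuming the BAM has been converted with `samtools view -h in.bam > in.sam` and the result is converted back with `samtools view -b`. The names `extract_reads` and `extract_reads_file` are illustrative.

```python
def extract_reads(sam_lines, read_names):
    """Keep the SAM header plus every alignment whose QNAME is in read_names."""
    wanted = set(read_names)
    out = []
    for line in sam_lines:
        if line.startswith("@"):
            out.append(line)  # header line: @HD, @SQ, @PG, ...
        elif line.split("\t", 1)[0] in wanted:
            # Every alignment of a followed read (primary, secondary, mates).
            out.append(line)
    return out

def extract_reads_file(in_sam, out_sam, read_names):
    """File wrapper: lines keep their trailing newlines, so writelines suffices."""
    with open(in_sam) as src, open(out_sam, "w") as dst:
        dst.writelines(extract_reads(src, read_names))
```

Keeping all alignments of a read, not just the primary one, matters here because the filtering logic being debugged looks at every alignment reported for that read.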

So to debug step by step, first use Sargasso_testsuite to create the test data, then modify test_filtering_logic.py to load that data and start debugging.