Murali-group / Beeline

BEELINE: evaluation of algorithms for gene regulatory network inference

Test multiple parameters for the same method? #59

Closed (ekernf01 closed this issue 2 years ago)

ekernf01 commented 2 years ago

If I use a configuration file that has, under algorithms:, two entries for the same method with different settings of some nuisance parameter, as below, then BEELINE seems to run both, but the second run overwrites the results of the first.

        - name: "PPCOR"
          params: 
              should_run: [True]
              # Used in parsing output
              pVal: [0.05]

        - name: "PPCOR"
          params: 
              should_run: [True]
              # Used in parsing output
              pVal: [0.01]

I considered a couple of solutions, but both require detailed knowledge of the BEELINE implementation. The script runPPCOR.R could be changed to write its output to different files, but then the evaluator would not know where to look for it. Alternatively, the method name could be changed as below, but then many other files would have to be nearly duplicated: Dockerfiles, within-container runner scripts, the outside-container Python scripts in BLRun, the mappers and parsers in runner.py (sketched after the example below), and probably more besides. Do you have a standard or recommended solution for this? Thank you!

        - name: "PPCOR_0.05"
          params: 
              should_run: [True]
              # Used in parsing output
              pVal: [0.05]

        - name: "PPCOR_0.01"
          params: 
              should_run: [True]
              # Used in parsing output
              pVal: [0.01]
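For concreteness, here is roughly what I mean about the mappers in runner.py: they appear to be plain dictionaries keyed on the exact method name, so a renamed entry like "PPCOR_0.05" would have no key. This is only a sketch, with stub functions standing in for the real per-algorithm runner modules:

# Hedged sketch of the dispatch tables in BLRun/runner.py; the stub
# functions below stand in for the real per-algorithm runner modules.
def ppcor_generateInputs(runner): pass
def ppcor_run(runner): pass
def ppcor_parseOutput(runner): pass

InputMapper = {'PPCOR': ppcor_generateInputs}
AlgorithmMapper = {'PPCOR': ppcor_run}
OutputParser = {'PPCOR': ppcor_parseOutput}

# A renamed method such as "PPCOR_0.05" has no key in these tables,
# so every new name would need its own entries (and its own scripts).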
adyprat commented 2 years ago

You're right. While the config file can take in multiple parameters for the same algorithm, for example,

        - name: "PPCOR"
          params: 
              should_run: [True]
              # Used in parsing output
              pVal: [0.05, 0.01]

currently it overwrites the output files, because the output directory is keyed only on the method name and not on the parameter values.

Your suggestion of creating multiple names in the config file will actually work pretty well, without having to change the Dockerfiles or evaluation scripts.

Here's what needs to change, assuming an underscore is the delimiter between the method name and a run ID, for example PPCOR_1, PPCOR_2, etc. That is, the config file will be:

        - name: "PPCOR_1"
          params: 
              should_run: [True]
              # p-value cutoff
              # Used in parsing output
              pVal: [0.01]

        - name: "PPCOR_2"
          params: 
              should_run: [True]
              # p-value cutoff
              # Used in parsing output
              pVal: [0.05]

The Runner class in runner.py will need to change to:

class Runner(object):
    '''
    A runnable analysis to be incorporated into the pipeline
    '''
    def __init__(self,
                params):
        self.name = params['name']
        self.inputDir = params['inputDir']
        self.params = params['params']
        self.exprData = params['exprData']
        self.cellData = params['cellData']

    def generateInputs(self):
        # Dispatch on the base name (everything before the first underscore);
        # if no underscore is present, this behaves exactly as before
        InputMapper[self.name.split('_')[0]](self)

    def run(self):
        # Same base-name dispatch for running the algorithm itself
        AlgorithmMapper[self.name.split('_')[0]](self)

    def parseOutput(self):
        # ...and for parsing its output
        OutputParser[self.name.split('_')[0]](self)
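As a sanity check on the split('_')[0] lookup: str.split returns the whole string as the first element when the delimiter is absent, so unmodified names still resolve (this does assume no base method name itself contains an underscore):

>>> "PPCOR".split('_')[0]
'PPCOR'
>>> "PPCOR_1".split('_')[0]
'PPCOR'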

Finally, the outDir variable in ppcorRunner.py (and in the other runner scripts) also needs to be updated to:

outDir = "outputs/"+str(RunnerObj.inputDir).split("inputs/")[1]+f"/{RunnerObj.name}/"

in the run() and parseOutput() functions. The evaluation scripts should work without any further changes.
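For example (the dataset path here is purely illustrative), that line resolves as follows:

# Illustrative stand-in for a real Runner object; the actual inputDir
# and name come from the config file. The str() call in the original
# suggests inputDir may be a Path object; a plain string keeps this short.
class RunnerObj:
    inputDir = "inputs/example/GSD"
    name = "PPCOR_1"

outDir = "outputs/" + str(RunnerObj.inputDir).split("inputs/")[1] + f"/{RunnerObj.name}/"
print(outDir)  # outputs/example/GSD/PPCOR_1/

so each run ID gets its own output folder and nothing is overwritten.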

These changes are the least disruptive option at the moment, in my opinion, while we look into a more permanent solution. They will also work with the current setup, where there is only one parameter set per method.

One caveat: algorithms like SINCERITIES, whose generateInputs() function uses 'params', might have issues with these changes, since running the same algorithm with a new set of parameters could overwrite the input CSV files of an earlier run. But as long as one run of the same algorithm completes before another begins, it should work without any issues.
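If that ever becomes a problem, one possible workaround (a rough sketch, not the actual SINCERITIES runner) is to key the generated inputs on the full run name as well, so that overlapping runs cannot clobber each other's CSVs:

import os
import pandas as pd

def generateInputs(RunnerObj):
    # Sketch only: write the parameter-dependent CSVs into a per-run
    # subfolder keyed on the full config name, e.g.
    # inputs/example/GSD/SINCERITIES_1/ instead of inputs/example/GSD/.
    runInputDir = os.path.join(str(RunnerObj.inputDir), RunnerObj.name)
    os.makedirs(runInputDir, exist_ok=True)
    # RunnerObj.exprData holds the expression data filename, as set up
    # in Runner.__init__ above.
    exprDF = pd.read_csv(os.path.join(str(RunnerObj.inputDir), RunnerObj.exprData),
                         index_col=0)
    # ...any params-dependent preprocessing would happen here...
    exprDF.to_csv(os.path.join(runInputDir, "ExpressionData.csv"))

The matching run() would then also have to point at the per-run folder, which is more churn than simply letting the runs execute one at a time as described above.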

ekernf01 commented 2 years ago

Great! Thanks! I will try this and get back to you if I have any trouble.