dib-lab / elvers

(formerly eelpond) an automated RNA-Seq workflow system
https://dib-lab.github.io/elvers/

[MRG] Add rule testing framework #99

Closed bluegenes closed 5 years ago

bluegenes commented 5 years ago

Basic rule testing framework is now implemented in tests/test_rules.py. This PR also includes some associated updates to params, env filenames, etc., and the assemblyinput fix from #96.

If we end up thinking this is the wrong strategy, there's a non-working previous attempt in alt_rules_testing.py, where I initially tried to test each rule independently of run_eelpond. The most annoying issue is that the samples dataframe and the is_single_end function need to be available to all rules (this is currently handled within the main Snakefile). However, I was also essentially reimplementing a lightweight version of all of the same functions that are within run_eelpond, and this seemed like a poor plan.
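For context, the shared state in question looks roughly like this: a sketch (not the actual eelpond code; the column name fq2 and both function signatures are assumptions) of what a standalone rule-test harness would have had to reconstruct on its own:

```python
# Hypothetical sketch of the shared state normally built in the main
# Snakefile: a samples dataframe plus an is_single_end helper.
# Column names and signatures are illustrative, not eelpond's actual API.
import pandas as pd

def load_samples(samples_tsv):
    # index by sample name so rules can look up units by wildcard
    return pd.read_csv(samples_tsv, sep="\t").set_index("sample", drop=False)

def is_single_end(samples, sample):
    # treat a unit as single-end if no second-read file is listed
    fq2 = samples.loc[sample].get("fq2")
    return pd.isnull(fq2) or fq2 == ""
```

Every rule that branches on paired vs. single-end reads needs both of these, which is why testing rules outside run_eelpond meant duplicating this setup.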

bluegenes commented 5 years ago

status: Implemented testing via run_eelpond instead of doing all the parameter handling separately in a test function. pytest -k salmon is working now.

charlesreid1 commented 5 years ago

Note that there is no requirements.txt in this branch, so I added the one from the make_package branch. That allows me to install pytest/snakemake/etc into a virtual environment to run tests.

When I run pytest -k salmon, the first test passes, but the second one never finishes. If I am interpreting the traceback from pytest correctly, the test that's hanging is:

../vp/lib/python3.6/site-packages/_pytest/python.py:166:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def test_salmon_long():
>        run_ruletest('salmon', 'test', {}, short=False)

test_rules.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

rulename = 'salmon', testdir = 'test', extra_configD = {}, short = False

    def run_ruletest(rulename, testdir, extra_configD = {}, short = True): # can we pass in rulename, paramsD here, testdata, short yes/no?
        """ test a rule or workflow"""
        # set up dirs
        homedir = os.path.dirname(here)
        run_eelpond_cmd = os.path.join(homedir, 'run_eelpond')
        conda_prefix = os.path.join(homedir, '.snakemake')

        # test info from rule
        rulefile = glob.glob(os.path.join(homedir, 'rules', '*', rulename + '.rule'))[0]
        ruledir = os.path.dirname(rulefile)
        test_yml = glob.glob(os.path.join(ruledir, testdir,'*.yml'))[0]

        with TempDirectory() as location:
            # copy in test data
            #tmpdir = os.path.join(location, testdir)
            #shutil.copytree(os.path.join(ruledir, testdir), tmpdir)
            #test_yml = glob.glob(os.path.join(tmpdir, '*.yml'))[0]

            ## or, avoid copying data this way: ##
            os.chdir(os.path.join(ruledir, testdir))
            # need to be here in order to properly find any relative assemblyinput, etc paths. Maybe fix this in run_eelpond to get path relative to file
            cmd = [run_eelpond_cmd, test_yml, rulename, 'get_data', '--conda_prefix', conda_prefix, '--out_path', location]
            ## NOTE: get data should be added automatically if we need the reads, via high-level input checks. Take out of here once that is implemented issue #110
            # short tests just do dryrun
            if short:
                cmd.append('-n')
            if extra_configD:
                extra_yml = os.path.join(location, 'extra.yml')
                write_yaml(extra_configD, extra_yml)
                cmd.extend(['--extra_config', extra_yml])
                #cmd.extend(['--config_dict', extra_configD]) # not working via pytest
            subprocess.check_call(["snakemake", "--version"])
            try:
>               subprocess.check_call(cmd) # run the command!

test_rules.py:47:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

popenargs = (['/temp/eelpond/run_eelpond', '/temp/eelpond/rules/salmon/test/test.yml', 'salmon', 'get_data', '--conda_prefix', '/temp/eelpond/.snakemake', ...],)
kwargs = {}

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.

        The arguments are the same as for the call function.  Example:

        check_call(["ls", "-l"])
        """
>       retcode = call(*popenargs, **kwargs)
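One way to tell a genuine hang from a slow download is to run the command with a timeout rather than a bare check_call. A sketch (not part of this PR; the helper name and timeout value are illustrative):

```python
import subprocess

def run_with_timeout(cmd, timeout_s=600):
    # Run the test command, but fail loudly if it exceeds timeout_s seconds
    # instead of blocking pytest forever. timeout_s is an arbitrary default.
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        raise AssertionError(
            "command did not finish within %ds: %r" % (timeout_s, cmd))
```

A download-heavy test would then fail with a clear timeout message rather than appearing stuck.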

charlesreid1 commented 5 years ago

(On second thought, maybe the test just seems like it's hanging because it's taking a really long time to download fasta data?)

bluegenes commented 5 years ago

@charlesreid1 just a note: I know there's more to do on testing, primarily adding command-line checks to see if the params are actually getting passed in. But I also made a number of changes needed for other PRs, e.g. changing the names of the params.yml and environment.yml files, so I just want a review on the framework, and I'll open another PR to finish out this testing.

bluegenes commented 5 years ago

@charlesreid1 yeah, the long test grabs reads (not yet downloaded) and runs salmon, so it will take a while. We definitely need to categorize them: we really want to run only the short tests most of the time, then the long tests when adding new programs, etc.
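One common way to do that categorization (a sketch, not in this PR; the marker name "long" is an assumption) is a custom pytest marker on the slow tests:

```python
# Illustrative sketch: tag slow tests with a custom "long" marker so
# routine runs can skip them with `pytest -m "not long"`.
import pytest

def test_salmon_short():
    # dry-run only: fast, runs by default
    pass  # placeholder for run_ruletest('salmon', 'test', {}, short=True)

@pytest.mark.long
def test_salmon_long():
    # downloads reads and runs salmon: run only when explicitly requested
    pass  # placeholder for run_ruletest('salmon', 'test', {}, short=False)
```

Registering the marker in pytest.ini (under a `markers =` entry) avoids pytest's unknown-marker warning; then `pytest -m long` runs only the heavy tests when adding new programs.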

bluegenes commented 5 years ago

Porting this over from the discussion on #96. The idea for avoiding large or repetitive test data is to change the test data location: make a single test dir, rules/test, with trivial fq.gz and .fa files, and write these into short_test.yml. Then upload larger test data and provide a long_test.yml with a samples.tsv containing links to the larger data. When testing, include the short or long yml as the main config or as an --extra_config. There will be a few rules that require alternate test data (e.g. multiqc needs _fastqc.html or other qc output files); maybe keep those files within rule-specific dirs.
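That layout could be selected in the test helper roughly like this (a hypothetical sketch, not in this PR; the directory names and short_test.yml / long_test.yml filenames follow the comment above, but the helper itself is invented):

```python
# Sketch: prefer a rule-specific test dir if one exists (e.g. multiqc's
# alternate qc inputs), otherwise fall back to the shared rules/test dir
# with its short/long configs. All names are illustrative.
import glob
import os

def pick_test_yml(ruledir, long_tests=False):
    local = glob.glob(os.path.join(ruledir, "test", "*.yml"))
    if local:
        return local[0]
    shared = os.path.join(ruledir, os.pardir, "test")
    name = "long_test.yml" if long_tests else "short_test.yml"
    return os.path.join(shared, name)
```

Short tests would then hit the trivial shared data by default, and only `long_tests=True` would point at the samples.tsv with links to the larger uploads.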

ctb commented 5 years ago

LGTM as well - this only adds a salmon test, right?

bluegenes commented 5 years ago

yep - just wanted to get the testing framework reviewed before I added them all. Also, I ended up making a number of other file changes (params, etc.) that I need in the other PRs. Going to merge and add tests for all rules in a separate PR!