Migration to use Snakemake

bebatut commented 7 years ago

Hi,

I migrated the code to use Snakemake. An HTML report is generated at the end of the workflow with a summary of the results.

@keuv-grvl Can you check? The similarity search now runs once on all possible sequences (not on each CDS individually). I checked with candidatus_methanomethylophilus_alvus_Mx1201 and, as far as I can tell, I get the same results as before the migration.

Bérénice

keuv-grvl commented 7 years ago

One of my concerns is the future of a proper command-line interface. As far as I understand, user-defined arguments are located in config.yaml.

@ylana is currently listing all user-related parameters (input genome, database location, number of threads to use, etc.). Also, she is writing what the manual should look like.

We must keep the possibility to call the pipeline from the command line in a Unix style, e.g.:

./PylProtPredictor \
  --input data/genome.fasta \
  --db /path/to/uniref90.dmnd \
  --output test_output/ \
  --threads 8 \
  --whatever the_argument \
  --run all # or 'predict_cds', or any snakemake rule

For the moment, Snakemake allows overriding the config file but it is not really Unix-compliant:

snakemake --config yourparam=1.5

One solution could be to write a wrapper script that properly launches the pipeline (mostly by setting up config.yaml according to the user-defined arguments). On the other hand, using argparse to manage the arguments and the manual should make the software easier to maintain and keep it compatible with Gooey.
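
To make the wrapper idea concrete, here is a minimal sketch that combines both options: argparse handles the Unix-style arguments and the script forwards them to Snakemake through `--config`. The config keys (`input`, `db`, `output`) and the function names are hypothetical and would have to match whatever config.yaml actually defines; this is not the current implementation.

```python
import argparse
import subprocess


def build_parser():
    """Declare the Unix-style options proposed above (names are illustrative)."""
    parser = argparse.ArgumentParser(
        prog="PylProtPredictor",
        description="Launch the Snakemake workflow with user-defined arguments.",
    )
    parser.add_argument("--input", required=True, help="input genome (FASTA)")
    parser.add_argument("--db", required=True, help="path to the reference database")
    parser.add_argument("--output", default="output/", help="output directory")
    parser.add_argument("--threads", type=int, default=1, help="number of threads")
    parser.add_argument("--run", default="all", help="Snakemake rule to run")
    return parser


def build_snakemake_command(args):
    """Translate parsed arguments into a snakemake call.

    The target rule goes first so it is not mistaken for a --config value.
    The config keys below are assumptions, not the keys used by the project.
    """
    return [
        "snakemake",
        args.run,
        "--cores", str(args.threads),
        "--config",
        f"input={args.input}",
        f"db={args.db}",
        f"output={args.output}",
    ]


def main(argv=None):
    """Parse the command line and hand over to Snakemake."""
    args = build_parser().parse_args(argv)
    return subprocess.call(build_snakemake_command(args))
```

A `PylProtPredictor` executable would then only need to call `main()`. Since the arguments are declared with argparse, `--help` comes for free, which should also help with the manual.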

keuv-grvl commented 7 years ago

Seems good to me now :tada: Shall we merge, @bebatut?

bebatut commented 7 years ago

Yes !

bebatut / PylProtPredictor

Migration to use Snakemake #3