PNNL-CompBio / Snekmer

Pipeline to apply encoded Kmer analysis to protein sequences
BSD 3-Clause "New" or "Revised" License

Specifying input directory causes PeriodicWildcardError #110

Open: tnitka opened this issue 1 year ago

tnitka commented 1 year ago

Running snekmer model with the input directory set in config.yaml (either input_dir: ../../input or input_dir: ../input) fails while building the DAG of jobs, with the following error message:

PeriodicWildcardError in line 18 of /Users/nitk592/snekmer-dev/motif_test/Kchannel_small/src/snekmer/snekmer/rules/process.smk: The value .gz in wildcard uz is periodically repeated (FLT1.fasta.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz). This would lead to an infinite recursion. To avoid this, e.g. restrict the wildcards in this rule to certain values.

The error does not occur when running snekmer cluster, or when running snekmer model in a directory that already contains an output directory generated by snekmer cluster. Although the error involves a wildcard that is used only to unzip input files, it occurs even when none of the input files are zipped.
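One plausible reading of why this happens without any zipped files: if a file the workflow expects is missing (for example because the input path resolves somewhere unexpected), Snakemake looks for any rule whose output pattern matches it. An unzip rule with output "{uz}" and input "{uz}.gz", where uz uses Snakemake's default wildcard regex ".+", matches every filename, so Snakemake keeps asking for "X.gz", then "X.gz.gz", and so on. The Python sketch below simulates that backward search. The rule shape, the function name, and the constraint regex are assumptions for illustration (the actual rule at line 18 of process.smk is not shown in the issue), and the depth cap is a simplified stand-in for Snakemake's real periodicity check.

```python
import re

# Snakemake's default wildcard regex when no constraint is given.
DEFAULT_WILDCARD_REGEX = r".+"


def find_producer_chain(target, existing, uz_regex=DEFAULT_WILDCARD_REGEX, max_depth=20):
    """Simulate backward resolution through a hypothetical unzip rule.

    Assumed rule shape: output "{uz}", input "{uz}.gz".
    Returns (chain, status), where status is:
      "found"    - some file in the chain exists on disk
      "no_rule"  - the unzip rule cannot produce the current file
      "periodic" - the chain keeps growing (stand-in for PeriodicWildcardError)
    """
    chain = [target]
    current = target
    while current not in existing:
        # The output pattern "{uz}" only matches if the whole filename
        # satisfies the wildcard regex.
        if not re.fullmatch(uz_regex, current):
            return chain, "no_rule"
        # To produce `current`, the rule requires `current + ".gz"` as input.
        current = current + ".gz"
        chain.append(current)
        if len(chain) > max_depth:
            return chain, "periodic"
    return chain, "found"


if __name__ == "__main__":
    # Unconstrained wildcard: ".gz" is appended over and over.
    print(find_producer_chain("input/FLT1.fasta", existing=set())[1])
    # Constrained wildcard: the search stops after one step.
    print(find_producer_chain("input/FLT1.fasta", existing=set(),
                              uz_regex=r".*\.(fasta|fa|faa)")[1])
```

With the unconstrained regex the chain grows without bound, which matches the repeated ".gz" suffix in the error. With a constraint that only allows names ending in a FASTA extension, the rule no longer matches "FLT1.fasta.gz", so the search stops and the real problem (a missing input file) would surface instead. In a Snakefile this kind of restriction is expressed with the wildcard_constraints directive; the specific regex here is only an example.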

tnitka commented 1 year ago

I tried running this in a different conda environment with Snekmer installed from main, and the problem still occurs.

christinehc commented 11 months ago

Will revisit this in a future update, but for now, to clarify: Snekmer assumes the existence of a directory named "input" inside the specified input directory, so the trailing "input" should not be included in input_dir.

For example, given this file structure:

dirname/
└- input/
     ├ A.fasta
     └ B.fasta

The proper input_dir is dirname/, not dirname/input.
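To make the expected layout concrete, here is a small Python sketch of the lookup described above, assuming Snekmer resolves sequence files from an "input" folder inside input_dir. The function names and the suffix list are hypothetical, not part of Snekmer's API.

```python
from pathlib import Path


def fasta_search_dir(input_dir):
    """Directory that is actually scanned, assuming Snekmer looks for
    an 'input' folder inside input_dir (per the comment above)."""
    return Path(input_dir) / "input"


def list_fasta(input_dir, suffixes=(".fasta", ".fasta.gz")):
    """List sequence files Snekmer would see (hypothetical suffix list)."""
    search_dir = fasta_search_dir(input_dir)
    if not search_dir.is_dir():
        # e.g. input_dir="dirname/input" resolves to "dirname/input/input"
        return []
    return sorted(p.name for p in search_dir.iterdir() if p.name.endswith(suffixes))
```

For the tree above, list_fasta("dirname") would return ["A.fasta", "B.fasta"], while list_fasta("dirname/input") would look in dirname/input/input and find nothing.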