Open Orz-CQ opened 8 months ago
Hi @akcorut and happy new year!
Here is my suggestion: could you add a function that exports all the processes into a bash file?
For example, if I ran the same test,
kgwasflow test -t 5 --conda-frontend mamba
it would generate a bash file, and we could run the whole pipeline in a single step with bash XX.sh.
Moreover, could you combine all the required environments into a single conda YAML file?
Lan
Hello @akcorut,
Bumping this thread because I'm also experiencing the same issue with a test run on the ecoli dataset. I tried running it with
kgwasflow test -t 16 --snake-default
The dry run completes correctly.
@Orz-CQ, did you ever solve the issue yourself?
I'm also attaching the full log file: 2024-03-01T094921.431143.snakemake.log
Thanks in advance for your help!
> [Fri Mar 1 10:17:55 2024]
Job 587: Merging outputs from two KMC k-mers counting results into one list for each sample/individual...
Reason: Missing output files: results/kmers_count/individual_81/kmers_with_strand
Activating conda environment: .snakemake/conda/c6c38832695d6d6755994dd3624fff4b_
Error: flag 00 should be equal to zero.
This is likely due to running the KMC non-canonized with -ci not 1
Error: flag 00 should be equal to zero.
This is likely due to running the KMC non-canonized with -ci not 1
[Fri Mar 1 10:17:56 2024]
Error in rule merge_kmers:
jobid: 590
input: results/kmers_count/individual_84/output_kmc_canon.kmc_suf, results/kmers_count/individual_84/output_kmc_canon.kmc_pre, results/kmers_count/individual_84/output_kmc_all.kmc_suf, results/kmers_count/individual_84/output_kmc_all.kmc_pre, results/kmers_count/individual_84/kmc_canonical.done, results/kmers_count/individual_84/kmc_non-canonical.done, scripts/external/kmers_gwas/bin
output: results/kmers_count/individual_84/kmers_with_strand, results/kmers_count/individual_84/kmers_add_strand_information.done
log: logs/count_kmers/kmc/individual_84/add_strand.log.out (check log file(s) for error details)
conda-env: /scratch/bcama/my_directory/kgwas/.snakemake/conda/c6c38832695d6d6755994dd3624fff4b_
shell:
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib
scripts/external/kmers_gwas/bin/kmers_add_strand_information -c results/kmers_count/individual_84/output_kmc_canon -n results/kmers_count/individual_84/output_kmc_all -k 25 -o results/kmers_count/individual_84/kmers_with_strand > logs/count_kmers/kmc/individual_84/add_strand.log.out
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Fri Mar 1 10:17:56 2024]
Error in rule merge_kmers:
jobid: 587
input: results/kmers_count/individual_81/output_kmc_canon.kmc_suf, results/kmers_count/individual_81/output_kmc_canon.kmc_pre, results/kmers_count/individual_81/output_kmc_all.kmc_suf, results/kmers_count/individual_81/output_kmc_all.kmc_pre, results/kmers_count/individual_81/kmc_canonical.done, results/kmers_count/individual_81/kmc_non-canonical.done, scripts/external/kmers_gwas/bin
output: results/kmers_count/individual_81/kmers_with_strand, results/kmers_count/individual_81/kmers_add_strand_information.done
log: logs/count_kmers/kmc/individual_81/add_strand.log.out (check log file(s) for error details)
conda-env: /scratch/bcama/my_directory/kgwas/.snakemake/conda/c6c38832695d6d6755994dd3624fff4b_
shell:
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib
scripts/external/kmers_gwas/bin/kmers_add_strand_information -c results/kmers_count/individual_81/output_kmc_canon -n results/kmers_count/individual_81/output_kmc_all -k 25 -o results/kmers_count/individual_81/kmers_with_strand > logs/count_kmers/kmc/individual_81/add_strand.log.out
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Hello! I ran into the same issue and was wondering whether anyone has solved it yet.
The error comes from the kmer GWAS code found here: https://github.com/voichek/kmersGWAS/blob/master/src/kmers_add_strand_information.cpp. The kmer GWAS manual clearly says to run the non-canonized count with -ci0, so I'm not sure what's going on with the error message here.
When I look at the log files for the preceding steps, it seems that in the cases where I get these errors, not all reads were processed in the non-canonical step. If I run snakemake --force results/kmers_count/individual_87/kmc_non-canonical.done for the individuals that fail and then run the rest of the pipeline, I get things to work.
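For anyone wanting to apply the workaround above to many samples, here is a minimal Python sketch (not part of kGWASflow) that pulls the failing individuals out of a Snakemake log and builds the matching re-run commands. It assumes the log format and the results/kmers_count/<individual>/ layout shown in the log earlier in this thread; the function names failed_individuals and rerun_commands are hypothetical. It only automates the re-run and does not explain why the counting step was incomplete.

```python
import re

# Patterns based on the Snakemake log quoted in this thread (assumed layout).
ERROR_RULE = re.compile(r"^\s*Error in rule (\w+):")
INDIVIDUAL = re.compile(r"results/kmers_count/([^/\s,]+)/")


def failed_individuals(log_text, rule="merge_kmers"):
    """Return the sorted names of individuals whose `rule` job failed."""
    failed = set()
    in_error = False
    for line in log_text.splitlines():
        match = ERROR_RULE.match(line)
        if match:
            # Start of an error block; only track the rule we care about.
            in_error = match.group(1) == rule
            continue
        if in_error and line.lstrip().startswith("output:"):
            hit = INDIVIDUAL.search(line)
            if hit:
                failed.add(hit.group(1))
            in_error = False
    return sorted(failed)


def rerun_commands(individuals):
    """Build one forced re-run command per individual (target path as used above)."""
    return [
        f"snakemake --force results/kmers_count/{name}/kmc_non-canonical.done"
        for name in individuals
    ]
```

To use it, read the log file into a string, pass it to failed_individuals, and print the result of rerun_commands; review the commands before running them.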
To me this is worrying. Why is the non-canonical step marked as complete when not all reads are processed? Could this lead to hard-to-detect errors where enough - but not all - reads are processed?
EDIT: I just went back to my log files and verified that I have cases where not all reads are processed in the canonical step, but the pipeline still runs. You probably want to take a look at this @akcorut.
It turns out that KMC may not process all reads when the number of threads is limited. I've submitted an issue with KMC here https://github.com/refresh-bio/KMC/issues/235.
Hello there :) Thanks for using KMC; I responded in the created issue. In this specific case (I mean the issue posted on the KMC repo), the cause is an ill-formed input FASTQ file (at least for one file this is true; I have not checked the remaining ones, but I guess it is the same case). I would like to point out that it's not that KMC may not process all reads; it's actually more like "undefined behaviour", so, for example, it is possible that not only are reads missing, but also that some other parts of the file are treated as reads, etc. I know it would be nice if KMC had a better mechanism to detect the correctness of input files and eventually exit with an error message, but it's not trivial, and I don't expect to add this in the near future. Anyway, if you have other examples or questions, I'm happy to assist, and thank you again for using KMC.
Best Marek
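Since the cause here was an ill-formed FASTQ file, a quick structural check of the inputs before running the pipeline may help. Below is a minimal Python sketch, not a KMC or kGWASflow feature: it assumes strict four-line FASTQ records (no wrapped sequence lines), and check_fastq and open_text are hypothetical names. It only catches basic structural problems and is no substitute for a dedicated validator.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_fastq(lines):
    """Yield (record_number, problem) for each malformed four-line record."""
    record = []
    count = 0
    for raw in lines:
        # Ignore blank lines between records (e.g. a trailing newline at EOF).
        if not record and not raw.strip():
            continue
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        count += 1
        header, seq, sep, qual = record
        record = []
        if not header.startswith("@"):
            yield count, "header does not start with '@'"
        elif not sep.startswith("+"):
            yield count, "separator line does not start with '+'"
        elif len(seq) != len(qual):
            yield count, "sequence and quality lengths differ"
    if record:
        # Leftover lines that do not form a complete record.
        yield count + 1, "truncated record (fewer than four lines)"
```

A typical call would be list(check_fastq(open_text(path))) for each input file; an empty list means no structural problem was found under the assumptions above.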
Thanks a ton for looking closer at this, @marekkokot! I should have looked at the test data before submitting an issue with KMC. I just ran the pipeline using the SRA data for the E. coli example and did not encounter this issue. I did have some issues getting the data to download correctly from SRA, however; maybe that was the reason for @brunacama93 getting this error with the E. coli dataset?
Hi @akcorut,
The errors occurred while I was testing this workflow with
kgwasflow test -t 5 --conda-frontend mamba
The error log from snakemake is
I also tested the single command
scripts/external/kmers_gwas/bin/kmers_add_strand_information -c results/kmers_count/individual_53/output_kmc_canon -n results/kmers_count/individual_53/output_kmc_all -k 25 -o results/kmers_count/individual_53/kmers_with_strand
it returns
Could you give me some suggestions?