Closed: changsarahl closed this issue 9 months ago.
Hello!
I was wondering whether there was a way to ensure that all of the rules ran in sequence for species pairs?
I'm not completely sure what you mean, but if snakemake reports 100% of steps done at the end of the run, or says "Nothing to be done" when you run the same command again in the same directory, then all of the jobs completed successfully.
Currently, I am running this by specifying specific rules to be run one by one in snakemake, but please let me know if there is another way.
I am not sure what you mean, sorry! The command you posted, snakemake -r -p --cores 60 --snakefile /home/krablab/odp/scripts/odp, will perform all of the comparisons between the species in the yaml file.
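One way to check whether every species pair was processed is to list the per-pair output files you would expect and see which ones are missing. This is a hypothetical helper, not part of odp. It assumes, based only on the file names in the logs in this thread (Ccha_Mamb, Myes_Pmax), that each analysis is named after two species joined by "_" in alphabetical order, and that the filtered output lives at odp/step1-rbh-filtered/{analysis}_reciprocal_best_hits.D.FET.filt.rbh. odp's real naming rules may differ.

```python
from itertools import combinations
from pathlib import Path
from typing import List

# Hypothetical path template inferred from log lines in this thread, e.g.
# odp/step1-rbh-filtered/Myes_Pmax_reciprocal_best_hits.D.FET.filt.rbh
FILTERED_TEMPLATE = "odp/step1-rbh-filtered/{a}_{b}_reciprocal_best_hits.D.FET.filt.rbh"


def expected_filtered_outputs(species: List[str]) -> List[str]:
    """Return one expected output path per unordered species pair.

    Assumes (hypothetically) that pair names are in alphabetical order.
    """
    return [
        FILTERED_TEMPLATE.format(a=a, b=b)
        for a, b in combinations(sorted(species), 2)
    ]


def missing_outputs(species: List[str], workdir: str = ".") -> List[str]:
    """List the expected per-pair outputs that do not exist under workdir."""
    root = Path(workdir)
    return [p for p in expected_filtered_outputs(species) if not (root / p).exists()]


if __name__ == "__main__":
    # Species keys from one of the config.yaml files posted in this thread.
    for path in missing_outputs(["Pmax", "Myes"]):
        print("missing:", path)
```

With n species this yields n(n-1)/2 pairs, for example 780 pairs for 40 species, so the number of jobs grows quickly as species are added.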
The job that failed somewhere and didn't complete:
...
Finished job 3623.
2774 of 7817 steps (35%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-13T224127.325685.snakemake.log
To figure out what went wrong, you'll have to re-run snakemake and then paste the errors here. It looks like the log file you pasted only contains stdout and not stderr, so I can't see what the errors are. Try again and maybe I can pinpoint what went wrong?
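When a long run stops, the useful lines are usually the last progress counter and any "Error in rule" lines. This is a small convenience sketch (not part of odp or snakemake; the function name is made up) that scans captured snakemake output for those two patterns, using the formats that appear in the logs in this thread ("2774 of 7817 steps (35%) done", "Error in rule filtered_D_FET_rbh:"). The exact wording may vary between snakemake versions.

```python
import re
from typing import List, Optional, Tuple

# Patterns copied from the snakemake output shown in this thread.
PROGRESS_RE = re.compile(r"(\d+) of (\d+) steps \((\d+)%\) done")
ERROR_RE = re.compile(r"Error in rule (\w+):")


def summarize_log(text: str) -> Tuple[Optional[Tuple[int, int]], List[str]]:
    """Return (last (done, total) progress pair or None, failed rule names)."""
    progress = None
    failed: List[str] = []
    for line in text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            progress = (int(m.group(1)), int(m.group(2)))
        e = ERROR_RE.search(line)
        if e and e.group(1) not in failed:
            failed.append(e.group(1))
    return progress, failed


if __name__ == "__main__":
    sample = "Finished job 3623.\n2774 of 7817 steps (35%) done\nError in rule filtered_D_FET_rbh:\n"
    print(summarize_log(sample))
```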
Hi, thank you for the quick reply. This seems to be the error; filtered_D_FET_rbh is the last rule that fails to run.
snakemake -r -p --cores 40 --snakefile /mnt/krab1/SLC_data/odp/scripts/odp
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads
all 1 1 1
filtered_D_FET_rbh 741 1 1
total 742 1 1
Select jobs to execute...
[Mon Jul 24 10:31:44 2023]
rule filtered_D_FET_rbh:
input: odp/step1-rbh/Ccha_Mamb_reciprocal_best_hits.D.FET.rbh
output: odp/step1-rbh-filtered/Ccha_Mamb_reciprocal_best_hits.D.FET.filt.rbh
jobid: 3853
reason: Missing output files: odp/step1-rbh-filtered/Ccha_Mamb_reciprocal_best_hits.D.FET.filt.rbh
wildcards: analysis=Ccha_Mamb
resources: tmpdir=/tmp
RuleException:
TypeError in file /mnt/krab1/SLC_data/odp/scripts/odp, line 1839:
Must provide 'func' or tuples of '(column, aggfunc).
File "/mnt/krab1/SLC_data/odp/scripts/odp", line 1839, in __rule_filtered_D_FET_rbh
File "/home/krablab/.local/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 865, in aggregate
File "/home/krablab/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 1260, in reconstruct_func
File "/home/krablab/Documents/apps/smcpp/envs/odp/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Also, is the output of filtered_D_FET_rbh required to create the ribbon plots? I am unsure whether the directory to specify in the ribbon-plot config.yaml file should be step1-rbh or step2-figures/synteny-nocolor.
Thanks!
I'm having a similar issue, also in rule filtered_D_FET_rbh. It seems to be caused by a missing argument to the .agg() method on line 1839 of the odp script, and I don't know how to modify it. Below are my error log and config file.
err:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads
------------------ ------- ------------- -------------
all 1 1 1
filtered_D_FET_rbh 1 1 1
total 2 1 1
Select jobs to execute...
[Thu Jul 27 10:45:54 2023]
rule filtered_D_FET_rbh:
input: odp/step1-rbh/Myes_Pmax_reciprocal_best_hits.D.FET.rbh
output: odp/step1-rbh-filtered/Myes_Pmax_reciprocal_best_hits.D.FET.filt.rbh
jobid: 11
reason: Missing output files: odp/step1-rbh-filtered/Myes_Pmax_reciprocal_best_hits.D.FET.filt.rbh
wildcards: analysis=Myes_Pmax
resources: tmpdir=/tmp
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Thu Jul 27 10:45:55 2023]
Error in rule filtered_D_FET_rbh:
jobid: 0
input: odp/step1-rbh/Myes_Pmax_reciprocal_best_hits.D.FET.rbh
output: odp/step1-rbh-filtered/Myes_Pmax_reciprocal_best_hits.D.FET.filt.rbh
RuleException:
TypeError in file /home/bio_soft/odp/scripts/odp, line 1839:
Must provide 'func' or tuples of '(column, aggfunc).
File "/home/bio_soft/odp/scripts/odp", line 1839, in __rule_filtered_D_FET_rbh
File "/home/miniconda3/lib/python3.10/site-packages/pandas/core/groupby/generic.py", line 1265, in aggregate
File "/home/miniconda3/lib/python3.10/site-packages/pandas/core/apply.py", line 1198, in reconstruct_func
File "/home/miniconda3/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-27T104553.171315.snakemake.log
config.yaml:
#
# This is an example config file for odp/scripts/odp
#
# # To use this software first copy this config file to your analysis directory
# cp odp/example_configs/CONFIG_odp.yaml ./config.yaml
# # Then modify the config file to include your own data
# vim config.yaml
# # Then run the pipeline
# snakemake -r -p --snakefile odp/scripts/odp
ignore_autobreaks: True # Skip steps to find breaks in synteny blocks
diamond_or_blastp: "diamond" # "diamond" or "blastp"
duplicate_proteins: "pass" # currently only "fail" or "best". Fail doesn't allow duplicate names or seqs
plot_LGs: True # Plot the ALGs based on the installed databases
plot_sp_sp: True # Plot the synteny between two species, if False just generates .rbh files
species:
  Pmax:
    proteins: Pmax.lfaa
    chrom: Pmax.chrom
    genome: Pmax.fna
    minscafsize: 3000000 # Only plots scaffolds that are 3 Mbp or longer
  Myes:
    proteins: Myes.lfaa
    chrom: Myes.chrom
    genome: Myes.fna
    minscafsize: 3000000 # Only plots scaffolds that are 3 Mbp or longer
Thanks for any help!
Should be working with the update https://github.com/conchoecia/odp/commit/ba6c45375c2bf5e75a9ff0779e5d75b5509209db
Please run git pull from within the odp directory to update, try again, and reopen this issue if you have the same problem.
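For anyone who hits the same TypeError in their own pandas code: pandas raises "Must provide 'func' or tuples of '(column, aggfunc)." when DataFrameGroupBy.agg() gets no function and its keyword arguments are not of the form new_name=(column, aggfunc). The snippet below reproduces that with a toy table (the column names are made up and are not odp's .rbh columns) and shows the named-aggregation form pandas accepts. It does not show what the linked odp commit changed; for odp itself, updating with git pull is the fix.

```python
import pandas as pd

# Toy data with hypothetical column names (not odp's real .rbh columns).
df = pd.DataFrame(
    {
        "group": ["A", "A", "B"],
        "gene": ["g1", "g2", "g3"],
        "score": [1.0, 2.0, 5.0],
    }
)


def broken_agg(frame: pd.DataFrame) -> pd.DataFrame:
    # No func, and a keyword whose value is not a (column, aggfunc) tuple:
    # DataFrameGroupBy.agg raises
    # TypeError: Must provide 'func' or tuples of '(column, aggfunc).
    return frame.groupby("group").agg(best_score="max")


def named_agg(frame: pd.DataFrame) -> pd.DataFrame:
    # Named aggregation: each keyword maps to a (column, aggfunc) tuple.
    return (
        frame.groupby("group")
        .agg(n_genes=("gene", "count"), best_score=("score", "max"))
        .reset_index()
    )


if __name__ == "__main__":
    try:
        broken_agg(df)
    except TypeError as err:
        print("TypeError:", err)
    print(named_agg(df))
```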
Hi, thank you for this software. I was wondering whether there was a way to ensure that all of the rules ran in sequence for species pairs?
I am attempting to run this with a large number of species (approx. 40), and I seem to run into errors when certain input files are missing. I am running the pipeline on 60 cores, and the snakemake log file is attached.
Currently, I am running this by specifying specific rules to be run one by one in snakemake, but please let me know if there is another way.
Command used:
snakemake -r -p --cores 60 --snakefile /home/krablab/odp/scripts/odp
2023-07-13T224127.325685.snakemake.txt
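Since some of the errors seem to come from missing input files, it can help to check every species' inputs before launching a 40-species run. This is a hypothetical pre-flight sketch, not odp's own validation: it takes a dict mirroring the species section of config.yaml (so it needs no YAML library) and assumes the proteins, chrom, and genome keys seen in the config posted in this thread. odp may require or accept other keys.

```python
from pathlib import Path
from typing import Dict, List

# Keys taken from the species entries in the config.yaml posted in this thread.
FILE_KEYS = ("proteins", "chrom", "genome")


def missing_species_inputs(species: Dict[str, Dict[str, object]], workdir: str = ".") -> List[str]:
    """Return one message per absent key or nonexistent input file."""
    root = Path(workdir)
    problems: List[str] = []
    for name, entry in sorted(species.items()):
        for key in FILE_KEYS:
            value = entry.get(key)
            if value is None:
                problems.append(f"{name}: no '{key}' entry")
            # Absolute paths in the config override workdir when joined.
            elif not (root / str(value)).is_file():
                problems.append(f"{name}: {key} -> {value} not found")
    return problems


if __name__ == "__main__":
    species = {
        "Pmax": {"proteins": "Pmax.lfaa", "chrom": "Pmax.chrom", "genome": "Pmax.fna"},
        "Myes": {"proteins": "Myes.lfaa", "chrom": "Myes.chrom", "genome": "Myes.fna"},
    }
    for msg in missing_species_inputs(species):
        print(msg)
```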