matinnuhamunada closed this issue 2 years ago
Found the issue here: https://github.com/tseemann/prokka/issues/360
I think you need to install rnammer yourself and add it to your PATH. This is probably because, although rnammer is free, the developers want you to agree to their license terms before you can use it. Therefore it cannot simply be bundled with prokka.
You can get rnammer here
But beware, there are a few nuisances when installing (rnammer has not been updated in a while and some of its dependencies have changed). You have to make sure hmmer2 is installed and that the path to its binaries is set in the rnammer script. It will NOT work with hmmer3 (it will just yield empty results with the new versions).
Also, you will likely have to delete all instances of "--cpu=1" from the core-rnammer script. Details on these problems and on how to solve them are given here. (rnammer is still my favorite rRNA gene finder, though. I really hope they'll release an updated version of it sometime.)
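A minimal Python sketch of the "--cpu=1" cleanup, in case it saves someone from editing by hand. The function names and the sample lines in the usage note are made up for illustration; only the literal "--cpu=1" string and the core-rnammer file name come from the comment above. It does a plain text replacement and keeps a .bak copy, so check the diff before trusting it.

```python
from pathlib import Path

FLAG = "--cpu=1"  # literal string named in the comment above


def strip_cpu_flag(script_text: str, flag: str = FLAG) -> str:
    """Return script_text with every literal occurrence of flag removed."""
    # Remove the flag together with one leading space first,
    # then any occurrence that had no leading space.
    return script_text.replace(" " + flag, "").replace(flag, "")


def patch_script(path: Path, flag: str = FLAG) -> int:
    """Patch the file in place, keep a .bak copy, return how many flags were found."""
    original = path.read_text()
    patched = strip_cpu_flag(original, flag)
    if patched != original:
        # Keep the untouched script next to the patched one.
        path.with_name(path.name + ".bak").write_text(original)
        path.write_text(patched)
    return original.count(flag)
```

Usage would be something like `patch_script(Path("resources/RNAmmer/core-rnammer"))`, assuming that is where the script was unpacked. A line such as `hmmsearch --cpu=1 model.hmm` (hypothetical) becomes `hmmsearch model.hmm`.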
Prokka will use barrnap in 918ef2f
As RNAmmer has an academic license, there is no way to fully automate its installation in Snakemake. Instead, I'll add it to the resources directory and create a symlink to it in the prokka environment's bin path.
I wrote myself a note on making RNAmmer work here: https://matinnuhamunada.github.io/posts/2022/01/rnammer
To achieve this, first add a parameter for rnammer in the config.yaml:
#### RULE CONFIGURATION ####
# rules: set value to TRUE if you want to run the analysis or FALSE if you don't
rules:
  bigscape: TRUE
  mlst: TRUE
  refseq_masher: TRUE
  seqfu: TRUE
  eggnog: FALSE
  rnammer: TRUE
To make sure the DAG works, add the output of rule rnammer_setup to the dictionary in get_final_output in common.smk, so that rule all requests it:
##### Customizable Analysis #####
def get_final_output():
    """
    Generate final output for rule all given a TRUE value in config["rules"]
    """
    # dictionary of rules and their output files
    rule_dict = {"mlst" : expand("data/interim/mlst/{strains}_ST.csv", strains = STRAINS),
                 "eggnog" : expand("data/interim/eggnog/{strains}/", strains = STRAINS),
                 "refseq_masher" : expand("data/interim/refseq_masher/{strains}_masher.csv", strains = STRAINS),
                 "automlst_wrapper" : "data/interim/automlst_wrapper/raxmlpart.txt.treefile",
                 "roary" : expand("data/interim/roary/{name}", name=PROJECT_IDS),
                 "bigscape" : expand("data/interim/bigscape/{name}_antismash_{version}/index.html", version=dependency_version["antismash"], name=PROJECT_IDS),
                 "seqfu" : "data/processed/tables/df_seqfu_stats.csv",
                 "rnammer": "resources/rnammer_test.txt"
                }
    # get keys from config
    opt_rules = config["rules"].keys()
    # if values are TRUE add output files to rule all
    final_output = [rule_dict[r] for r in opt_rules if config["rules"][r]]
    return final_output
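The selection logic above can be tried outside Snakemake with a small stand-alone sketch. This is not the workflow code: `select_outputs` and the toy dictionaries are hypothetical names used only to show how the TRUE/FALSE switches in config["rules"] decide which targets are requested.

```python
def select_outputs(rules_config: dict, rule_outputs: dict) -> list:
    """Return the outputs of every rule switched on in rules_config.

    Mirrors the list comprehension in get_final_output: iterate over the
    config keys and keep the output whenever the value is truthy.
    """
    selected = []
    for rule, enabled in rules_config.items():
        if not enabled:
            continue
        if rule not in rule_outputs:
            # The original comprehension would raise a bare KeyError here;
            # a named message makes a config typo easier to spot.
            raise KeyError(f"rule '{rule}' is enabled but has no registered output")
        selected.append(rule_outputs[rule])
    return selected


# Toy stand-ins for config["rules"] and rule_dict (values are illustrative).
toy_config = {"mlst": True, "eggnog": False, "rnammer": True}
toy_outputs = {
    "mlst": ["data/interim/mlst/strainA_ST.csv"],
    "eggnog": ["data/interim/eggnog/strainA/"],
    "rnammer": "resources/rnammer_test.txt",
}
targets = select_outputs(toy_config, toy_outputs)
```

With rnammer set to TRUE, "resources/rnammer_test.txt" ends up in the target list, which is what forces rule rnammer_setup into the DAG.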
The path to the pre-installed RNAmmer should be added in the config.yaml:
#### RESOURCES CONFIGURATION ####
# resources : the location of the resources to run the rule. The default location is at "resources/{resource_name}".
resources_path:
  antismash_db: /data/a/matinnu/data/bgcflow_resources/antismash_db
  eggnog_db: /data/a/matinnu/data/bgcflow_resources/eggnog_db
  BiG-SCAPE: /data/a/matinnu/data/bgcflow_resources/BiG-SCAPE
  RNAmmer: /data/a/matinnu/data/bgcflow_resources/rnammer-1.2
Then, make a conditional in prokka.smk:
if config["rules"]["rnammer"] == True:
    prokka_params_rna = "--rnammer"
    rule rnammer_setup:
        output:
            "resources/rnammer_test.txt"
        priority: 50
        conda:
            "../envs/prokka.yaml"
        log: "workflow/report/logs/rnammer_setup.log"
        shell:
            """
            ln -s $PWD/resources/RNAmmer/rnammer $CONDA_PREFIX/bin/rnammer 2>> {log}
            rnammer -S bac -m lsu,ssu,tsu -gff - example/ecoli.fsa >> {output}
            """
else:
    prokka_params_rna = ""
    pass
The priority is set to make sure this rule runs first. The conditional also defines the variable prokka_params_rna, which holds either --rnammer or an empty string and is then passed to rule prokka as the rna_detection param:
rule prokka:
    input:
        fna = "data/interim/fasta/{strains}.fna",
        org_info = "data/interim/prokka/{strains}/organism_info.txt",
        refgbff = expand("resources/prokka_db/reference_{name}.gbff", name=PROJECT_IDS)
    output:
        gff = "data/interim/prokka/{strains}/{strains}.gff",
        faa = "data/interim/prokka/{strains}/{strains}.faa",
        gbk = "data/interim/prokka/{strains}/{strains}.gbk",
    conda:
        "../envs/prokka.yaml"
    log: "workflow/report/logs/{strains}/prokka_run.log"
    params:
        increment = 10,
        evalue = "1e-05",
        rna_detection = prokka_params_rna,
        refgbff = lambda wildcards: get_prokka_refdb(wildcards, DF_SAMPLES)
    threads: 8
    shell:
        """
        prokka --outdir data/interim/prokka/{wildcards.strains} --force {params.refgbff} --prefix {wildcards.strains} --genus "`cut -d "," -f 1 {input.org_info}`" --species "`cut -d "," -f 2 {input.org_info}`" --strain "`cut -d "," -f 3 {input.org_info}`" --cdsrnaolap --cpus {threads} {params.rna_detection} --increment {params.increment} --evalue {params.evalue} {input.fna}
        cat data/interim/prokka/{wildcards.strains}/{wildcards.strains}.log > {log}
        """
Little fix on rnammer:
Can't open example/ecoli.fsa: No such file or directory at /home/matinnu/bgcflow/.snakemake/conda/53988b3ff79af022ea1ba61b2461b84f/bin/rnammer line 104.
Error in rnammer setup
Here are the contents of the error log file rnammer_setup.log:
ln: failed to create symbolic link '/home/bgcflow/anaconda3/envs/snakemake/bin/rnammer': File exists
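The "File exists" message is what `ln -s` prints when the link is already there, for example when the setup rule runs a second time in the same environment. A possible way around it, sketched in Python under the assumption that replacing the existing link is acceptable (`ensure_symlink` is a hypothetical helper name):

```python
from pathlib import Path


def ensure_symlink(target: Path, link: Path) -> None:
    """Create link -> target, replacing a link or file already at that path."""
    # is_symlink() also catches dangling links, which exists() reports as missing.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)
```

Calling it twice with the same arguments succeeds both times. On the shell side, `ln -sf` has the same overwrite-if-present behaviour.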
Please ignore the above error. I did not read the documentation properly and missed that this feature needs to be installed manually and has licensing restrictions, as described here:
I wrote myself a note on making RNAmmer work here: https://matinnuhamunada.github.io/posts/2022/01/rnammer
Hi @OmkarSaMo, take a look at these two runs:
RNAmmer called:
RNAmmer missing, Barrnap takeover:
Should we just use barrnap?
These are the parameters used in the two runs. Latest run, with rnammer installed:
--force --proteins resources/prokka_db/reference.gbff --prefix P8-2B-3.1 --genus Streptomyces --species sp. --strain P8-2B-3 --cdsrnaolap --cpus 8 --rnammer --increment 10 --evalue 1e-05 data/interim/fasta/P8-2B-3.1.fna
Old run, missing rnammer dependencies:
--proteins ../resources/Actinos_6species.gbff --prefix P8-2B-3 --genus Streptomyces --species sp._NRRL_B-3253_0.9921 --strain P8-2B-3 --cdsrnaolap --cpus 12 --rnammer --increment 10 --evalue 1e-05 /data/matinnu/zhiyan/genomes/P8-2B-3/prokkainput.fna
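To compare two parameter strings like these without eyeballing them, a small Python sketch could help. `parse_flags` and `diff_flags` are hypothetical helpers; the set of value-less switches is an assumption based only on the flags visible in the two lines above.

```python
import shlex

# Flags that take no value in the command lines above (assumption).
SWITCHES = {"--force", "--cdsrnaolap", "--rnammer"}


def parse_flags(command_line: str, switches=SWITCHES) -> dict:
    """Map each --flag to its value (True for bare switches)."""
    flags, positionals = {}, []
    tokens = iter(shlex.split(command_line))
    for tok in tokens:
        if tok in switches:
            flags[tok] = True
        elif tok.startswith("--"):
            # Any other --flag is assumed to consume the next token as its value.
            flags[tok] = next(tokens, None)
        else:
            positionals.append(tok)
    flags["<positional>"] = positionals
    return flags


def diff_flags(a: str, b: str) -> dict:
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    fa, fb = parse_flags(a), parse_flags(b)
    return {k: (fa.get(k), fb.get(k))
            for k in sorted(set(fa) | set(fb))
            if fa.get(k) != fb.get(k)}


# Shortened, made-up command lines to show the shape of the result.
example = diff_flags("--force --cpus 8 --rnammer x.fna",
                     "--cpus 12 --rnammer y.fna")
```

Applied to the two parameter lines above, --rnammer would not show up in the result, since both runs pass it.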