functional-dark-side / agnostos-wf


Spurious Shadow Step Question #21

Open wchow opened 2 years ago

wchow commented 2 years ago

Hi,

I'm trying to test the AGNOSTOS update module (db_update) using a small test set of sequences (50 or so). I've been running the Snakemake pipeline manually, going through each smk file from gene_prediction.smk to mmseqs_clustering_results.smk. I am now at spurious_shadows.

When I run this it errors out, and I'm not sure if it's because there are no results or if it's something in the code. So I decided to run each command of the shell portion of spurious_shadows.smk, as output in the Snakemake log files.

When I run the hmmer search on line 61 of spurious_shadows.smk

       {params.mpi_runner} {params.hmmer_bin} --mpi --cut_ga -Z "${{N}}" --domtblout {params.hmmout} -o {params.hmmlog} {params.antifamdb} {input.fasta} 2>{log.err} 1>{log.out}

followed by line 69

        grep -v '^#' {params.hmmout} > {params.spur}.tmp || true > {params.spur}.tmp 2>>{log.err}

The resulting tmp file is empty, and the hmmout file also contains no hits. With the tmp file empty, I cannot proceed to the next step, "# 2. Detection of shadow ORFs" (line 80).

So my question is: does it just die there, meaning I cannot proceed past spurious_shadow to the cluster_pfam_annotation step?

thanks!

genomewalker commented 2 years ago

Hi @wchow, we are on vacation at the moment and will get back to you once we return.

Apologies for the inconvenience

Antonio

ChiaraVanni commented 2 years ago

Hi @wchow !

The fact that you don't have any hits from the search against the Antifam database should not stop the script. The file {params.spur}.tmp can be empty. Could you please share the logs? Thanks for trying AGNOSTOS!

Chiara
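
The behaviour described above can be checked in isolation. The following is a minimal sketch, not the rule's actual code: the temporary paths are hypothetical stand-ins for {params.hmmout} and {params.spur}, and the grep line is a simplified version of the one quoted from spurious_shadows.smk. It shows that grep returns a non-zero exit status when it selects no lines, and that `|| true` keeps a shell running under `set -e` from aborting, leaving an empty .tmp file behind.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
hmmout="$workdir/hmmsearch.out"   # hypothetical stand-in for {params.hmmout}
spur="$workdir/spurious"          # hypothetical stand-in for {params.spur}

# A tabular output file with no hits contains only comment lines.
printf '# target name\n# [ok]\n' > "$hmmout"

# grep exits with status 1 when no lines are selected;
# "|| true" prevents "set -e" from stopping the script here.
grep -v '^#' "$hmmout" > "$spur.tmp" || true

# -s is true only for a file that exists and is non-empty.
if [ -s "$spur.tmp" ]; then
    status="hits"
else
    status="empty"
fi
echo "status=$status"

rm -rf "$workdir"
```

If the script reaches the final echo, the empty file did not stop execution, so a failure in the rule would have to come from a later command.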

wchow commented 2 years ago

Thanks @ChiaraVanni, and I hope you are having a good week.

When you say the logs, do you mean db_update/spurious_shadow/hmmsearch_antifam_sp.log or the Snakemake logs?

The former is a long file that repeats the same result, so I'll show only the first and last entries.

# Copyright (C) 2019 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /agnostos_db/AntiFam.hmm
# target sequence database:        /agnostos_wd/db_update/gene_prediction/orf_seqs.fasta
# output directed to file:         /agnostos_wd/db_update/spurious_shadow/hmmsearch_antifam_sp.log
# per-dom hits tabular output:     /agnostos_wd/db_update/spurious_shadow/hmmsearch_antifam_sp.out
# model-specific thresholding:     GA cutoffs
# sequence search space set to:    13150
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       Spurious_ORF_01  [M=50]
Accession:   ANF00001
Description: Shadow ORF
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]

Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (50 nodes)
Target sequences:                         50  (7809 residues searched)
Passed MSV filter:                         0  (0); expected 1.0 (0.02)
Passed bias filter:                        0  (0); expected 1.0 (0.02)
Passed Vit filter:                         0  (0); expected 0.1 (0.001)
Passed Fwd filter:                         0  (0); expected 0.0 (1e-05)
Initial search space (Z):              13150  [as set by --Z on cmdline]
Domain search space  (domZ):               0  [number of targets reported over threshold]
# CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00.00
# Mc/sec: 527.49
//

....
<keeps going with other queries returning "[No hits detected that satisfy reporting thresholds]">
....

//
Query:       Spurious_ORF_265  [M=18]
Accession:   ANF00265
Description: Borrelia repeat
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]

Domain annotation for each sequence (and alignments):

   [No targets detected that satisfy reporting thresholds]

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (18 nodes)
Target sequences:                         50  (7809 residues searched)
Passed MSV filter:                         2  (0.04); expected 1.0 (0.02)
Passed bias filter:                        1  (0.02); expected 1.0 (0.02)
Passed Vit filter:                         0  (0); expected 0.1 (0.001)
Passed Fwd filter:                         0  (0); expected 0.0 (1e-05)
Initial search space (Z):              13150  [as set by --Z on cmdline]
Domain search space  (domZ):               0  [number of targets reported over threshold]
# CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00.00
# Mc/sec: 480.21
//

The other two files in this directory are also empty.

However, if you are asking for the Snakemake logs:

Error in rule spurious_shadow:
    jobid: 2
    output: /agnostos_wd/db_update/spurious_shadow/spurious_shadow_info.tsv
    log: logs/spsh_stdout.log, logs/spsh_stderr.err (check log file(s) for error message)
    conda-env: /agnostos-wf/db_update/.snakemake/conda/f02ee8752c7a566bf7433e28f964287d
    shell:

        set -x
        set -e

        ....<code>....

       (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 2

Error executing rule spurious_shadow on cluster (jobid: 2, external: Submitted batch job 2, jobscript: /agnostos-wf/db_update/.snakemake/tmp.qmed2mdl/snakejob.spurious_shadow.2.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

I then tried to run each line of the code individually (hmmer, lines 61-69), which yielded empty files, so I was wondering whether the script can continue in that case.

Thanks again for your help @ChiaraVanni ! Much appreciated

Will

ChiaraVanni commented 2 years ago

Hi Will, if you could share logs/spsh_stdout.log, logs/spsh_stderr.err and the full Snakemake log, that would be great! Thanks, Chiara

wchow commented 2 years ago

Ah, I didn't think to look into these logs. It appears some dependencies were missing. Here is the stderr log; the stdout log is empty.

(agnostos) root@12345:/agnostos-wf/db_update# less logs/spsh_stderr.err
Warning: dependency 'rtracklayer' is not available
trying URL 'http://cran.us.r-project.org/src/contrib/valr_0.6.4.tar.gz'
Content type 'application/x-gzip' length 679308 bytes (663 KB)
==================================================
downloaded 663 KB

ERROR: dependency 'rtracklayer' is not available for package 'valr'
* removing '/root/miniconda3/envs/agnostos/lib/R/library/valr'

The downloaded source packages are in
        '/tmp/Rtmpgz0xrj/downloaded_packages'
Warning message:
In install.packages("valr", repos = "http://cran.us.r-project.org") :
  installation of package 'valr' had non-zero exit status
-- Attaching packages ---------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.2     v dplyr   1.0.6
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts ------------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose

Error in library(valr) : there is no package called 'valr'
Execution halted

As for the full Snakemake log, I've been running each Snakemake rule separately, so the last log I have is the one from my previous comment (with the code omitted). If you would still like it, I can send it over. However, do you think this missing dependency could be the cause? I'll try installing the dependencies, see if it works, and get back to you.

thanks!
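
For anyone hitting the same failure, the missing packages can be pulled out of the rule's stderr log directly. The following is a minimal sketch, assuming the R error messages have the same wording as in the log above; the temporary file is a stand-in for logs/spsh_stderr.err and holds a few lines copied from it.

```shell
#!/usr/bin/env bash
set -euo pipefail

log=$(mktemp)   # stand-in for logs/spsh_stderr.err
cat > "$log" <<'EOF'
Warning: dependency 'rtracklayer' is not available
ERROR: dependency 'rtracklayer' is not available for package 'valr'
Error in library(valr) : there is no package called 'valr'
Execution halted
EOF

# Packages that library() could not load.
missing=$(sed -n "s/.*there is no package called '\([^']*\)'.*/\1/p" "$log" | sort -u)

# Dependencies whose absence blocked an installation.
blocked=$(sed -n "s/^ERROR: dependency '\([^']*\)' is not available.*/\1/p" "$log" | sort -u)

echo "missing: $missing"
echo "blocked by: $blocked"

rm -f "$log"
```

Here valr is reported as missing because its dependency rtracklayer could not be installed. rtracklayer is distributed through Bioconductor rather than CRAN, which may be why the CRAN-only install.packages call in the log could not resolve it.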