MattiaPandolfoVR / MetaPhage

GNU General Public License v3.0

kraken_files.R error #46

Open jflucier opened 2 years ago

jflucier commented 2 years ago

Hello,

I have successfully installed and run MetaPhage using the test dataset. When I run it on my own samples, I get an error:

Error executing process > 'kraken_file (Creating the table...)'                                                                             

Caused by:                                                                                                                                  
  Process `kraken_file (Creating the table...)` terminated with an error exit status (1)                                                    

Command executed:                                                                                                                           

  Rscript /nfs3_ib/ip29-ib/ip29/fortierlc_group/programs/MetaPhage/bin/Rscript/kraken_files.R /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3 metadata.csv

Command exit status:                                                                                                                        
  1                                                                                                                                         

Command output:                                                                                                                             
  Create DT version of the data.frame with kraken AND krona files                                                                           

Command error:                                                                                                                              
  source: /opt/software/conda_env/etc/conda/activate.d/activate-binutils_linux-64.sh:8:18: parameter expansion requires a literal           
  source: /opt/software/conda_env/etc/conda/activate.d/activate-gcc_linux-64.sh:8:18: parameter expansion requires a literal                
  source: /opt/software/conda_env/etc/conda/activate.d/activate-gfortran_linux-64.sh:8:18: parameter expansion requires a literal           
  source: /opt/software/conda_env/etc/conda/activate.d/activate-gxx_linux-64.sh:8:18: parameter expansion requires a literal                
  Failed to create bus connection: No such file or directory                                                                                
  Warning message:                                                                                                                          
  In system("timedatectl", intern = TRUE) :                                                                                                 
    running command 'timedatectl' had status 1
  Error: Problem with `filter()` input `..1`.
  i Input `..1` is `matching_kronas %in% matching_kronas`.
  x Input `..1` must be of size 8 or 1, not size 6.
  Backtrace:
       x
    1. +-dplyr::filter(df_krona_fp, matching_kronas %in% matching_kronas)
    2. +-dplyr:::filter.data.frame(df_krona_fp, matching_kronas %in% matching_kronas)
    3. | \-dplyr:::filter_rows(.data, ..., caller_env = caller_env())
    4. |   +-base::withCallingHandlers(...)
    5. |   \-mask$eval_all_filter(dots, env_filter)
    6. +-dplyr:::abort_glue(...)
    7. | +-rlang::exec(abort, message = message, class = class, !!!data)
    8. | \-(function (message = NULL, class = NULL, ..., trace = NULL, parent = NULL, ...
    9. |   \-rlang:::signal_abort(cnd)
   10. |     \-base::signalCondition(cnd)
   11. \-(function (e) ...
  Execution halted

Work dir:
  /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/work/a3/bb376f28623fefb73ebabc60733740

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

To debug further, I ran the R script manually and pinpointed the line that fails:

(MetaPhage) |16:24:56|jflucier@ip29:[2022_metaphage_reanalysis]> R                                                                          

R version 4.1.1 (2021-08-10) -- "Kick Things"                                                                                               
Copyright (C) 2021 The R Foundation for Statistical Computing                                                                               
Platform: x86_64-conda-linux-gnu (64-bit)                                                                                                   

R is free software and comes with ABSOLUTELY NO WARRANTY.                                                                                   
You are welcome to redistribute it under certain conditions.                                                                                
Type 'license()' or 'licence()' for distribution details.                                                                                   

  Natural language support but running in an English locale                                                                                 

R is a collaborative project with many contributors.                                                                                        
Type 'contributors()' for more information and                                                                                              
'citation()' on how to cite R or R packages in publications.                                                                                

Type 'demo()' for some demos, 'help()' for on-line help, or                                                                                 
'help.start()' for an HTML browser interface to help.                                                                                       
Type 'q()' to quit R.                                                                                                                       

> library(DT)                                                                                                                               
> library(readr)                                                                                                                            
> library(dplyr)                                                                                                                            

Attaching package: ‘dplyr’                                                                                                                  

The following objects are masked from ‘package:stats’:                                                                                      

    filter, lag                                                                                                                             

The following objects are masked from ‘package:base’:                                                                                       

    intersect, setdiff, setequal, union                                                                                                     

> library(plyr)                                                                                                                             
------------------------------------------------------------------------------                                                              
You have loaded plyr after dplyr - this is likely to cause problems.                                                                        
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:                                                         
library(plyr); library(dplyr)                                                                                                               
------------------------------------------------------------------------------                                                              

Attaching package: ‘plyr’                                                                                                                   

The following objects are masked from ‘package:dplyr’:                                                                                      

    arrange, count, desc, failwith, id, mutate, rename, summarise,                                                                          
    summarize                                                                                                                               

> library(seqinr)                                                                                                                           

Attaching package: ‘seqinr’                                                                                                                 

The following object is masked from ‘package:plyr’:                                                                                         

    count                                                                                                                                   

The following object is masked from ‘package:dplyr’:                                                                                        

    count                                                                                                                                   

> library(tidyverse)                                                                                                                        
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5     ✔ purrr   0.3.4                                                                                                         
✔ tibble  3.1.3     ✔ stringr 1.4.0                                                                                                         
✔ tidyr   1.1.4     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ plyr::arrange()   masks dplyr::arrange()
✖ purrr::compact()  masks plyr::compact()
✖ seqinr::count()   masks plyr::count(), dplyr::count()
✖ plyr::failwith()  masks dplyr::failwith()
✖ dplyr::filter()   masks stats::filter()
✖ plyr::id()        masks dplyr::id()
✖ dplyr::lag()      masks stats::lag()
✖ plyr::mutate()    masks dplyr::mutate()
✖ plyr::rename()    masks dplyr::rename()
✖ plyr::summarise() masks dplyr::summarise()
✖ plyr::summarize() masks dplyr::summarize()
> library(gtools)
> library(magrittr)

Attaching package: ‘magrittr’

The following object is masked from ‘package:purrr’:

    set_names

The following object is masked from ‘package:tidyr’:

    extract

> file_paths <- c("/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3")
> kraken_path <- file.path(file_paths, "taxonomy/kraken2")
> kraken_path
[1] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/kraken2"
> if (! file.exists(kraken_path)){
  stop("ERROR: taxonomy/kraken2 folder not found in: ", kraken_path, "\n")
}
> krona_path <-  file.path(file_paths, "taxonomy/krona")
> if (! file.exists(krona_path)){
  cat("WARNING: taxonomy/krona folder not found in: ", krona_path, "\n")
}
> file_meta <- c("/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/work/a3/bb376f28623fefb73ebabc60733740/metadata.csv")
> if (! file.exists(file_meta)){
  stop("ERROR: metadata file not found in: ", file_meta, "\n")
}
> metadata <- read.delim(file_meta, row.names = 1, sep = ",", check.names = F)
metadata$Sample <- rownames(metadata)
metadata <- metadata %>%
  select(Sample, everything())
> metadata
      Sample               Projet DaysPostInd   Group MouseID Specie
VMC11  VMC11 202001_mouse_virome3          D0  EXP-D0  740-D0  Mouse
VMC16  VMC16 202001_mouse_virome3          D0  EXP-D0  792-D0  Mouse
VMC18  VMC18 202001_mouse_virome3          D0  EXP-D0  794-D0  Mouse
VMC19  VMC19 202001_mouse_virome3         D21 EXP-D21 740-D21  Mouse
VMC24  VMC24 202001_mouse_virome3         D21 EXP-D21 792-D21  Mouse
VMC26  VMC26 202001_mouse_virome3         D21 EXP-D21 794-D21  Mouse
> krakens_path <- list.files(path = kraken_path, 
                           pattern = "_report.txt", 
                           recursive = TRUE,
                           full.names=T)
> df_kraken_fp <- as.data.frame(krakens_path)
> krakens_names = unique(basename(krakens_path)) 
krakens_names <- sapply(strsplit(krakens_names,"_report.txt"), `[`, 1) 
matching_krakens = krakens_names[krakens_names %in% metadata$Sample]
> matching_krakens
[1] "VMC11" "VMC16" "VMC18" "VMC19" "VMC24" "VMC26"
> krakens_names_fp = as.data.frame(
  grep(paste(matching_krakens,collapse="|"), df_kraken_fp$krakens_path, value=TRUE)
)
> colnames(krakens_names_fp) <- c("krakens_path")
> krakens_fp = paste0("../", str_extract(krakens_names_fp$krakens_path,
                                       "taxonomy/kraken2/.+/.+_report.txt"))
> df_krakens = tibble("Sample" = matching_krakens, file = krakens_fp) %>%
  mutate(file = str_replace_all(file,
                                '([^;]*)taxonomy/kraken2/.+/', ''),
         path_krakens = file.path(krakens_fp),
         Kraken_Report = paste0('<a target=_blank href=',
                                path_krakens, '>', file,'</a>'))
df_krakens$file <- NULL
df_krakens$path_krakens <- NULL
> if(file.exists(krona_path)){
  cat("Create DT version of the data.frame with kraken AND krona files\n")
  # read the krona files and create a data.frame of paths
  kronas_path <- list.files(path = krona_path, 
                            ".+_krak_krona_abundancies.html", 
                            recursive = TRUE,
                            full.names=T)
+ 
+ 
> message("allo")
allo
> if(file.exists(krona_path)){
message("krona")
+ } else {
+ message("no krona")
+ }
krona                                                                                  
> kronas_path <- list.files(path = krona_path,                                                                                              
                            ".+_krak_krona_abundancies.html",                                                                               
                            recursive = TRUE,                                                                                               
                            full.names=T)                                                                                                   
> kronas_path                                                                                                                               
[1] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC11/VMC11_krak_krona_abundancies.html"                                                                                                             
[2] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC13/VMC13_krak_krona_abundancies.html"                                                                                                             
[3] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC16/VMC16_krak_krona_abundancies.html"                                                                                                             
[4] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC18/VMC18_krak_krona_abundancies.html"                                                                                                             
[5] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC19/VMC19_krak_krona_abundancies.html"                                                                                                             
[6] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC21/VMC21_krak_krona_abundancies.html"                                                                                                             
[7] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC24/VMC24_krak_krona_abundancies.html"
[8] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC26/VMC26_krak_krona_abundancies.html"
> df_krona_fp <- as.data.frame(kronas_path)                                                                                                 
> kronas_path                                                                                                                               
[1] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC11/VMC11_krak_krona_abundancies.html"
[2] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC13/VMC13_krak_krona_abundancies.html"
[3] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC16/VMC16_krak_krona_abundancies.html"
[4] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC18/VMC18_krak_krona_abundancies.html"
[5] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC19/VMC19_krak_krona_abundancies.html"
[6] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC21/VMC21_krak_krona_abundancies.html"
[7] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC24/VMC24_krak_krona_abundancies.html"
[8] "/nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC26/VMC26_krak_krona_abundancies.html"
> kronas_names = unique(basename(kronas_path))                                                                                              
  kronas_names                                                                                                                              
  kronas_names <- sapply(strsplit(kronas_names,"_krak_krona_abundancies.html"), `[`, 1)                                                     
  matching_kronas = kronas_names[kronas_names %in% metadata$Sample]                                                                         

[1] "VMC11_krak_krona_abundancies.html" "VMC13_krak_krona_abundancies.html"                                                                 
[3] "VMC16_krak_krona_abundancies.html" "VMC18_krak_krona_abundancies.html"                                                                 
[5] "VMC19_krak_krona_abundancies.html" "VMC21_krak_krona_abundancies.html"                                                                 
[7] "VMC24_krak_krona_abundancies.html" "VMC26_krak_krona_abundancies.html"                                                                 
> kronas_names_fp <- filter(df_krona_fp,                                                                                                    
                            matching_kronas %in% matching_kronas)                                                                           

Error: Problem with `filter()` input `..1`.                                                                                                 
ℹ Input `..1` is `matching_kronas %in% matching_kronas`.                                                                                    
✖ Input `..1` must be of size 8 or 1, not size 6.                                                                                           
Run `rlang::last_error()` to see where the error occurred.                                                                                  
> matching_kronas                                                                                                                           
[1] "VMC11" "VMC16" "VMC18" "VMC19" "VMC24" "VMC26"                                                                                         
> df_krona_fp
                                                                                                                                                            kronas_path
1 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC11/VMC11_krak_krona_abundancies.html
2 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC13/VMC13_krak_krona_abundancies.html
3 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC16/VMC16_krak_krona_abundancies.html
4 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC18/VMC18_krak_krona_abundancies.html
5 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC19/VMC19_krak_krona_abundancies.html
6 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC21/VMC21_krak_krona_abundancies.html
7 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC24/VMC24_krak_krona_abundancies.html
8 /nfs3_ib/ip29-ib/ip29/fortierlc_group/analysis/2022_metaphage_reanalysis/MetaPhage-output/202001_mouse_virome3/taxonomy/krona/VMC26/VMC26_krak_krona_abundancies.html

My metadata file is the following:

Sample,Projet,DaysPostInd,Group,MouseID,Specie
VMC11,202001_mouse_virome3,D0,EXP-D0,740-D0,Mouse
VMC16,202001_mouse_virome3,D0,EXP-D0,792-D0,Mouse
VMC18,202001_mouse_virome3,D0,EXP-D0,794-D0,Mouse
VMC19,202001_mouse_virome3,D21,EXP-D21,740-D21,Mouse
VMC24,202001_mouse_virome3,D21,EXP-D21,792-D21,Mouse
VMC26,202001_mouse_virome3,D21,EXP-D21,794-D21,Mouse

Thanks in advance for your help. JF

telatin commented 2 years ago

Thanks for sharing a detailed walkthrough of your problem. Just a couple of questions: did you test the main branch or the "development" version, i.e. which of the two tutorials did you follow: https://mattiapandolfovr.github.io/MetaPhage/tutorial or https://mattiapandolfovr.github.io/MetaPhage/tutorial-v2

If you used the "stable" version, did you supply the dependencies via Conda, Docker or Singularity? Cheers, Andrea

jflucier commented 2 years ago

Hello,

Thanks for the quick response.

Just a couple of questions: did you test the main branch or the "development" version, i.e. which of the two tutorials: https://mattiapandolfovr.github.io/MetaPhage/tutorial or https://mattiapandolfovr.github.io/MetaPhage/tutorial-v2

I followed the stable tutorial.

If you used the "stable" version, did you supply the dependencies via conda, docker or singularity?

I installed it with Singularity and ran it on an HPC system with Slurm.

I think I figured out the problem. My current test run has progressed further: the kraken_file (Creating the table...) step completed successfully.

To get this working, I removed the metadata directory in the fastq filepath folder and also removed the work folder.

My guess is that changing the metadata file and then restarting the pipeline from scratch, while files from the previous execution are still present, generates this error.
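
For anyone wanting to check this on their own data, here is a minimal base R sketch of the size mismatch seen in the session above. It assumes that the two extra krona reports (VMC13 and VMC21) are leftovers from an earlier run. The relative paths and the names `on_disk`, `metadata_samples`, `keep` and `stale` are stand-ins for illustration, not part of kraken_files.R. Since `matching_kronas` is not a column of `df_krona_fp`, the condition `matching_kronas %in% matching_kronas` is evaluated on the 6-element vector from the calling environment, while the data frame has 8 rows.

```r
# Stand-ins for the objects in the session (paths shortened for illustration).
metadata_samples <- c("VMC11", "VMC16", "VMC18", "VMC19", "VMC24", "VMC26")
on_disk <- c("VMC11", "VMC13", "VMC16", "VMC18", "VMC19", "VMC21", "VMC24", "VMC26")
kronas_path <- file.path("taxonomy/krona", on_disk,
                         paste0(on_disk, "_krak_krona_abundancies.html"))
df_krona_fp <- data.frame(kronas_path = kronas_path, stringsAsFactors = FALSE)

# Sample names derived from the file names, one per row of df_krona_fp.
kronas_names <- sub("_krak_krona_abundancies.html", "",
                    basename(kronas_path), fixed = TRUE)
matching_kronas <- kronas_names[kronas_names %in% metadata_samples]

# The failing condition has one element per matching sample,
# not one element per row of the data frame.
condition <- matching_kronas %in% matching_kronas
length(condition)  # number of matching samples
nrow(df_krona_fp)  # number of krona files found on disk

# One possible alternative (an assumption, not the maintainers' fix):
# build a row-wise condition from the per-row sample names.
keep <- kronas_names %in% metadata_samples
kronas_names_fp <- df_krona_fp[keep, , drop = FALSE]

# Samples present on disk but absent from the metadata file.
stale <- setdiff(kronas_names, metadata_samples)
stale
```

When every krona report on disk has a matching metadata row, the two lengths are equal and the original condition keeps every row, which would explain why the test dataset runs cleanly.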

My suggestion to others facing the same problem: if you need to modify your metadata file and restart the pipeline without -resume, you also need to delete all files generated by the previous execution (the metadata folder and the work folder) before restarting from scratch.
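
The cleanup can be scripted. The sketch below uses a hypothetical helper, `clean_previous_run()`, and runs it on throwaway directories under `tempdir()`; the real locations of the work folder and the metadata folder depend on your own setup and are not taken from the MetaPhage documentation.

```r
# Hypothetical helper: recursively delete the given directories if they exist.
# Returns (invisibly) a logical vector, TRUE where the directory is now gone.
clean_previous_run <- function(dirs) {
  for (d in dirs) {
    if (dir.exists(d)) {
      unlink(d, recursive = TRUE)
    }
  }
  invisible(!dir.exists(dirs))
}

# Demonstration on throwaway directories; substitute your own paths.
sandbox <- file.path(tempdir(), "metaphage_demo")
work_dir <- file.path(sandbox, "work")
metadata_dir <- file.path(sandbox, "reads", "metadata")
dir.create(work_dir, recursive = TRUE)
dir.create(metadata_dir, recursive = TRUE)

removed <- clean_previous_run(c(work_dir, metadata_dir))
removed
```

Check the paths before running it on real data, because `unlink(..., recursive = TRUE)` deletes without asking for confirmation.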

JF

telatin commented 2 years ago

Thanks a million for the feedback. I'm leaving this open for the moment so we can update the docs and fix this in a future release.