Sydney-Informatics-Hub / ONT-bacpac-nf

Bacterial profiling workflow for ONT data, written in Nextflow.
GNU General Public License v3.0

Trycycler cluster issue workaround: avoid building the tree; don't call the function #41

Closed nandan75 closed 3 months ago

nandan75 commented 3 months ago

What is the segmentation fault issue

The first thing I tried was a local installation of Trycycler from source on NCI-Gadi, to check whether updated dependencies (e.g. the R package ape, as suggested in the Trycycler repo issue) would solve this problem.

Trycycler installation using source

  1. git clone https://github.com/rrwick/Trycycler.git

Note that the above command installed Trycycler itself and the Python packages it needs (Edlib, NumPy and SciPy). It did not install the external tools that Trycycler requires. For those, please look at the Software requirements page.

Dependencies - Software requirements

Miniasm/minimap2: https://github.com/lh3/miniasm
• git clone https://github.com/lh3/minimap2 && (cd minimap2 && make)
  o Installation worked
  o Executable: $PATH/minimap2/minimap2
• git clone https://github.com/lh3/miniasm && (cd miniasm && make)
  o Installation worked
  o Executable: $PATH/miniasm/miniasm

Mash: https://mash.readthedocs.io/en/latest/

MUSCLE: https://drive5.com/muscle/downloads_v3.htm
• wget https://drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz
• tar -zxvf muscle3.8.31_i86linux64.tar.gz
  o Installation worked
  o Executable: $PATH/muscle3.8.31_i86linux64

R with phylogenetics packages (this part was accomplished with Nathanial’s assistance. Thanks @natbutter :-))

module load R/4.3.1 intel-compiler/2021.2.0
Rscript -e 'install.packages("ape", repos="https://cloud.r-project.org/", lib="/scratch/er01/npd561")'
library(ape, lib.loc="/scratch/er01/npd561")

Rscript -e 'install.packages("phangorn", repos="https://cloud.r-project.org/", lib="/scratch/er01/npd561")'
library(phangorn, lib.loc="/scratch/er01/npd561")

The above gave a complete installation of Trycycler from source, with the latest versions of the ape and phangorn packages.
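
Before re-running the pipeline, it can help to confirm that every external tool is actually discoverable. The following is a minimal sketch, assuming the executables are named as in REQUIRED_TOOLS below. Those names are illustrative assumptions; for example, the MUSCLE binary unpacked above is called muscle3.8.31_i86linux64 unless it is renamed or symlinked.

```python
import shutil

# Assumed executable names; adjust to match the local installation.
REQUIRED_TOOLS = ["minimap2", "miniasm", "mash", "muscle", "Rscript"]


def find_missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found on the search path.

    path=None makes shutil.which use the PATH environment variable.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools found on PATH")
```

This only checks that the executables can be located; it does not check their versions or that the R packages load.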

The pipeline was re-run for samples barcode13 and barcode14, and this produced a more specific error:

Error in fastme.bal(distances) :  
    cannot build ME tree with less than 3 observations 

Next, Trycycler was re-installed from source after replacing fastme.bal with bionj in Trycycler's cluster.py file (in the create_tree_script function), as suggested by the author.

Using bionj did not resolve the error:

Error in bionj(distances) :  
    cannot build a BIONJ tree with less than 3 observations 

At this point, a different approach was taken.

A previously observed fact is:

So, the best solution for the segmentation fault issue was identified to be as follows:
• Hash out (comment out) the call to build_tree in the Python script cluster.py in the Trycycler package: #build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers). Trycycler then does not create the tree (which is not used in subsequent steps) and so avoids throwing the error.
• Trycycler was re-installed from source again with the build_tree call hashed out.
• All 17 samples were included in a single run and a multiqc_report.html file was successfully generated.
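
A less drastic variant of this workaround would be to skip the tree only when there are too few contigs, since both fastme.bal and bionj reported that they need at least 3 observations. The sketch below is not Trycycler's code: build_tree_if_possible and MIN_TREE_TIPS are hypothetical names, and the real build_tree function is passed in as an argument so that the example stays self-contained.

```python
import sys

# Both R errors above say a tree needs at least 3 observations.
MIN_TREE_TIPS = 3


def build_tree_if_possible(seq_names, build_tree_fn, *tree_args):
    """Call build_tree_fn only when there are enough sequences for a tree.

    Returns True if the tree-building function was called, False if skipped.
    """
    if len(seq_names) < MIN_TREE_TIPS:
        print(
            f"Skipping tree: only {len(seq_names)} sequence(s), "
            f"need at least {MIN_TREE_TIPS}",
            file=sys.stderr,
        )
        return False
    build_tree_fn(seq_names, *tree_args)
    return True
```

In cluster.py this would wrap the existing call, for example build_tree_if_possible(seq_names, build_tree, seqs, depths, matrix, args.out_dir, cluster_numbers), so that samples with 3 or more contigs still get a tree. This has not been tested against the pipeline.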

What is pending

  1. Convert the above Trycycler from-source installation into a Singularity image and use it to test the pipeline
  2. .nextflow/history.lock
    • When I pull the latest version of the pipeline and try to execute it, I cannot proceed because of the following error: .nextflow/history.lock (no such file or directory)
    • I have tried a few options suggested by Google and ChatGPT, but none of them has worked.
    • The R solution was tested in one of my older local versions of the pipeline. Once the lock issue is sorted, the testing can be finalised in the most recent version.
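
The cause of the history.lock error has not been established here. One thing that may be worth ruling out, purely as an assumption, is the state of the directory the pipeline is launched from, since Nextflow keeps its run history under .nextflow there. The sketch below uses a hypothetical helper, diagnose_launch_dir, that reports a few common filesystem problems; it does not call Nextflow itself.

```python
import os
from pathlib import Path


def diagnose_launch_dir(launch_dir="."):
    """Return a list of human-readable problems with the launch directory."""
    launch = Path(launch_dir)
    if not launch.is_dir():
        return [f"{launch} is not a directory"]

    problems = []
    if not os.access(launch, os.W_OK | os.X_OK):
        problems.append(f"{launch} is not writable by the current user")

    dot_nf = launch / ".nextflow"
    if dot_nf.is_symlink() and not dot_nf.exists():
        # A symlink whose target is gone looks present but cannot be opened.
        problems.append(f"{dot_nf} is a broken symlink")
    elif dot_nf.exists() and not dot_nf.is_dir():
        problems.append(f"{dot_nf} exists but is not a directory")
    elif dot_nf.is_dir() and not os.access(dot_nf, os.W_OK | os.X_OK):
        problems.append(f"{dot_nf} is not writable by the current user")
    return problems


if __name__ == "__main__":
    found = diagnose_launch_dir()
    print("\n".join(found) if found else "No obvious problems found")
```

An empty result only rules out these specific conditions; the lock error may have a different cause.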