What is the segmentation fault issue
Some of the tested samples (e.g. barcode13, barcode14) displayed the following error in the script modules/run_trycycler_cluster.nf:
*** caught segfault ***
address 0x38, cause 'memory not mapped'
The remaining samples ran without this error.
An issue previously logged on the Trycycler GitHub repository describes exactly what we observed in our pipeline.
The author left the issue unresolved, reasoning that the error originates in the R package ape, but suggested the following possible workarounds:
o Reinstalling R's ape package.
o Using ape's bionj function instead. This would require replacing fastme.bal with bionj in Trycycler's cluster.py file (in the create_tree_script function).
I first tried the options above, starting with a local installation of Trycycler from source on NCI-Gadi to check whether updated dependencies (e.g. the R package ape, as suggested in the Trycycler issue) would solve the problem.
Note that the above command installed Trycycler itself and the Python packages it needs (Edlib, NumPy and SciPy). It did not install the external tools that Trycycler requires. For those, please look at the Software requirements page.
This produced a complete from-source installation of Trycycler with the latest versions of the R packages ape and phangorn.
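To record which ape and phangorn versions an installation actually picked up (useful when comparing the local install with a later container), the versions can be queried through Rscript. This is a minimal sketch, assuming Rscript is on PATH; r_version_command and r_package_version are hypothetical helper names, while packageVersion() is a standard function from R's utils package.

```python
import shutil
import subprocess
from typing import Optional


def r_version_command(package: str, rscript: str = "Rscript") -> list[str]:
    """Build the Rscript command that prints an installed package's version."""
    # packageVersion() comes from R's utils package; cat() avoids the "[1]" prefix
    expr = f'cat(as.character(packageVersion("{package}")))'
    return [rscript, "-e", expr]


def r_package_version(package: str, rscript: str = "Rscript") -> Optional[str]:
    """Return the installed version string, or None if Rscript or the package is unavailable."""
    if shutil.which(rscript) is None:
        return None  # Rscript is not on PATH
    result = subprocess.run(r_version_command(package, rscript),
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None  # e.g. the package is not installed in this R library
    return result.stdout.strip()
```

For example, `r_package_version("ape")` and `r_package_version("phangorn")` return version strings such as those reported by R itself, or None if either is missing.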
The pipeline was re-run for samples barcode13 and barcode14, and this produced a more specific error:
Error in fastme.bal(distances) :
cannot build ME tree with less than 3 observations
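The message indicates that the distance matrix handed to fastme.bal had fewer than three entries (observations), i.e. fewer than three contigs. Since the tree is built from contigs.phylip, the number of contigs can be checked directly in that file. This is a minimal sketch, assuming contigs.phylip is a standard square PHYLIP distance matrix whose first line gives the number of taxa; the function names are hypothetical and the threshold of 3 is taken from the error message.

```python
def phylip_taxon_count(phylip_text: str) -> int:
    """Return the taxon count declared on the first non-blank line of a PHYLIP distance matrix."""
    for line in phylip_text.splitlines():
        if line.strip():
            return int(line.split()[0])
    return 0  # empty file: no taxa


def can_build_distance_tree(phylip_text: str, minimum: int = 3) -> bool:
    """True if the matrix has at least `minimum` taxa (3, per the fastme.bal error above)."""
    return phylip_taxon_count(phylip_text) >= minimum


two_contigs = """2
A_contig_1  0.000 0.012
B_contig_1  0.012 0.000
"""
print(can_build_distance_tree(two_contigs))  # False
```

Applied to a real run, `can_build_distance_tree(Path("contigs.phylip").read_text())` (with `from pathlib import Path`) would show whether a sample such as barcode13 is below the threshold.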
Next
o Trycycler was re-installed from source, this time with fastme.bal replaced by bionj in Trycycler's cluster.py file (in the create_tree_script function), as suggested by the author.
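The fastme.bal to bionj substitution can be scripted so that it is reproducible when the installation is rebuilt. This is a minimal sketch, assuming the R code embedded in cluster.py's create_tree_script function contains the literal text `fastme.bal(` (consistent with the `fastme.bal(distances)` call in the error above); swap_tree_builder and patch_cluster_py are hypothetical helpers, not part of Trycycler.

```python
from pathlib import Path


def swap_tree_builder(source: str, old: str = "fastme.bal(", new: str = "bionj(") -> str:
    """Return cluster.py source with the R tree-building call replaced."""
    if old not in source:
        # Guard against silently doing nothing on a different Trycycler version
        raise ValueError(f"{old!r} not found in source")
    return source.replace(old, new)


def patch_cluster_py(path: Path) -> None:
    """Rewrite cluster.py in place (hypothetical helper; back the file up first)."""
    path.write_text(swap_tree_builder(path.read_text()))
```

Raising an error when the call is absent is a deliberate choice: it makes a version mismatch visible instead of producing an unpatched install.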
Switching to bionj did not resolve the error:
Error in bionj(distances) :
cannot build a BIONJ tree with less than 3 observations
Next
At this point, a different line of reasoning was applied.
A previously observed fact:
Although the run_trycycler_cluster.nf process threw an error for certain samples, the OUTPUT REQUIRED for the subsequent Trycycler steps was always generated (and has been verified to be correct).
The outputs from this step are:
o contigs.phylip: a matrix of the Mash distances between all contigs in PHYLIP format.
o contigs.newick: a FastME tree of the contigs built from the distance matrix. This can be visualised in a phylogenetic tree viewer such as FigTree, Dendroscope or Archaeopteryx.
o One directory for each of the clusters: cluster_001, cluster_002, etc. These directories will each contain a subdirectory named 1_contigs which includes the FASTA files for the contigs in the cluster.
Of these three, the only output carried forward to the next step (Trycycler reconciliation) is the set of subdirectories for the good clusters, each containing a 1_contigs subdirectory. THESE ARE ALWAYS GENERATED, and the correctness of the final assemblies built from them has already been confirmed for all barcodes (including the problematic ones).
o The remaining output, contigs.phylip, is needed only to generate contigs.newick (the step that throws the error for certain samples); both are for visualisation purposes only and are not required by our pipeline.
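Because only the cluster directories matter downstream, a run can be checked by confirming that every cluster_* directory has a populated 1_contigs subdirectory, regardless of whether contigs.newick was written. This is a minimal sketch; cluster_contig_counts is a hypothetical helper, and the accepted FASTA extensions are an assumption.

```python
from pathlib import Path

# Assumed FASTA extensions; adjust if Trycycler names the files differently
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def cluster_contig_counts(out_dir: Path) -> dict[str, int]:
    """Map each cluster_* directory to the number of FASTA files in its 1_contigs subdirectory."""
    counts = {}
    for cluster in sorted(out_dir.glob("cluster_*")):
        contigs_dir = cluster / "1_contigs"
        if not contigs_dir.is_dir():
            counts[cluster.name] = 0  # input needed for reconciliation is missing
            continue
        counts[cluster.name] = sum(
            1 for f in contigs_dir.iterdir() if f.is_file() and f.suffix in FASTA_SUFFIXES
        )
    return counts
```

A cluster mapped to 0 would point to a genuine problem with the clustering output rather than with the optional tree.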
The best solution for the segmentation fault issue was therefore identified as follows:
• HASH OUT (comment out) the call to build_tree in the Python script cluster.py in the Trycycler package:
#build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers)
Trycycler therefore DOES NOT CREATE THE TREE (which is not used in subsequent steps) and so avoids throwing the error.
• Trycycler was re-installed from source with the build_tree call hashed out.
• All 17 samples were included in a single run and a multiqc_report.html file was successfully generated.
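To make this edit reproducible rather than a manual change, it can be applied by a small script at install time. This is a minimal sketch, assuming the call in cluster.py sits on a single line beginning with `build_tree(` as quoted above (a call split over several lines would need more than this); comment_out_call is a hypothetical helper.

```python
import re


def comment_out_call(source: str, func_name: str = "build_tree") -> str:
    """Prefix '#' to lines that start with a call to func_name, keeping indentation."""
    # Matches e.g. "    build_tree(" but not "def build_tree(" or "#build_tree("
    pattern = re.compile(rf"^(\s*)({re.escape(func_name)}\()")
    return "".join(pattern.sub(r"\1#\2", line)
                   for line in source.splitlines(keepends=True))
```

The function leaves the `def build_tree(...)` definition untouched and is idempotent, so running it twice on the same file does no harm.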
What is pending
Convert the above from-source Trycycler installation into a Singularity image and use it to test the pipeline.
.nextflow/history.lock
When I pull the latest version of the pipeline and try to run it, I cannot proceed because of the following error:
.nextflow/history.lock (no such file or directory)
I have tried a few fixes suggested by Google and ChatGPT, but none of them has worked.
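The error path .nextflow/history.lock is relative, so it presumably refers to the .nextflow directory under the directory from which the pipeline is launched. One unconfirmed hypothesis is that this directory is missing, is a file rather than a directory, or is not writable from the launch location. This is a minimal diagnostic sketch under that assumption; nextflow_dir_report is a hypothetical helper that only inspects the filesystem and does not call Nextflow.

```python
import os
from pathlib import Path


def nextflow_dir_report(launch_dir: Path) -> dict[str, bool]:
    """Summarise filesystem conditions for <launch_dir>/.nextflow (diagnostic only)."""
    state_dir = launch_dir / ".nextflow"
    return {
        "launch_dir_writable": os.access(launch_dir, os.W_OK),
        "nextflow_dir_exists": state_dir.exists(),
        "nextflow_dir_is_dir": state_dir.is_dir(),
        "nextflow_dir_writable": state_dir.is_dir() and os.access(state_dir, os.W_OK),
    }
```

Running `nextflow_dir_report(Path.cwd())` from the launch directory would at least rule these conditions in or out before looking elsewhere.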
The R-solution was tested in one of my older local versions of the pipeline. Once the lock issue is sorted, testing can be finalised in the most recent version.
Trycycler installation using source
Dependencies - Software requirements
Miniasm/minimap2: https://github.com/lh3/miniasm
• git clone https://github.com/lh3/minimap2 && (cd minimap2 && make)
o Installation worked
o Executable: $PATH/minimap2/minimap2
• git clone https://github.com/lh3/miniasm && (cd miniasm && make)
o Installation worked
o Executable: $PATH/miniasm/miniasm
Mash: https://mash.readthedocs.io/en/latest/
MUSCLE: https://drive5.com/muscle/downloads_v3.htm
o wget https://drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz
o tar -zxvf muscle3.8.31_i86linux64.tar.gz
o Installation worked
o Executable: $PATH/muscle3.8.31_i86linux64
R with phylogenetics packages (this part was accomplished with Nathanial’s assistance; thanks @natbutter :-))
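Since Trycycler does not install these external tools itself, a quick check that each executable resolves on PATH can catch a missing dependency before a pipeline run. This is a minimal sketch; the names in DEFAULT_TOOLS are assumptions (Rscript stands in for the R installation), and missing_tools is a hypothetical helper built on the standard library's shutil.which.

```python
import shutil

# Executable names assumed for illustration; adjust to your installation
# (the MUSCLE binary unpacked above is named muscle3.8.31_i86linux64).
DEFAULT_TOOLS = ("minimap2", "miniasm", "mash", "muscle", "Rscript")


def missing_tools(tools=DEFAULT_TOOLS) -> list[str]:
    """Return the names in `tools` that shutil.which cannot resolve on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means every listed executable was found; anything returned still needs to be installed or added to PATH.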