DrB-S closed this issue 2 months ago.
Oops! I see the problem. It should be --msa nextclade!
I ran with --msa nextclade, but SNPs and trees are still not being produced.
Unfortunately, PhyTreeViz can't draw a tree with as many nodes as the nextclade newick file contains. You should still be able to view this tree with another tool, but keep in mind that it includes all of the Nextclade dataset's nodes, not just your samples. There's also an Auspice JSON file in the nextclade directory that you can upload to https://auspice-us.herokuapp.com/ to view the tree.
The SNP matrix should still be created, though. How many samples are you running?
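For reference, snp-dists builds the SNP matrix from the multiple sequence alignment by counting, for every pair of samples, the aligned positions where the two sequences differ. Below is a minimal Python sketch of that computation. It assumes snp-dists' default behaviour of uppercasing the sequences and counting only A/C/G/T mismatches, so N, gap and other IUPAC characters never add to a distance. The function names are hypothetical; this is not the snp-dists or Cecret code.

```python
from itertools import combinations

ACGT = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("sequences in an MSA must all have the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x in ACGT and y in ACGT  # N, '-' and IUPAC codes are skipped
    )


def snp_matrix(msa: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise distance matrix keyed by (sample, sample)."""
    dist = {(name, name): 0 for name in msa}
    for n1, n2 in combinations(msa, 2):
        d = snp_distance(msa[n1], msa[n2])
        dist[(n1, n2)] = dist[(n2, n1)] = d
    return dist


if __name__ == "__main__":
    # Toy alignment; real input would be the MSA written by nextclade or mafft.
    msa = {"ref": "ACGTACGT", "s1": "ACGAACGT", "s2": "ACNTTCG-"}
    dist = snp_matrix(msa)
    names = list(msa)
    print("\t".join(["", *names]))  # tab-separated header row
    for row in names:
        print("\t".join([row, *(str(dist[(row, col)]) for col in names)]))
```

Because ambiguous bases are ignored, samples whose consensus sequences are mostly N will look artificially close to everything else in such a matrix.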
185
That shouldn't be too many. What files are in your nextclade directory?
It didn't produce a nextclade directory. I noticed that I had included NC_063383.1.fasta and NC_063383.1.gff (Monkeypox) in the fastas and gff directories, which must have thrown everything off. I have removed them and am running anew: nextflow run UPHL-BioNGS/Cecret -profile singularity,wastewater --relatedness true --msa nextclade --freyja_demix_options '--depthcutoff 10'. Do I need to specify the outgroup (MN908947.3) on the command line, or will it be pulled automatically from the genomes directory?
These are the config settings for wastewater : https://github.com/UPHL-BioNGS/Cecret/blob/master/configs/sarscov2_wastewater.config
Nextclade is turned off for wastewater, since its results don't mean anything for wastewater samples.
What kind of samples are you running?
Wastewater reads.
The pipeline failed at the end, but it did produce a SNP matrix:
executor > local (3464)
[- ] process > CECRET:fasta_prep -
[40/5652c9] process > CECRET:cecret:seqyclean (2023WW0348) [100%] 185 of 185 ✔
[3b/d1e14e] process > CECRET:cecret:bwa (2023WW0348) [100%] 185 of 185 ✔
[c1/6645ae] process > CECRET:cecret:sort (2023WW0348) [100%] 185 of 185 ✔
[96/d6bc91] process > CECRET:cecret:ivar_trim (2023WW0373) [100%] 181 of 181 ✔
[6a/eafdd9] process > CECRET:cecret:ivar (2023WW0373) [100%] 181 of 181 ✔
[- ] process > CECRET:cecret:artic_read_filtering -
[- ] process > CECRET:cecret:artic -
[06/61445f] process > CECRET:qc:fastqc (2023WW0384) [100%] 185 of 185 ✔
[- ] process > CECRET:qc:kraken2 -
[4a/fd0372] process > CECRET:qc:samtools_intial_stats (2023WW0348) [100%] 185 of 185 ✔
[86/8dba39] process > CECRET:qc:aci (2023WW0373) [100%] 174 of 174
[f7/047586] process > CECRET:qc:samtools_flagstat (2023WW0373) [100%] 181 of 181 ✔
[78/96e523] process > CECRET:qc:samtools_depth (2023WW0373) [100%] 181 of 181 ✔
[5e/dec0d2] process > CECRET:qc:samtools_coverage (2023WW0373) [100%] 181 of 181 ✔
[49/c0bbce] process > CECRET:qc:samtools_stats (2023WW0373) [100%] 181 of 181 ✔
[f5/974fa9] process > CECRET:qc:bcftools_variants (2023WW0373) [100%] 181 of 181 ✔
[85/fdb91e] process > CECRET:qc:ivar_variants (2023WW0373) [100%] 181 of 181 ✔
[94/898a61] process > CECRET:qc:samtools_ampliconstats (2023WW0373) [100%] 181 of 181 ✔
[c0/f6b706] process > CECRET:qc:samtools_plot_ampliconstats (2023WW0373) [100%] 181 of 181 ✔
[0a/8f7a8e] process > CECRET:qc:igv_reports (2023WW0373) [100%] 174 of 174
[- ] process > CECRET:sarscov2:vadr -
[- ] process > CECRET:sarscov2:pangolin -
[- ] process > CECRET:sarscov2:pango_collapse -
[bb/c37dc5] process > CECRET:sarscov2:dataset (Downloading NextClade Dataset) [100%] 1 of 1 ✔
[84/4cbe5e] process > CECRET:sarscov2:nextclade (Clade Determination) [100%] 1 of 1 ✔
[1e/0b9dd3] process > CECRET:sarscov2:freyja_variants (2023WW0373) [100%] 181 of 181 ✔
[a3/0cfafe] process > CECRET:sarscov2:freyja_demix (2023WW0373) [100%] 178 of 178
[- ] process > CECRET:sarscov2:freyja_aggregate -
[96/d1913b] process > CECRET:msa:phytreeviz (Tree visualization) [100%] 2 of 2, failed: 2, retries: 1 ✔
[b5/1601a5] process > CECRET:msa:snpdists (creating snp matrix with snp-dists) [100%] 1 of 1 ✔
[- ] process > CECRET:msa:heatcluster -
[- ] process > CECRET:multiqc_combine -
[- ] process > CECRET:summary -
Pulling Singularity image docker://staphb/pangolin:4.3.1-pdata-1.24 [cache /data/nextflow_cachedir/staphb-pangolin-4.3.1-pdata-1.24.img]
Pulling Singularity image docker://staphb/vadr:1.6.3 [cache /data/nextflow_cachedir/staphb-vadr-1.6.3.img]
Pulling Singularity image docker://quay.io/uphl/heatcluster:1.0.2c-2024-01-09 [cache /data/nextflow_cachedir/quay.io-uphl-heatcluster-1.0.2c-2024-01-09.img]
[22/1f025d] NOTE: Missing output file(s) phytreeviz/tree.png
expected by process CECRET:msa:phytreeviz (Tree visualization)
-- Execution is retried (1)
[96/d1913b] NOTE: Missing output file(s) phytreeviz/tree.png
expected by process CECRET:msa:phytreeviz (Tree visualization)
-- Error is ignored
ERROR ~ Error executing process > 'CECRET:sarscov2:pangolin'
Caused by:
  Failed to pull singularity image
  command: singularity pull --name staphb-pangolin-4.3.1-pdata-1.24.img.pulling.1706563783424 docker://staphb/pangolin:4.3.1-pdata-1.24 > /dev/null
  status : 255
  message:
    INFO: Converting OCI blobs to SIF format
    INFO: Starting build...
    Getting image source signatures
    Copying blob sha256:578acb154839e9d0034432e8f53756d6f53ba62cf8c7ea5218a2476bf5b58fc9
    Copying blob sha256:644cc1f212f602ba382c1b343b65039eca8478ec9997a8a6d93bfffe90d24ad7
    Copying blob sha256:a2d7dcebe2368f2ea4f5b52f2af1a55f234cf13fb2eff7332e39ba9e463f9af2
    Copying blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
    Copying blob sha256:bb02f4fb31804257c11083dc6f3756d02bdf5c700a697707e6df24aecf18ba2b
    Copying blob sha256:a99d59b7d1ec90f43e2cb49e42a0ad44954bd01b699896abd73c6ee77ad943f7
    Copying blob sha256:2ad9805fbbd597b3d4f40918d29216bab5b7a09c1ce353c4fd4c0251d1974a6c
    Copying blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
    Copying blob sha256:af33fed889b0eee32b75996a3baa3621592d0cc017b7e08be2676d866ea0f257
    Copying blob sha256:db17c7428ea55d6ed2d2c3a33dfdf90f2bc08ab06549c4b65a9790d2e7fac25e
    Copying blob sha256:bf0189e4667ce6526f4a593ff5c7dbb3637a4586cad77c90bb6ce6c11a231d18
    Copying blob sha256:07e2ea468e00466d83f52be0c5e9b48c0d381019c038877e338ed903c237dc82
    Copying blob sha256:5b602560f5480f5440f27b362a15534aef7980e8e12f763c4d61b4426f3f7844
    Copying blob sha256:399caa5c4226e9971a64bc3094432f85c7b8a8b215c5f317909ae894f920a0bf
    Copying blob sha256:d552d70732e78bbc1076ed7f32dc703070bc1d651cd472462c56162841e4ede1
    Copying blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
    Copying blob sha256:cbace4e6ede6e0b98a21fbd416170136ee448a6c7d4fa404f5ec0f34b5fca1ff
    Copying blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
    Copying config sha256:df652314e69365ed5734e4c9b45d629e8c0afe41210ed2a56a6e81441eac62af
    Writing manifest to image destination
    Storing signatures
    FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: no descriptor found for reference "caa2b887caa62ca47ebb0a3e15b5d2f5b823162b1f998120ab89addf94ca363e"
-- Check '.nextflow.log' file for details
From that error, it looks like the pangolin image failed to download. That sometimes happens with the larger Singularity images. If you rerun with -resume, it should try the download again. You can also download the image manually and move it to your Singularity cache directory:
singularity pull --name staphb-pangolin-4.3.1-pdata-1.24.img docker://staphb/pangolin:4.3.1-pdata-1.24
mv staphb-pangolin-4.3.1-pdata-1.24.img <directory where you keep your singularity images>/.
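The manually pulled file has to carry exactly the name Nextflow looks for in the cache. Judging only from the three cache paths in the log above, that name appears to be the image URI with docker:// dropped, every / and : replaced by -, and .img appended. Here is a small Python sketch of that inferred rule. The helper names are hypothetical, and Nextflow's real naming logic may handle cases (such as image digests) that this does not.

```python
import shlex


def singularity_cache_name(image_uri: str) -> str:
    """Cache file name inferred from the log, e.g. staphb-vadr-1.6.3.img."""
    name = image_uri.removeprefix("docker://")  # requires Python 3.9+
    return name.replace("/", "-").replace(":", "-") + ".img"


def pull_command(image_uri: str) -> str:
    """Shell command that pulls the image straight to the expected file name."""
    return shlex.join(
        ["singularity", "pull", "--name", singularity_cache_name(image_uri), image_uri]
    )


if __name__ == "__main__":
    for uri in (
        "docker://staphb/pangolin:4.3.1-pdata-1.24",
        "docker://quay.io/uphl/heatcluster:1.0.2c-2024-01-09",
    ):
        print(pull_command(uri))
```

For the pangolin image, this reproduces the singularity pull command shown above.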
Will do. Thanks!
I don't think the results from Nextclade or Pangolin are useful for wastewater samples... or really anything that depends on the consensus FASTA. I'm curious: what are you using this information for?
I need lineages, abundance, and reference coverage, which I already have. I thought I would also produce a SNP matrix and tree.
I have run the newest version of Cecret on reads, but neither iqtree2 nor phytreeviz is producing an output directory. I created a new config file (contents below) to shorten the command line:
params.species = 'sarscov2'
params.nextclade_dataset = 'sars-cov-2'
params.vadr_options = '--split --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn'
params.vadr_reference = 'sarscov2'
params.vadr_trim_options = '--minlen 50 --maxlen 30000'
// booleans rather than strings: in Groovy any non-empty string, even 'false', is truthy
params.iqtree2 = true
params.iqtree2_outgroup = 'MN908947.3'
params.relatedness = true
params.msa = 'nextclade'
params.freyja_demix_options = '--depthcutoff 10'
params.freyja_boot_options = '--nb 1000'
Currently, iqtree2 only runs after MAFFT. I should allow iqtree2 to run on the Nextclade MSA as well.
I'm guessing the newest version of phytreeviz still isn't liking the nextclade newick file.
The nextclade newick file is being produced in the work dir. Here is the error file (maybe the newick file is too big):
Matplotlib created a temporary cache directory at /tmp/matplotlib-3tgxgmbj because the default path (/app/becksts/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Matplotlib created a temporary cache directory at /tmp/matplotlib-w5yrbqnq because the default path (/app/becksts/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
File "/usr/local/bin/phytreeviz", line 8, in <module>
Yeah, phytreeviz is still having issues with how many nodes are in this tree. Have you tried looking at nextclade's newick file in iTOL or some other software?
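To see how large the Nextclade tree actually is before trying phytreeviz or iTOL, you can count its tips. Here is a minimal Python sketch. It assumes unquoted labels (a quoted label containing "(", "," or "[" would be miscounted) and a tree with at least one internal node. The file path on the command line is simply whatever newick file nextclade wrote.

```python
import sys
from pathlib import Path


def count_newick_tips(newick: str) -> int:
    """Count leaves: a leaf starts right after '(' or ',' unless a new clade '(' opens."""
    tips = 0
    expect_child = False  # True right after '(' or ','
    in_comment = False    # inside a [...] comment, e.g. [&annotations]
    for c in newick:
        if in_comment:
            in_comment = c != "]"
            continue
        if c == "[":
            in_comment = True
            continue
        if c.isspace():
            continue
        if expect_child:
            expect_child = False
            if c != "(":
                tips += 1  # a leaf (possibly unnamed) begins here
        if c in "(,":
            expect_child = True
    return tips


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_newick_tips(Path(sys.argv[1]).read_text()))
```

Comparing that count with the number of samples you ran shows how much of the tree comes from the Nextclade dataset rather than from your own sequences.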
I tried running the latest version of nextalign (as opposed to nextclade) today, but it gives the same result. I'm glad the multiple sequence alignment file is generated as expected, but the newick tree has too many nodes.
Would it be helpful to you if nextclade's multiple sequence alignment was fed into iqtree2 for phylogenetic tree creation?
That could be useful.
The latest version of Cecret (https://github.com/UPHL-BioNGS/Cecret/releases/tag/3.13.20240319) will use the multiple sequence alignment file from nextclade in the iqtree2 process to create a tree.
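Before an alignment goes into iqtree2 with an outgroup, it can help to sanity-check the file. Every sequence in an MSA must have the same length, header names must be unique, and the outgroup (MN908947.3 in the config above) has to be present. Depending on how Nextclade is run, the reference sequence may not be written into its aligned output. Here is a hypothetical Python checker sketch; it is not part of Cecret.

```python
def read_fasta(lines) -> dict[str, str]:
    """Parse FASTA lines into {name: sequence}; the name is the header's first word."""
    records: dict[str, list[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            if name in records:
                raise ValueError(f"duplicate sequence name {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def check_msa(msa: dict[str, str], outgroup: str) -> list[str]:
    """Return a list of problems; an empty list means the MSA looks usable."""
    problems = []
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) > 1:
        problems.append(f"unequal sequence lengths: {sorted(lengths)}")
    if outgroup not in msa:
        problems.append(f"outgroup {outgroup!r} not found")
    return problems


if __name__ == "__main__":
    demo = ">MN908947.3 reference\nACGT\n>sample1\nAC\nGA\n"
    print(check_msa(read_fasta(demo.splitlines()), "MN908947.3") or "MSA looks OK")
```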
I ran the newest version of Cecret on 185 pairs of COVID read files, and it ran super fast! I expected to see SNPs and trees, but they weren't produced. Here is the command line: nextflow run UPHL-BioNGS/Cecret -profile singularity -c configs/sarscov2_wastewater.config --relatedness true --msa nextalign --freyja_demix_options '--depthcutoff 10' --freyja_boot_options '--nb 1000'. I see that both msa.nf and sarscov2.nf use ch_fasta as input. Do I need to rerun using the consensus FASTAs as input instead, or is there a way to do this directly from the reads?