Sydney-Informatics-Hub / ONT-bacpac-nf

Bacterial profiling workflow for ONT data, written in Nextflow.
GNU General Public License v3.0

Alpha implementation of end-to-end workflow #1

Closed georgiesamaha closed 2 months ago

georgiesamaha commented 4 months ago

Not ready to merge. This PR covers the alpha implementation of the end-to-end workflow, from porechop/nanoplot through to multiqc report creation.

georgiesamaha commented 4 months ago

Issues in the development scripts have prevented me from going further. To implement the initial QC, contamination screen, and assemblies, the following things need to be clarified:

nandan75 commented 3 months ago

I have made changes and pushed them to this branch

  1. No explanation of how to set up the input directory that captures the pycoqc and nanoplot inputs.
  2. PycoQC isn't implemented in the run script.
  3. There are multiple scripts for nanoPlot, but neither is implemented in the run script. Which one should be run? This one or this one?
  4. The Kraken2 contamination screen script has lots of commented-out code. I'm unsure what needs to be implemented, what has yet to be tested, and the reasoning behind this structure. Additionally, some parts require manual intervention without explanation.

    • I have now cleaned the unwanted commented-out lines from the script. The script for kraken2 in this repo is here
georgiesamaha commented 3 months ago

Currently stuck at the select_assembly step. There is an issue with passing the input trycycler cluster files to the python script without overwriting other clusters. I have started on docs/ with specific instructions for execution on Gadi while we're still testing.
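One way to avoid clusters overwriting each other is to derive every output path from the cluster's own directory name. The following is only a minimal sketch under assumptions: the helper names `cluster_output_path` and `plan_outputs` are hypothetical and not taken from select_assembly.py, and it assumes each input FASTA sits in a directory named after its cluster (for example `cluster_001/`).

```python
from pathlib import Path


def cluster_output_path(cluster_fasta, out_root):
    """Map one cluster's input FASTA to an output path unique to that cluster.

    Assumption: the parent directory of the input is named after the cluster.
    """
    cluster_fasta = Path(cluster_fasta)
    cluster_id = cluster_fasta.parent.name
    return Path(out_root) / cluster_id / cluster_fasta.name


def plan_outputs(cluster_fastas, out_root):
    """Return {input path: output path}, failing loudly on any collision."""
    plan = {}
    seen = set()
    for fasta in map(Path, cluster_fastas):
        target = cluster_output_path(fasta, out_root)
        if target in seen:
            # Two inputs would write to the same file: stop instead of overwriting.
            raise ValueError(f"output collision for {target}")
        seen.add(target)
        plan[fasta] = target
    return plan
```

Calling `plan_outputs` before writing anything makes a collision visible as an error rather than as a silently replaced file.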

georgiesamaha commented 3 months ago

Have hit an issue with the medaka_polish_flye process.

Error description: the medaka_polish_flye process fails to index the assembly fasta because the Chr_contigs/flyeChromosomes.fasta file is empty for barcode10. This sample is expected to fail consensus and proceed to a flye-only assembly. Might there be a problem with the select_assembly script not outputting the right files?
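A quick way to confirm whether the file handed to the polishing step really contains sequence is a small check like the one below. This is a sketch for debugging, not part of the pipeline: the function name `fasta_has_sequence` is hypothetical, and it only tests for at least one header line followed by a non-empty sequence line.

```python
from pathlib import Path


def fasta_has_sequence(path):
    """Return True if the file exists and holds at least one record with sequence."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    in_record = False
    with p.open() as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                in_record = True      # header seen, now expect sequence
            elif line and in_record:
                return True           # first non-empty sequence line found
    return False
```

Running it on the flyeChromosomes.fasta path for barcode10 would show whether the file is empty on arrival, which points at the upstream script, or is emptied later.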

Relevant files

# workdir
/scratch/er01/gs5517/workflowDev/ONT-bacpac-nf/work/98/1bf0a1f57d8492b404145cd9c53f1c/barcode10_flye_assembly/Chr_contigs/flyeChromosomes.fasta

# select_assembly.py
/scratch/er01/gs5517/workflowDev/ONT-bacpac-nf/bin/select_assembly.py

# test run script
/scratch/er01/gs5517/workflowDev/ONT-bacpac-nf/test/run_test.sh
nandan75 commented 3 months ago

Hi @georgiesamaha

I think I have debugged the issue. Please test the pipeline now.

nandan75 commented 2 months ago

Hi @georgiesamaha

Thanks

nandan75 commented 2 months ago

Added and tested processes

Next: To work on multiqc pipeline

nandan75 commented 2 months ago

To be worked on

nandan75 commented 2 months ago

A few things pending as of 18/07/24:

(1) Phylogeny-heatmap image generated (to be added to multiqc) - but an issue

(2) Selected plots from pycoqc output to be added to multiqc - integration of script in progress

(3) barcode13-related (possible) memory error to be investigated and resolved

Anything else?

nandan75 commented 2 months ago

Hi @georgiesamaha

Good morning.

regards

georgiesamaha commented 2 months ago

All processes functional 🎉

Next step: organising results and docs.