Closed georgiesamaha closed 2 months ago
Issues in the development scripts have prevented me from going further. In order to implement initial QC, the contamination screen, and assemblies, the following things need to be clarified:
I have made changes and pushed them to this branch
These two steps are to be run once per experimental run/batch, i.e. across all samples.
The input to both modules is raw_data/sequencing_summary.txt, which is a file generated by the ONT sequencing run itself.
When testing the bash pipeline, I have kept the above two scripts out of the once-per-sample run_pipeline.sh. The input sequencing_summary*.txt sits in the same raw_data directory as the rest of the individual (per-sample barcode) .zip files. The run_pipeline.sh script is run once per sampleID, so there is no mention of the above two modules.
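To illustrate the split described above, here is a minimal sketch of separating the once-per-batch input from the per-sample inputs inside raw_data. The function name is hypothetical and not part of the pipeline; it only assumes the file naming mentioned in this thread (sequencing_summary*.txt and one .zip per barcode).

```python
from pathlib import Path


def split_batch_and_sample_inputs(raw_data):
    """Separate once-per-batch summary files from per-sample archives.

    Hypothetical helper: assumes summaries match sequencing_summary*.txt
    and that each sample barcode is delivered as its own .zip file.
    """
    raw_data = Path(raw_data)
    # Per-batch input: written once by the ONT sequencing run.
    summaries = sorted(raw_data.glob("sequencing_summary*.txt"))
    # Per-sample inputs: one archive per barcode.
    sample_zips = sorted(raw_data.glob("*.zip"))
    return summaries, sample_zips
```

The batch-level steps would then consume the first list once, while the per-sample script is invoked once for each entry in the second list.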
There are multiple scripts for NanoPlot but neither is implemented in the run script. Which one should be run? This one or this one?
The Kraken2 contamination screen script has a lot of commented-out code. I'm unsure what needs to be implemented, what has yet to be tested, and the reasoning behind this structure. Additionally, some parts require manual intervention without explanation.
Currently stuck at the select_assembly step. There is an issue with passing the input Trycycler cluster files to the Python script without overwriting other clusters. Have started on docs/ for specific instructions re: execution on Gadi while we're still testing.
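One possible way to avoid clusters overwriting each other is to derive every output filename from the cluster directory name and to fail loudly on a collision. This is only a sketch: the function name and the output naming scheme are assumptions, not what select_assembly.py currently does, and it assumes cluster directories have distinct names such as cluster_001 and cluster_002.

```python
from pathlib import Path


def plan_cluster_outputs(cluster_dirs, out_dir):
    """Map each cluster directory to its own output FASTA path.

    Hypothetical helper: raises ValueError if two clusters would write
    to the same file, instead of silently overwriting.
    """
    out_dir = Path(out_dir)
    planned = {}
    used_targets = set()
    for cluster_dir in map(Path, cluster_dirs):
        # Output name is derived from the cluster directory name.
        target = out_dir / f"{cluster_dir.name}.fasta"
        if target in used_targets:
            raise ValueError(f"output collision for {cluster_dir}: {target}")
        used_targets.add(target)
        planned[cluster_dir] = target
    return planned
```

If clusters from different samples share names, the sample ID would also need to be part of the target path.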
Have hit an issue with the medaka_polish_flye process.
Error description
Process medaka_polish_flye fails to index the assembly fasta due to an empty Chr_contigs/flyeChromosomes.fasta file for barcode10. This sample is expected to fail consensus and proceed to flye-only assembly. Might there be a problem with the select_assembly script not outputting the right files?
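While debugging, it may help to check the FASTA before it reaches the indexing step, so an empty file is reported at its source rather than as an indexing failure. The sketch below is a hypothetical helper, not part of the pipeline; it only tests that the file exists, is non-empty, and contains at least one header line.

```python
from pathlib import Path


def fasta_has_records(path):
    """Return True if the file exists and has at least one FASTA header.

    Hypothetical pre-flight check: it does not validate the sequences,
    it only looks for a line starting with '>'.
    """
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return False
    with path.open() as handle:
        return any(line.startswith(">") for line in handle)
```

Calling this on Chr_contigs/flyeChromosomes.fasta right after select_assembly would show whether the empty file originates there or in a later step.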
Relevant files
# workdir
/scratch/er01/gs5517/workflowDev/ONT-bacpac-nf/work/98/1bf0a1f57d8492b404145cd9c53f1c/barcode10_flye_assembly/Chr_contigs/flyeChromosomes.fasta
# select_assembly.py
/scratch/er01/gs5517/workflowDev/ONT-bacpac-nf/bin/select_assembly.py
# test run script
/scratch/er01/gs5517/workflowDev/ONT-bacpac-nf/test/run_test.sh
Hi @georgiesamaha
I think I have debugged the issue. Please test the pipeline now
get_all_chromosomal_contigs_using_genome_size without adding the required chromosomal contig.
Hi @georgiesamaha
_final directory etc, once I feel better by tomorrow.
Thanks
Multiple reconciled-CHR clusters belonging to a sampleid were handled (previously seen to be missing at the msa step and thereafter).
Pipeline successfully tested for multiple samples from raw reads to busco
for
Steps still to be integrated (will work this week to finish)
Encountered a problem with barcode13 (possibly memory related) - to be debugged.
Code needs refinement and documentation - will work in coming week.
Added and tested processes
quast (today)
bakta
busco
amrfinderplus (today)
publishDir used for a few results for testing purposes - Can be shuffled later as required
Pipeline tested for raw reads to amrfinderplus for barcode01/03/05/06/10
Next: To work on multiqc pipeline
Phylogeny tree generation (newick tree) using orthofinder works well with
TDB
Plassembler script to be integrated
Debug issue with barcode 13!
multiqc-supported modules
such as
To be worked on
A few things pending as of 18/07/24:
(1) Phylogeny-heatmap image generated (to be added to multiqc) - but an issue
(2) Selected plots from pycoqc output to be added to multiqc - integration of script in progress
(3) barcode13-related (possible) memory error to be investigated and resolved
Anything else!!?
Hi @georgiesamaha
Good morning.
bash test/run_test.sh or qsub test/run_test.sh when you feel free and let me know if it breaks.
sequence_summary* as an input parameter.
reads-to-multiqc.html
font-display issue with phylogeny-heatmap.png image is still unresolved. I will try to look at it this morning. The image without fonts is included in the multiqc report.
all/subset of samples as in the in path in bash test/run_test.sh.
text formatting is still pending.
regards
All processes functional 🎉
Next step: organising results and docs.
Not ready for a merge. This PR covers an alpha implementation of the end-to-end workflow, from porechop/nanoplot through to multiqc report creation.