Closed abhi18av closed 2 years ago
Hi! Thanks for the the efforts.
I have checked the changes, and it seems that everything will run ok because it just changes the cases and nothing else, however a quick test is required to check if no typo is present.
When skimming through it, I found two typos:
The only way I can think of testing it is running an example with a small bacterial genome and with a nanopore subset with fast5 to try the methylation calling process.
As the second one using raw reads and fast5 can be a little demanding I can make these tests myself directly using the branch from your fork after the typos have been fixed and before merging it.
😄
Hi @abhi18av, thank you for accomodating the changes and all the help.
I will perform a rapid test to check if nextflow doesn't find another typo, but everything should run smoothly. As soon as the test finishes, I'll merge it in the develop
branch.
Talking about the develop
. Which I believe it is comming very near to trigger a new release. We could later discuss about when (which issues to be in it) to create a new one.
When running the test it complained about a flye_ch
channel:
$ nextflow run \
abhi18av/bacannot -r abhinav/refactor-channel-identifiers \
-profile test --threads 10 --output ABHINAV
Launching `abhi18av/bacannot` [astonishing_albattani] - revision: 0f58558832 [abhinav/refactor-channel-identifiers]
=================================================================
Container-based, fmalmeida/bacannot, Genome Annotation Pipeline
=================================================================
Input genomes : https://github.com/fmalmeida/bacannot/raw/master/test_data/samplesheet.yaml
Output dir : ABHINAV
Threads : 10
Blast % ID - Virulence Genes: 90
Blast query coverage - Virulence Genes: 80
Blast % ID - AMR Genes: 90
Blast query coverage - AMR Genes: 80
Blast % ID - ICEs or Phages: 65
Blast query coverage - ICEs or Phages: 65
Blast % ID - Plasmids: 90
Blast query coverage - Plasmids: 60
Pipeline Release: abhinav/refactor-channel-identifiers
Current home : /home/falmeida
Current user : falmeida
Current path : /work1/falmeida/bacannot_tests
Configuration file: /home/falmeida/.nextflow/assets/abhi18av/bacannot/nextflow.config
==============================================================
No such variable: flye_ch
-- Check script '/home/falmeida/.nextflow/assets/abhi18av/bacannot/./workflows/bacannot.nf' at line: 132 or see '.nextflow.log' file for more details
The problem stated is in line 132 where you've changed it to flye_ch
and unicycler_ch
:
PROKKA(parsed_inputs.annotation_ch.mix(flye_ch.out[1], unicycler_ch.out[1]))
However, in this step, prokka module expects the outputs from assemblers modules, so it should be something like:
PROKKA(parsed_inputs.annotation_ch.mix(FLYE.out[1], UNICYCLER.out[1]))
Thanks for the catch again @fmalmeida https://github.com/fmalmeida/bacannot/pull/45#issuecomment-998998287 , I've addressed it in https://github.com/fmalmeida/bacannot/pull/45/commits/03cde4724a3be38e357dd0010302927d1382df94
Hi @abhi18av, thanks for the change. I've tried it again and it now complained the following:
Missing process or function with name 'iceberg'
-- Check script '/home/falmeida/.nextflow/assets/abhi18av/bacannot/main.nf' at line: 135 or see '.nextflow.log' file for more details
The problem seems in line 209:
// ICEs search
if (params.skip_iceberg_search == false) {
// ICEberg db
iceberg(PROKKA.out[4], PROKKA.out[3])
iceberg_output_ch = ICEBERG.out[1]
iceberg_output_2_ch = ICEBERG.out[2]
} else {
iceberg_output_ch = Channel.empty()
iceberg_output_2_ch = Channel.empty()
}
Ah, okay. I've now fixed it with https://github.com/fmalmeida/bacannot/pull/45/commits/6b77030f608c3577be086eb18a8bda4539eba771
(Feliz Natal! 🎄🎅 )
Feliz Natal! 🌲
It almost worked, but it know complained about this line:
/*
Eighth step -- Merge all annotations with the same Prefix value in a single Channel
*/
annotations_files_ch = PROKKA.out[3].join(PROKKA.out[1])
.join(MLST.out[0])
.join(BARRNAP.out[0])
.join(COMPUTE_GC.out[0])
.join(kofamscan_output_ch, remainder: true)
.join(vfdb_output_ch, remainder: true)
.join(victors_output_ch, remainder: true)
.join(amrfinder_output_ch, remainder: true)
.join(resfinder_gff_ch, remainder: true)
.join(rgi_output_ch, remainder: true)
.join(iceberg_output_ch, remainder: true)
.join(phast_output_ch, remainder: true)
.join(phigaro_output_2_ch, remainder: true)
.join(FIND_GIS.out[0], remainder: true)
// Contatenation of annotations in a single GFF file
MERGE_ANNOTATIONS(ANNOTATIONS_FILES.join(DIGIS.out[1], remainder: true))
The annotation files channel is not ANNOTATIONS_FILES
, but annotations_files_ch
. I believe it could even be changed to annotation_files_ch
.
After that I believe it would work.
Hey @fmalmeida , fixed that one now and keeping my 🤞 😄
As an aside, to speed up such code reviews Github has a couple features
The first is called Suggestions
- which helps in adding the exact code change which you feel is the cause for these problems.
The other one is gh
the Github CLI, which allows you to contribute directly to this branch (even if it's from a fork, I think) with command like gh pr checkout 45
Apart from this, happy to address any other issues.
Ah, there's still another one you found. Apologies for this Felipe! 😅
I'd really appreciate if there's a light-weight testing data I could use on my end - hopefully this would avoid these issues in future
Hi @abhi18av,
No worries. I still not checked if it finished properly. I will check it again after 06/Jan.
Then, I can also think about a small dataset to share with you. The dataset just can't be too small in order to properly test the majority of processes.
For now, there are two 30X bacterial seq reads available with -profile test
. I know it is not too small so it may take ~40 min to finish. But at least it enables the check-up of the NF code itself.
After coming back from vacations I will put more efforts on find a more appropriate testing dataset.
Furthermore,
Happy new year, and happy celebrations. All the best and thank you for helping with the code 🥳😁
The tests here have finished successfully, so I will now merge this PR.
I have created a new issue #46 to handle this "problem" of a smaller dataset for quicker tests.
😄
This PR builds upon the discussion done in #38 .
I'm not sure how to test locally, so I've initiated this PR to track all changes.
Unless I have ignored something crucial, as of now the PR implements the refactor of identifiers used in the modules/workflows.
Happy to accomodate any change request :)