Closed JavariaAshraf closed 4 months ago
Hi @JavariaAshraf , The error is because this branch does not exist anymore. I have merged the code in a patch release last week when we finished the other issue.
Try pointing the run at the new version instead of the missing branch, like:
nextflow run fmalmeida/bacannot -r v3.3.2 …
Cheers.
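For reference, a minimal sketch of how one could check beforehand whether a revision still exists on the remote. The helper name `ref_in_listing` is hypothetical, and the repository URL in the usage comment is an assumption (it is where `fmalmeida/bacannot` normally resolves on GitHub). The function only inspects the text produced by `git ls-remote --heads --tags`, so it does not touch the network itself.

```shell
# Hypothetical helper: succeed if a branch or tag name appears in the
# output of `git ls-remote --heads --tags`, which is read from stdin.
ref_in_listing() {
    # $1 = branch or tag name to look for
    awk -F'\t' -v ref="$1" '
        $2 == ("refs/heads/" ref) || $2 == ("refs/tags/" ref) { found = 1 }
        END { exit !found }
    '
}

# Usage (needs git and network access; the URL is an assumption):
# git ls-remote --heads --tags https://github.com/fmalmeida/bacannot \
#     | ref_in_listing v3.3.2 && echo "revision exists"
```

If the check fails for a branch name, the branch was most likely merged and deleted, and a tagged release is the safer value for `-r`.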
One more thing, @JavariaAshraf
However, even after solving the problem of the non-existing branch, I still believe it will fail because you cannot modify the file and run it again. When resuming, nextflow will create a new working directory for the job, and your modifications would be ignored.
Thus, because this is the very last module and a circos plot with so many points is not meaningful anyway, I suggest you create the following file, called circos.config, and run the pipeline with it so that the error in this module is ignored.
Contents of the circos.config file:
process {
    withName: 'CIRCOS' {
        errorStrategy = 'ignore'
    }
}
And run like this:
nextflow run fmalmeida/bacannot -r v3.3.2 -c circos.config <rest of your params> -resume
Finally, because this CIRCOS module is the last one, and it is not meaningful, I will add two parameters over the weekend to manage it: one to skip it, and another to easily ignore the errors it produces (as the config I shared with you does).
The difference is:
In both cases, at least, it should avoid breaking the pipeline.
Can you give it a try, using this custom config and the correct revision as suggested and see if it helps?
Depending on the feedback I will know what to set up as an action plan.
Cheers 😄
Hi @fmalmeida, I started the run as suggested above and got this error. Kindly review:
Caused by:
Process `BACANNOT:MERGE_ANNOTATIONS (vibrio31)` terminated with an error exit status (1)
Command executed:
# Rename gff and remove sequence entries
# bakta has region entries
awk '$3 == "CDS"' prokka_gff | grep "ID=" > vibrio31.gff ;
## Increment GFF with custom annotations
### VFDB
if [ ! $(cat vibrio31_vfdb_blastn_onGenes.txt | wc -l) -le 1 ]
then
addBlast2Gff.R -i vibrio31_vfdb_blastn_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d VFDB -t Virulence ;
grep "VFDB" vibrio31.gff > virulence_vfdb.gff ;
fi
### Victors
if [ ! $(cat vibrio31_victors_blastp_onGenes.txt | wc -l) -le 1 ]
then
addBlast2Gff.R -i vibrio31_victors_blastp_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d Victors -t Virulence ;
grep "Victors" vibrio31.gff > virulence_victors.gff ;
fi
### KEGG Orthology
## Reformat KOfamscan Output
if [ ! $(cat vibrio31_ko_forKEGGMapper.txt | wc -l) -eq 0 ]
then
awk \
-F'\t' \
-v OFS='\t' \
'{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' \
vibrio31_ko_forKEGGMapper.txt | \
sed \
-e 's/\t/,/g' \
-e 's/,,/\t/g' | \
awk '$2!=""' > formated.txt ;
addKO2Gff.R -i formated.txt -g vibrio31.gff -o vibrio31.gff -d KEGG ;
fi
### ICEs
if [ ! $(cat vibrio31_iceberg_blastp_onGenes.txt | wc -l) -le 1 ]
then
addBlast2Gff.R -i vibrio31_iceberg_blastp_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d ICEberg -t ICE ;
grep "ICEberg" vibrio31.gff > ices_iceberg.gff ;
fi
### Prophages
if [ ! $(cat vibrio31_phast_blastp_onGenes.txt | wc -l) -le 1 ]
then
addBlast2Gff.R -i vibrio31_phast_blastp_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d PHAST -t Prophage ;
grep "PHAST" vibrio31.gff > prophages_phast.gff ;
fi
### Resistance
#### RGI
if [ ! $(cat RGI_vibrio31.txt | wc -l) -le 1 ]
then
addRGI2gff.R -g vibrio31.gff -i RGI_vibrio31.txt -o vibrio31.gff ;
grep "CARD" vibrio31.gff > resistance_card.gff ;
fi
#### AMRFinderPlus
if [ ! $(cat AMRFinder_resistance-only.tsv | wc -l) -le 1 ]
then
addNCBIamr2Gff.R -g vibrio31.gff -i AMRFinder_resistance-only.tsv -o vibrio31.gff -t Resistance -d AMRFinderPlus ;
grep "AMRFinderPlus" vibrio31.gff > resistance_amrfinderplus.gff ;
fi
#### Resfinder
if [ ! $(cat results_tab.gff | wc -l) -eq 0 ]
then
bedtools intersect -a results_tab.gff -b vibrio31.gff -wo | sort -k19,19 -r | awk -F '\t' '!seen[$9]++' > resfinder_intersected.txt ;
addBedtoolsIntersect.R -g vibrio31.gff -t resfinder_intersected.txt --type Resistance --source Resfinder -o vibrio31.gff ;
grep "Resfinder" vibrio31.gff > resistance_resfinder.gff ;
rm -f resfinder_intersected.txt ;
fi
#### Custom Blast databases
for file in input.11 ;
do
if [ ! $(cat $file | wc -l) -eq 0 ]
then
db=${file%%_custom_db.gff} ;
bedtools intersect -a ${file} -b vibrio31.gff -wo | sort -k19,19 -r | awk -F '\t' '!seen[$9]++' > bedtools_intersected.txt ;
addBedtoolsIntersect.R -g vibrio31.gff -t bedtools_intersected.txt --type "CDS" --source "${db}" -o vibrio31.gff ;
grep "${db}" vibrio31.gff > custom_database_${db}.gff ;
rm -f bedtools_intersected.txt ;
fi
done
### digIS transposable elements
touch transposable_elements_digis.gff
if [ -s digis_gff ]
then
( cat digis_gff | sed 's/id=/ID=/g' > transposable_elements_digis.gff && rm digis_gff ) ;
cat vibrio31.gff transposable_elements_digis.gff | bedtools sort > tmp.out.gff ;
( cat tmp.out.gff > vibrio31.gff && rm tmp.out.gff );
fi
### integron_finder results
### integrons are unique / complete elements and should not be intersected
cat vibrio31.gff vibrio31_integrons.gff | bedtools sort > tmp.gff ;
cat tmp.gff > vibrio31.gff
rm tmp.gff
Command exit status:
1
Command output:
(empty)
Command error:
Error: malformed GFF entry at line 3545. Coordinate detected that is < 1. Exiting.
Work dir:
`/home/cdc-bioinfo/Vibrio-Feb2024/work/a9/4e823fd5dee91b46737e4606324995`
Please help.
Hi @JavariaAshraf ,
Once again it seems you have a 0-based annotation, because something was found at the very first base.
However, this time it is not clear which one it is. Can you send me this working directory (/home/cdc-bioinfo/Vibrio-Feb2024/work/a9/4e823fd5dee91b46737e4606324995) with all the files that are available inside it?
I can take a look during the week. In the meantime I would recommend removing the “problematic” genome from the run.
Cheers.
Please see the attachment.
4e823fd5dee91b46737e4606324995.zip
Do I have to re-run the pipeline from start? It takes a lot of time. Any way to resume from last step? The resume option doesn't work, it starts the pipeline from first step. Please guide. Thanks
So better to wait for a fix. I believe it is not resuming because the samplesheet is different (when removing the genome).
I think it may still be the integron finder file. Can you send me these files, which were not copied into the directory (only the links came through)?
/home/cdc-bioinfo/Vibrio-Feb2024/work/e8/6a17b21b8ad6b40d2f34e3a95aebb5/vibrio31_integrons.gff
/home/cdc-bioinfo/Vibrio-Feb2024/work/ce/4fede0a759e0aef9aac4cd449b28a1/vibrio31_phast_blastp_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/83/f663818f65e05cab793bd795019c3e/vibrio31_vfdb_blastn_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/8e/2ae390aa3b9292f043f643e51f2b5a/vibrio31_victors_blastp_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/79/a434755b9f49795debc139b78e9f58/KOfamscan/vibrio31_ko_forKEGGMapper.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/c3/b20059a379c86919f976bbbf494728/vibrio31_iceberg_blastp_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/09/33e52bffb0c86717fd85cb2be8b7e8/RGI_vibrio31.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/c7/034a3019b2ba76d63ca5e9889c7710/resfinder/results_tab.gff
/home/cdc-bioinfo/Vibrio-Feb2024/work/37/e0acf6a9f5f210b6d007903bec86fa/AMRFinder_resistance-only.tsv
Please find the files. They were write-protected and I was unable to copy them. Now they are attached: 4e823fd5dee91b46737e4606324995copy.zip
They are still not copied. Only the links are coming, not the real files. See below:
4e823fd5dee91b46737e4606324995copy.zip Please see... I have renamed them. I hope they are accessible now.
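For reference, Nextflow stages task inputs as symbolic links by default, which is why archiving the task directory carried over only the links. A minimal sketch of one way around it, assuming a `cp` that supports `-L` (both GNU and BSD do): copy the directory while dereferencing links, then archive the copy. The function name and the destination path in the usage comment are hypothetical.

```shell
# Hypothetical helper: copy a Nextflow task directory, replacing every
# symlink with the file it points to.
copy_workdir_resolved() {
    # $1 = task work directory
    # $2 = destination directory (should not exist yet)
    cp -RL "$1" "$2"
}

# Usage (destination name is only an example):
# copy_workdir_resolved \
#     /home/cdc-bioinfo/Vibrio-Feb2024/work/a9/4e823fd5dee91b46737e4606324995 \
#     ./workdir_resolved
# zip -r workdir_resolved.zip workdir_resolved
```

Note that `cp -L` fails on links whose targets no longer exist, which is itself a useful hint that an upstream task directory was cleaned up.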
Hi @JavariaAshraf ,
For some reason, some of the integron finder results have negative coordinates:
13 Integron_Finder integron 69515 74987 . + 1 ID=integron_01;integron_type=complete
24 Integron_Finder integron 25 12675 . + 1 ID=integron_01;integron_type=CALIN
25 Integron_Finder integron 19 9958 . + 1 ID=integron_01;integron_type=CALIN
27 Integron_Finder integron 6936 9536 . + 1 ID=integron_01;integron_type=complete
31 Integron_Finder integron 478 4564 . + 1 ID=integron_01;integron_type=CALIN
32 Integron_Finder integron 66 4604 . + 1 ID=integron_01;integron_type=CALIN
33 Integron_Finder integron 117 4047 . + 1 ID=integron_01;integron_type=CALIN
37 Integron_Finder integron -2 3108 . + 1 ID=integron_01;integron_type=CALIN
38 Integron_Finder integron 2 2804 . + 1 ID=integron_01;integron_type=CALIN
44 Integron_Finder integron 70 1709 . + 1 ID=integron_01;integron_type=CALIN
46 Integron_Finder integron -17 1603 . + 1 ID=integron_01;integron_type=CALIN
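Rows like contigs 37 and 46 above can be located with a short awk filter. This is only a sketch, assuming the file is a regular tab-separated GFF (start in column 4, end in column 5); the function name is hypothetical. GFF coordinates are 1-based, so any value below 1 is what `bedtools sort` rejects with the "Coordinate detected that is < 1" error.

```shell
# Hypothetical helper: print every GFF row whose start or end is below 1,
# prefixed with the file name and line number.
find_bad_gff_coords() {
    # "$@" = one or more tab-separated GFF files
    awk -F'\t' '
        /^#/ || NF < 5 { next }    # skip comment lines and incomplete rows
        ($4 + 0) < 1 || ($5 + 0) < 1 {
            printf "%s:%d: %s\n", FILENAME, FNR, $0
        }
    ' "$@"
}

# Usage:
# find_bad_gff_coords vibrio31_integrons.gff
```

Running it on each input of the merge step separately narrows down which tool produced the offending rows, since the line number in the bedtools message refers to the concatenated stream rather than to any single file.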
I would also need you to send me the integron finder results for this genome so I can re-check the module that converts them to GFF. It seems that the issue described in #116 is not yet fully resolved.
So, I would need the files (for this specific genome) so I can first check the tool's results and make sure they are proper and then assess whether I can use other scripts for converting it to GFF to avoid this issue.
In the meantime, I have the following alternatives:
-r v3.2
Thank you for your quick replies. I am attaching the folder for specific genome. integron_finder_v31.zip I would also run the older version as I need the results. Thank you
Okay, Let me know how it goes. In the meantime I work on the issue of the current version.
Remember to use the circos configuration to avoid the circos error when running the earlier version. Hopefully it works using that version, if not, we can investigate.
Hi @JavariaAshraf ,
It seems that the problem is in the integron finder tool itself. I will have to open an issue in the tool's GitHub repository.
Can you send me the sequences of contigs 37 and 46, which are the problematic ones from these genomes, so that I can investigate the issue with the tool's developers?
Hi @fmalmeida, you are right. I have also installed the tool separately and it gives the following error: integron_Finder_Error.txt Thanks.
Just for reference, I have opened the issue in their repository: https://github.com/gem-pasteur/Integron_Finder/issues/114. Once it is fixed, I can bring the new version to the pipeline.
The only remedy I can offer, for now, is a new patch release this week that allows one to skip the integron finder tool with a param, --skip_integron_finder, so that if this happens, one can run the rest.
This will be very helpful. Thank you.
Hi @JavariaAshraf ,
While I wait for the real fix in the integron finder tool, I have added the option for skipping the INTEGRON_FINDER and/or the CIRCOS module.
Before I make a release, could you give it a try?
I would ask you to try, first, using only the genome that causes the current problem, vibrio31.
You could check whether, with these modules skipped, the pipeline runs successfully for this genome. If so, I can then wrap it up as a patch release.
Suggested command line
nextflow run fmalmeida/bacannot \
    -r 118-add-skip-integron-finder-param \
    -latest \
    -resume \
    --skip_integron_finder \
    --skip_circos \
    # ... the rest of your normal input params
Depending on the result, I can merge it (or not).
Hi @fmalmeida, I have run the three troubled sequences with the parameters you suggested above and the run completed smoothly. Here's the screenshot.
Hi @JavariaAshraf ,
Thanks for the feedback. I am closing this issue then.
I have merged the code to the dev branch (on PR #119), so, if you need to run the pipeline with these parameters, you must refer to the dev branch, with nextflow run fmalmeida/bacannot -r dev -latest.
Finally, I opened a new ticket #120 so I remember to update the docker image with the new version of the integron finder tool once the devs release the fix.
Cheers, and thanks for reporting and using it.
Hi Fmalmeida, I am getting the following error as I try to resume my pipeline analysis.
Pulling fmalmeida/bacannot ... Remote origin did not advertise Ref for branch refs/heads/116-integron_finder_2gff-terminated-with-an-error. This Ref may not exist in the remote or may be hidden by permission settings.
The previous error was in circos (CIRCOS ERROR). I tried to fix it by changing the 200 to 213 in the housekeeping.config file, but the above error is not letting the program resume. Kindly help.