EBI-Metagenomics / emg-viral-pipeline

VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Apache License 2.0

Question regarding GFF file generation #127

Open fischer-hub opened 1 month ago

fischer-hub commented 1 month ago

Hey, thanks for this nice pipeline!

I'm running the pipeline with the --list argument, passing it a sample sheet so that it runs on ~100 samples. However, for all the viral sequences identified by the pipeline, I only get one GFF file, in the output directory of the first sample from the sample sheet. I think this is caused by this line of code, and I was just wondering what the reasoning is behind running the GFF script on only the first contig file as a reference sequence.

I get that in the case of running the pipeline on a single sample this doesn't make a difference, but when dealing with multiple samples wouldn't it make sense to generate a GFF file for every sample where viral sequences have been found? Just wondering if I'm missing something here!
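
To illustrate the difference I mean, here is a minimal Python sketch. It is not the pipeline's actual code (the workflow itself is not written in Python); the function names and the sample file names are hypothetical, and the only thing it models is "use the first contig file" versus "use every contig file".

```python
from pathlib import Path


def gff_targets_first_only(contig_files):
    """Hypothetical model of the behaviour I observe: only the first
    contig file is used, so only one GFF name is produced."""
    if not contig_files:
        return []
    return [Path(contig_files[0]).with_suffix(".gff").name]


def gff_targets_per_sample(contig_files):
    """Hypothetical model of what I expected: one GFF name per
    contig file, i.e. one per sample."""
    return [Path(f).with_suffix(".gff").name for f in contig_files]


# Made-up sample names standing in for entries of the sample sheet.
samples = ["sample_A.fasta", "sample_B.fasta", "sample_C.fasta"]

print(gff_targets_first_only(samples))
print(gff_targets_per_sample(samples))
```

With a single sample both functions return the same list, which would explain why this doesn't show up when running one assembly at a time.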

Thanks in advance!

mberacochea commented 1 month ago

Hey @fischer-hub

That looks like a bug to me. We usually run the pipeline one assembly at a time, so I think it's just an unhappy coincidence. I'll change it and test the pipeline.

Cheers

fischer-hub commented 1 month ago

Thanks again!