biocom-uib / vpf-tools

Virus Protein Family tools
BSD 3-Clause "New" or "Revised" License

Output files empty #30

Closed Davegb closed 2 years ago

Davegb commented 2 years ago

Hi!

When I run VPF-tools, it looks like it does classify the sequences, but the output files are empty: they contain only the headers. In case it matters, I'm running Docker on Windows. This is the output I get from the command line:

> docker run --rm -it -v "vpf-data:/opt/vpf-tools/vpf-data" -v "$PWD/seqs:/opt/vpf-tools/input-sequences:ro" -v "$PWD/outputs:/opt/vpf-tools/outputs:rw" bielr/vpf-tools vpf-class -i input-sequences/SARS.fasta -o outputs/test-classified

VPF-Class data files are up-to-date.
searching hits
processed 28 sequences
processing hits
predicting memberships
loading VPF classifications
found 28 aggregated hit files
predicted 168 units

And the output files just have the following text (in the case of baltimore.tsv):

virus_name class_name membership_ratio virus_hit_score confidence_score
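For anyone wanting to check all output files at once, here is a minimal Python sketch that lists the `.tsv` files containing nothing but a header line. It assumes the path given to `-o` is a directory holding files such as `baltimore.tsv`; the function name `header_only_tsvs` is made up for this illustration and is not part of VPF-tools.

```python
from pathlib import Path


def header_only_tsvs(output_dir):
    """Return the names of .tsv files that have at most one non-empty line.

    Assumption: the first non-empty line of each file is the column header,
    so a file with a single non-empty line carries no results.
    """
    header_only = []
    for path in sorted(Path(output_dir).glob("*.tsv")):
        with path.open(encoding="utf-8") as handle:
            rows = [line for line in handle if line.strip()]
        if len(rows) <= 1:
            header_only.append(path.name)
    return header_only
```

Calling `header_only_tsvs("outputs/test-classified")` from the directory where the command above was run should then return the files that only have headers.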

bielr commented 2 years ago

Hi,

Usually empty results mean no results, but your logs say otherwise. Could you re-run it with the --work-dir option and share the temporary files?

Davegb commented 2 years ago

Thanks! Here they go.

outputs.zip

bielr commented 2 years ago

Okay, so prodigal couldn't find any meaningful proteins.

The progress output here means that your input (28 sequences) was split into 28 different files, which multiplied by 6 taxonomic levels (baltimore, family, genus, host_domain, host_family, host_genus) implies 168 processing tasks. All of those are empty after the HMM search step (because Prodigal simply provided trash results).
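The arithmetic behind "predicted 168 units" can be sketched in a few lines of Python. This is only an illustration, not code from VPF-tools: it assumes one task per (chunk, taxonomic level) pair and that the chunk size is the number of sequences per split file, with a value of 1 inferred from the log above (28 sequences giving 28 files). The name `expected_units` is hypothetical.

```python
import math

# The six taxonomic levels listed in the comment above.
LEVELS = [
    "baltimore",
    "family",
    "genus",
    "host_domain",
    "host_family",
    "host_genus",
]


def expected_units(num_sequences, chunk_size=1):
    """Estimate the number of processing tasks.

    Assumption: the input is split into ceil(num_sequences / chunk_size)
    files, and each file is processed once per taxonomic level.
    """
    num_chunks = math.ceil(num_sequences / chunk_size)
    return num_chunks * len(LEVELS)


print(expected_units(28))      # 28 chunks * 6 levels = 168
print(expected_units(28, 28))  # 1 chunk * 6 levels = 6
```

Under the same assumptions, putting all 28 sequences into a single chunk would leave 6 tasks, one per level.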

I'd suggest trying --chunk-size 28, but even then I don't think you'll have any luck.

Davegb commented 2 years ago

I didn't get any results that way, either. Anyway, thanks! That makes it clearer.