bcgsc / biobloom

Create Bloom filters for a given reference and then use it to categorize sequences
http://www.bcgsc.ca/platform/bioinfo/software/biobloomtools
GNU General Public License v3.0

Biobloom categorizer dies in a mysterious way #86

Closed KristinaGagalova closed 11 months ago

KristinaGagalova commented 11 months ago

Hi

I am running biobloomcategorizer with multiple filters for contamination screening. This is the error I am getting:

Command error:
  Min score threshold: 0.15
  Starting to Load Filters.
  Loaded Filter: Univec
  Loaded Filter: taxid27482
  Loaded Filter: taxid4513
  Loaded Filter: plastid.1.rna
  error: `biobloom-filters/taxid4890.bf': Success

FYI - taxid4890 is the largest filter I have.

This is the command:

biobloomcategorizer -p reads -t 32 -e -i -f "Univec.bf taxid27482.bf taxid4513.bf plastid.1.rna.bftaxid4890.bf taxid30262.bf taxid6946.bf" read_R1.fastq.gz read_R2.fastq.gz

BBT specs

biobloomtools=2.3.5

I assigned quite a lot of memory on slurm

CPU Utilized: 00:00:23
CPU Efficiency: 0.38% of 01:40:16 core-walltime
Job Wall-clock time: 00:00:47
Memory Utilized: 97.75 GB
Memory Efficiency: 13.96% of 700.00 GB

Do you have any idea what that could be? Thank you in advance.

lcoombe commented 11 months ago

Hi Kristina,

Not sure if it's just how you copied your command, but I see that there isn't a space between what I assume are two different Bloom filters?

plastid.1.rna.bftaxid4890.bf

KristinaGagalova commented 11 months ago

Hi @lcoombe, that's not the issue; it's just a typo introduced while removing some metadata from the script. I am using a Nextflow command, which formats everything properly. I tried running the command with only the troublesome file, and it gave me the same error:

Command error:
  Min score threshold: 0.15
  Starting to Load Filters.
  error: `taxid4890.bf': Success

lcoombe commented 11 months ago

Thanks for the troubleshooting - so are you able to get a successful run with the other Bloom filters (even for a subset of the reads)?
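One way to do that check is to run the categorizer once per filter, so a failure points at a single `.bf` file. The following is a minimal Python sketch, not part of BioBloomTools: the helper name `per_filter_commands` and the per-filter output prefix are hypothetical, and the flags are simply copied from the command posted above.

```python
import os
import shlex


def per_filter_commands(filters, reads, threads=32, prefix="reads"):
    """Build one biobloomcategorizer command per filter.

    Hypothetical helper: the flags (-p, -t, -e, -i, -f) are taken verbatim
    from the command in this thread; nothing else is assumed about the CLI.
    """
    commands = []
    for bf in filters:
        # "dir/plastid.1.rna.bf" -> "plastid.1.rna"
        stem = os.path.splitext(os.path.basename(bf))[0]
        cmd = [
            "biobloomcategorizer",
            "-p", f"{prefix}_{stem}",  # separate output prefix per filter
            "-t", str(threads),
            "-e", "-i",
            "-f", bf,
            *reads,
        ]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    filters = ["Univec.bf", "taxid27482.bf", "taxid4513.bf",
               "plastid.1.rna.bf", "taxid4890.bf"]
    reads = ["read_R1.fastq.gz", "read_R2.fastq.gz"]
    for cmd in per_filter_commands(filters, reads):
        # Print the commands for review; run them by hand or with subprocess.run(cmd).
        print(shlex.join(cmd))
```

Pointing `reads` at a small subset of the input keeps each run short, since the goal is only to see whether the filter loads.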

If so, can you double-check the standard error/out from making that Bloom filter (feel free to share if it's helpful), and even try regenerating it? I haven't seen that issue before, but if the categorizer works for other BFs but not that particular one, it would make me suspicious that something went awry in the maker stage when building that BF.

If you haven't already, you could also check the text file that accompanies each BF (it would be taxid4890.txt in the same directory as taxid4890.bf) to check that the FPR, entries, etc. seem reasonable.
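As a quick first pass before reading the text files by hand, a small script can confirm that each filter and its accompanying text file exist and are non-empty. This is a minimal sketch in Python; `check_filter` is a hypothetical helper, and the only assumption about BioBloomTools is the naming convention mentioned above (`taxid4890.txt` next to `taxid4890.bf`). It only catches missing or empty files and cannot confirm that the filter contents are intact.

```python
import os
import sys


def check_filter(bf_path):
    """Return a list of problems found for one filter (empty list if none).

    Hypothetical helper: checks only that the .bf file and its sibling
    .txt file exist and are non-empty. It does not parse either file.
    """
    problems = []
    txt_path = os.path.splitext(bf_path)[0] + ".txt"
    for path in (bf_path, txt_path):
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
        elif os.path.getsize(path) == 0:
            problems.append(f"empty: {path}")
    return problems


if __name__ == "__main__":
    # Usage: python check_filters.py biobloom-filters/*.bf
    for bf in sys.argv[1:]:
        issues = check_filter(bf)
        size = os.path.getsize(bf) if os.path.isfile(bf) else 0
        status = "OK" if not issues else "; ".join(issues)
        # Printing the size makes an unexpectedly small filter easy to spot.
        print(f"{bf}\t{size} bytes\t{status}")
```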

KristinaGagalova commented 11 months ago

Hi @lcoombe, I reran biobloommaker and now it's running. Even though I had both the .bf and the .txt files, it looks like the filter was not generated properly for some reason, possibly because of file system issues. Thank you for the suggestions! I am closing the ticket.

lcoombe commented 11 months ago

Great, glad to hear it's working now!