bcgsc / biobloom

Create Bloom filters for a given reference and then use it to categorize sequences
http://www.bcgsc.ca/platform/bioinfo/software/biobloomtools
GNU General Public License v3.0

Need Filter File #87

Closed drhoads closed 8 months ago

drhoads commented 8 months ago

Need Filter File (-f)

Working in WSL2 Ubuntu, trying to filter paired-end data from a bacterial genome that contains a contaminant. I have two reference genomes, 1638 and 1715, with the contaminated sample being 1637. I used biobloommaker to create two .bf files:

biobloommaker -p 1638filter -o biobloom ../StrainNameFas/1638.fna
biobloommaker -p 1715filter -o biobloom ../StrainNameFas/1715.fna

Then when I run the categorizer I get an error:

biobloomcategorizer -e -p biobloom/1637 –f "biobloom/1638filter.bf biobloom/1715filter.bf" 1637_S141_R1_001_ptrim.fq 1637_S141_R2_001_ptrim.fq

Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [SMARTPAIR]

Error: Need Filter File (-f)
Try '--help' for more information.

Can't figure out why it is not reading the filter files. I have even tried running from inside the biobloom folder:

drhoads@ARSC-A-G4LJXP3:/mnt/f/DNAwork/Scohnii/genomes/DoNotUse/biobloom$ biobloomcategorizer -e -p 1637 –f "16 38filter.bf 1715filter.bf" ../1637_S141_R1_001_ptrim.fq ../1637_S141_R2_001_ptrim.fq
Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [SMARTPAIR]

Error: Need Filter File (-f)
Try '--help' for more information.
(MultiQC) drhoads@ARSC-A-G4LJXP3:/mnt/f/DNAwork/Scohnii/genomes/DoNotUse/biobloom$

The two files are definitely in the biobloom folder along with their .txt files

lcoombe commented 8 months ago

Hi @drhoads,

What version of BBT are you running?

drhoads commented 8 months ago

Just installed using conda this AM:

(MultiQC) @.***:/mnt/f/DNAwork/Scohnii/genomes/DoNotUse/biobloom$ biobloomcategorizer --version
biobloomcategorizer (BIOBLOOMTOOLS) 2.3.5-1-gfa70-dirty
Written by Justin Chu.

Copyright 2013 Canada's Michael Smith Genome Science Centre

lcoombe commented 8 months ago

Ok great!

Could you do one small test for me? Could you check whether BBT recognizes your files if you specify just one Bloom filter (i.e., so you don't have to use quotes)?

Ex:

biobloomcategorizer -e -p 1637 –f 1638filter.bf ../1637_S141_R1_001_ptrim.fq ../1637_S141_R2_001_ptrim.fq

drhoads commented 8 months ago

Sorry was away for 5 hours, and just tried it with just one filter and no quotes. Same issue.

(MultiQC) @.***:/mnt/f/DNAwork/Scohnii/genomes/DoNotUse/biobloom$ biobloomcategorizer -e -p 1637 -f 1638filter.bf ../1637_S141_R1_001_ptrim.fq ../1637_S141_R2_001_ptrim.fq
Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [SMARTPAIR]

Error: Need Filter File (-f)
Try '--help' for more information.

FYI the bf file is 3.16 Mb

lcoombe commented 8 months ago

Hi @drhoads,

Thanks for the update! That is very strange indeed.

So, this section of code seems to be triggered, despite you specifying a valid path to a BF: https://github.com/bcgsc/biobloom/blob/master/BioBloomCategorizer/BioBloomCategorizer.cpp#L382-L385

Would you mind sharing your full standard error and standard out for the biobloommaker commands, as well as the contents of the *txt files that were generated along with the Bloom filters? I just want to double check that those steps completed as expected.

drhoads commented 8 months ago

(MultiQC) drhoads@ARSC-A-G4LJXP3:/mnt/f/DNAwork/Scohnii/genomes/DoNotUse$ biobloommaker -p 1638filter -o biobloom ../StrainNameFas/1638.fna
Opening File ../StrainNameFas/1638.fna
Allocating 26055104 bits of space for filter and will output filter this size (plus header)
Approximated (due to false positives) total unique k-mers in reference files 2567799
Writing a 3256888 byte filter to biobloom/1638filter.bf on disk.
Filter Creation Complete.

1638filter.zip

lcoombe commented 8 months ago

Thanks for all that info - I can confirm that on my end, if I use that exact filter, it loads the Bloom filter fine:

(btl) [lcoombe@hpce705 tmp]$ biobloomcategorizer -f 1638filter.bf ../DRR021766_1.fastq.gz 
Min score threshold: 0.15
Starting to Load Filters.
Loaded Filter: 1638filter
Filter Loading Complete.

So, I'm wondering if it is something related to WSL2Ubuntu that BBT is having an issue with..

@jwcodee / @JustinChu / @parham-k - Do you have any ideas as to why the constructed BFs are not being recognized properly?

drhoads commented 8 months ago

Could it be that I am installing in a conda env for MultiQC? I came across BBT in the listing of tools for use in MultiQC, and this is my first foray with MultiQC. In the AM I will try installing BBT in its own env and try again. Will report back what I find out.

drhoads commented 8 months ago

Tried installing biobloomtools in its own conda env and got the same error:

(BioBloomTool) drhoads@ARSC-A-G4LJXP3:/mnt/f/DNAwork/Scohnii/genomes/DoNotUse/biobloom$ biobloomcategorizer -e -p 1637 –f 1638filter.bf ../1637_S141_R1_001_ptrim.fq ../1637_S141_R2_001_ptrim.fq
Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [SMARTPAIR]

Error: Need Filter File (-f)
Try '--help' for more information.

drhoads commented 8 months ago

I was able to install and run all the commands on our HPC system, so whatever it is must relate either to WSL2 or to Ubuntu. I have another machine in my lab with WSL2-Ubuntu, and a LinuxMint machine. If I get a chance on Thursday I will see what happens on them. It sure is strange, and I keep looking for a mistyped word or something. Usually it is some small insidious thing. For me it is usually a / vs \, since I move between Windows and Linux, or a single vs double hyphen.

lcoombe commented 8 months ago

Oh great, thank you for that update! My guess is that it's the WSL2 environment. We have seen before with other tools that we can get rather cryptic errors there without straightforward solutions. We generally work on CentOS machines, but have run BBT on Ubuntu before without an issue. If possible, working off the HPC system will probably be your best bet! The WSL2 things are rather hard for us to troubleshoot, since we don't have access to that environment!

JustinChu commented 8 months ago

Hey, this is a bit silly, but the original command:

biobloomcategorizer -e -p biobloom/1637 –f "biobloom/1638filter.bf biobloom/1715filter.bf" 1637_S141_R1_001_ptrim.fq 1637_S141_R2_001_ptrim.fq

is using –f, specifically with the character – (an en dash, which is 150 in Windows-1252 "extended ASCII") rather than - (the plain hyphen, which is 45 in ASCII). I'm not sure how this happened for you, but I know for a fact Ubuntu doesn't autoconvert – to -.

Can you double check this is working:

biobloomcategorizer -e -p biobloom/1637 -f "biobloom/1638filter.bf biobloom/1715filter.bf" 1637_S141_R1_001_ptrim.fq 1637_S141_R2_001_ptrim.fq
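For anyone hitting the same error, a quick way to check a pasted command for look-alike characters is to list every non-ASCII character in it. The following is a minimal Python sketch, not part of BBT; the read file names are placeholders, and `find_non_ascii` is a hypothetical helper name.

```python
import unicodedata


def find_non_ascii(command: str):
    """Return (index, char, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]


# Placeholder commands: one pasted with an en dash before "f", one typed by hand.
pasted = "biobloomcategorizer -e -p 1637 \u2013f 1638filter.bf reads_1.fq reads_2.fq"
typed = "biobloomcategorizer -e -p 1637 -f 1638filter.bf reads_1.fq reads_2.fq"

for hit in find_non_ascii(pasted):
    print(hit)                      # reports the en dash and where it sits
print(find_non_ascii(typed))        # empty list: nothing suspicious

# The "150 vs 45" observation: the en dash is byte 150 in Windows-1252,
# while the ordinary hyphen is code 45 in ASCII.
print("\u2013".encode("cp1252")[0], ord("-"))
```

Pasting the suspect command into `pasted` and running the script shows whether any dash in it is something other than the plain ASCII hyphen.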

lcoombe commented 8 months ago

Huh good catch - thanks for noticing that @JustinChu!

drhoads commented 8 months ago

Yep, just got back from the lab, where I installed on my other WSL2Ubuntu machine and it ran fine, but I had to type in the command. On my home machine I had been copy-pasting from a journal I keep of all my work. I deleted the -f and typed it in, and it ran just fine. If you are wondering where I got the chr(150), it was from your GitHub instructions: I used the copy command and then modified it to suit my needs. To confirm this I went back to your GitHub page (https://github.com/bcgsc/biobloom), section 3, where it says:

There are some advanced options open can use outlined in section 5. Notable option one can use is the paired end mode -e:

./biobloomcategorizer -e –p /output/prefix –f "filter1.bf filter2.bf filter3.bf" inputReads1_1.fq inputreads1_2.fq

-e will require that both reads match when making the call about what reference they belong in.

Then I copied the "-f" from that command and put it into my command that just worked, and it reverted to throwing the error. So, you might want to check that website and that impostor hyphen. On a delightful note the filtering cleaned out the contaminant in my NGS data and I got a great assembly. Thanks for a great tool that will be part of my arsenal from here on out.
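As a rough illustration of why the tool reports a missing filter even though the file exists: a getopt-style parser only treats arguments that start with the ASCII hyphen as options. The sketch below uses Python's standard `getopt` module as a stand-in; that BioBloomCategorizer parses its arguments in a comparable way is an assumption, and the option string `"ep:f:"` and file names are illustrative only.

```python
import getopt


def parse(argv):
    """Parse POSIX-style: -e is a flag, -p and -f each take a value."""
    opts, rest = getopt.getopt(argv, "ep:f:")
    return dict(opts), rest


# Hand-typed command line: -f starts with the ASCII hyphen (code 45).
typed = ["-e", "-p", "1637", "-f", "1638filter.bf", "r1.fq", "r2.fq"]
# Pasted command line: "f" is preceded by an en dash (U+2013) instead.
pasted = ["-e", "-p", "1637", "\u2013f", "1638filter.bf", "r1.fq", "r2.fq"]

typed_opts, typed_rest = parse(typed)
pasted_opts, pasted_rest = parse(pasted)

print("-f" in typed_opts, typed_rest)    # filter option seen, two read files left over
print("-f" in pasted_opts, pasted_rest)  # no filter option; the en-dash token is just another positional argument
```

In the pasted case the parser never sees a filter option at all, so a "need filter file" check would fire regardless of whether the .bf file is present.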

JustinChu commented 8 months ago

Oh yeah, that's a good point. I noticed we have some of those issues in the readme. I'm honestly not sure how that happened. I'll replace them all. Thanks.
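One possible way to do that kind of cleanup in bulk is a small script that rewrites dash look-alikes only where they appear to introduce a command-line option, leaving dashes in ordinary prose alone. This is a hedged sketch, not the fix actually applied to the readme; `fix_option_dashes` and the regular expression are assumptions chosen for illustration.

```python
import re

# Dash look-alikes: U+2010..U+2015 (hyphen, figure dash, en dash, em dash, ...)
# plus U+2212 (minus sign). Match one only when it follows whitespace or the
# start of the text and is immediately followed by a letter, i.e. where it
# looks like the start of an option such as "-f".
LOOKALIKE_OPTION_DASH = re.compile(r"(?<!\S)[\u2010-\u2015\u2212](?=[A-Za-z])")


def fix_option_dashes(text: str) -> str:
    """Replace option-introducing dash look-alikes with the ASCII hyphen."""
    return LOOKALIKE_OPTION_DASH.sub("-", text)


readme_line = './biobloomcategorizer -e \u2013p /output/prefix \u2013f "filter1.bf filter2.bf"'
prose_line = "reads 3\u20135 and a pause \u2013 like this"

print(fix_option_dashes(readme_line))  # options now use the plain hyphen
print(fix_option_dashes(prose_line))   # prose dashes are left untouched
```

The pattern is deliberately conservative, so the result is still worth reviewing by eye: a dash directly before a word in prose would also be rewritten.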