DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Masking low-complexity regions of downloaded library... #777

Closed Duyh814 closed 6 months ago

Duyh814 commented 7 months ago

Hi. When I use kraken2-build, I choose --no-masking. I wonder whether there is another way to do the masking later, because downloading and masking at the same time is too slow. Is it possible to select --no-masking when running kraken2-build and then run the masking separately with other commands afterwards? Would that achieve the same effect as downloading and masking in one step?

tdfy commented 7 months ago

Curious about this myself; my build timed out. I would prefer not to repeat the file transfer and to start with masking instead.

jenniferlu717 commented 6 months ago

You can run the mask_low_complexity.sh script under the scripts/ folder in the repository

tdfy commented 6 months ago

Thank you @jenniferlu717. Can you comment on the script's usage? I cd'd to the db directory (--db in kraken2-build) and invoked mask_low_complexity.sh without success.

/mask_low_complexity.sh: line 14: $1: unbound variable

jenniferlu717 commented 6 months ago

Did you specify the files to run the script on? The script should take the filename as the first argument
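The "$1: unbound variable" message is what bash prints when a script running under `set -u` reads its first positional parameter and none was given. The sketch below reproduces that behaviour with a small stand-in script; it is not the real mask_low_complexity.sh, and the stand-in's contents are an assumption made only to illustrate the error.

```shell
#!/usr/bin/env bash
# Stand-in script (hypothetical, NOT the real mask_low_complexity.sh):
# it only mimics reading "$1" under `set -u`, as the error message implies.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/usr/bin/env bash
set -u            # abort on any reference to an unset variable
target="$1"       # fails here when no argument is given
echo "would mask: $target"
EOF

# No argument: bash reports "$1: unbound variable" and exits non-zero.
no_arg_status=0
no_arg_output=$(bash "$demo" 2>&1) || no_arg_status=$?

# With a filename as the first argument the assignment succeeds.
with_arg_output=$(bash "$demo" library.fna)

echo "exit status without argument: $no_arg_status"
echo "message without argument:     $no_arg_output"
echo "output with argument:         $with_arg_output"
rm -f "$demo"
```

Passing the FASTA file as the first argument, as suggested above, avoids this particular failure.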

tdfy commented 6 months ago

Now passing the .fna as the first argument.

cd /Kraken_db/microbe_db/library/bacteria

~/kraken/kraken2/mask_low_complexity.sh library.fna

I receive this error:

~/kraken/kraken2/mask_low_complexity.sh: line 17: KRAKEN2_PROTEIN_DB: unbound variable
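This second error has the same cause as the first: under `set -u`, bash aborts as soon as the script reads an environment variable that was never set. One possible workaround, sketched below, is to define KRAKEN2_PROTEIN_DB for that single invocation. This rests on two assumptions not confirmed in this thread: that kraken2-build normally sets the variable before calling the script, and that an empty value selects nucleotide masking. The `mask_library` helper is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kraken2.
# Assumption: the script only needs KRAKEN2_PROTEIN_DB to be defined, and
# an empty value means "nucleotide library". Not confirmed in this thread.
mask_library() {
    script=$1   # path to mask_low_complexity.sh
    fasta=$2    # FASTA file to mask, e.g. library.fna
    # Define the variable for this one command only. ${VAR-} keeps a value
    # that is already set and falls back to the empty string otherwise.
    KRAKEN2_PROTEIN_DB="${KRAKEN2_PROTEIN_DB-}" bash "$script" "$fasta"
}

# Example call, using the paths from the comment above:
# mask_library ~/kraken/kraken2/mask_low_complexity.sh library.fna
```

The script may read further variables that kraken2-build would normally provide; if another "unbound variable" error appears, the same pattern could be applied to that name.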

tdfy commented 5 months ago

> Did you specify the files to run the script on? The script should take the filename as the first argument

Hey Jennifer, wondering if you had a chance to review? Thanks.