davidecenzato / optimalBWT

A tool for computing the optimalBWT of string collections
8 stars 1 forks source link

optimalBWT

optimalBWT is a tool that computes the optimal BWT using either a SAIS- or BCR-based algorithm.

Usage

usage: optimalBWT.py [-h] [-a ALGORITHM] [-f] [-q] [-v] [-b BUFFER] input output

optimal BWT is a tool that computes the optimal BWT of string collections.

positional arguments:
  input                 input file name
  output                output file name

options:
  -h, --help            show this help message and exit
  -a ALGORITHM, --algorithm ALGORITHM
                        select which algorithm to use ((sais | bcr) def. sais)
  -f, --fasta           take in input a fasta file (sais only, def. True)
  -q, --fastq           take in input a fastq file (sais only, def. False)
  -v, --verbose         verbose (def. False)
  -b BUFFER, --buffer BUFFER
                        set memory buffer size in MB (BCR only, def. 10)

When using the SAIS-based algorithm you can choose your input format between fasta and fastq format. When using the BCR-based algorithm you can choose the size of the input buffer for the algorithm permuting the characters in the SAP-intervals.

Requirements

The optimalBWT tool requires

Example

Download and Compile

git clone https://github.com/davidecenzato/optimalBWT.git
cd optimalBWT
git submodule update --init --recursive

make

Run on Example Data

// Construct the optimal BWT of fasta file using the SAIS-based algorithm
python3 optimalBWT.py input.fasta output --algorithm sais --fasta --verbose 

// Construct the optimal BWT of fasta file using the BCR-based algorithm
python3 optimalBWT.py input.fasta output --algorithm bcr --verbose -b 10

External resources

Authors

Theoretical results:

Experimental results:

Reference and citation

[1] Davide Cenzato, Veronica Guerrini, Zsuzsanna Lipták, Giovanna Rosone: Computing the optimal BWT of very large string collections. DCC 2023: 71-80 (go to the paper)

If you use optimalBWT in an academic setting, please cite this work as follows:

optimalBWT

@inproceedings{CenzatoGLR23,
  author       = {Davide Cenzato and
                  Veronica Guerrini and
                  Zsuzsanna Lipt{\'{a}}k and
                  Giovanna Rosone},
  title        = {Computing the optimal {BWT} of very large string collections},
  booktitle    = {In Proc. of the 33rd Data Compression Conference, {DCC} 2023, 2023},
  pages        = {71--80},
  year         = {2023},
  doi          = {10.1109/DCC55655.2023.00015}
}