OndrejSladky / kmercamel

KmerCamel🐫 provides implementations of several algorithms for efficiently representing a set of k-mers as a masked superstring.
MIT License

Add general support for parsing .gz files #44

Closed (karel-brinda closed this 1 week ago)

karel-brinda commented 10 months ago

It's OK, but how to use it should be documented. It can be solved with process substitution:

./kmercamel -k 13 -p <(gzcat node.fa.gz)
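A minimal sketch of that workaround, assuming bash (process substitution `<(...)` is not available in POSIX `sh`). The helper name `fasta_cat` is hypothetical and not part of KmerCamel. It uses `gzip -dc`, which does the same job as `gzcat` but is also available on Linux systems, where `zcat` is the more common name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KmerCamel): stream a FASTA file to stdout,
# decompressing it first when the file name ends in .gz.
# gzip -dc is available on both Linux and macOS; gzcat is mainly a macOS/BSD name.
fasta_cat() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac
}

# Usage (bash/zsh; assumes ./kmercamel is built in the current directory):
#   ./kmercamel -k 13 -p <(fasta_cat node.fa.gz)
```

Plain FASTA files pass through unchanged, so the same command line works for both compressed and uncompressed inputs.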
karel-brinda commented 10 months ago

Btw, if kseq is used for parsing FASTA files, IMHO it supports gzip, so I don't understand the error message:

$ ./kmercamel -k 13 -p node.fa.gz
Path 'node.fa.gz' contains no k-mers.
KmerCamel v0.2
Accepted arguments:
  -p path_to_fasta - required; valid path to fasta file
  -k k_value       - required; integer value for k
  -a algorithm     - the algorithm to be run [global (default), globalAC, local, localAC, streaming]
  -o output_path   - if not specified, the output is printed to stdout
  -d d_value       - integer value for d_max; default 5
  -c               - treat k-mer and its reverse complement as equal
  -h               - print help
Example usage:       ./kmers -p path_to_fasta -k 13 -d 5 -a global
Possible algorithms: global globalAC local localAC streaming
OndrejSladky commented 10 months ago

At this point, the global algorithm does not use kseq: I found it easier to implement it efficiently without kseq than to find a way of using kseq that doesn't consume more memory than my current approach.
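Because the global algorithm uses its own parser rather than kseq, a gzipped input presumably reaches it still compressed. gzip data starts with the magic bytes `1f 8b` (RFC 1952) rather than a FASTA `>` header, which would be consistent with the "contains no k-mers" message. A small sketch, assuming bash and a hypothetical `is_gzip` helper that is not part of KmerCamel, checks for those bytes before a file is passed to `-p`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KmerCamel): succeed if the file starts with
# the gzip magic number 0x1f 0x8b, fail otherwise (including for empty files).
is_gzip() {
  # od -An -tx1 -N2 prints the first two bytes as hex, e.g. " 1f 8b";
  # tr removes the spaces and newline so the result can be compared as "1f8b".
  [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# Usage:
#   is_gzip node.fa.gz && echo "compressed: decompress it before passing it to -p"
```

Checking the content rather than the `.gz` extension also catches compressed files that were renamed.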

OndrejSladky commented 10 months ago

Added to the README.