HudsonAlpha / fmlrc2

No error if '-k=21,33,41,59,79' cannot get parsed #15

Closed mmokrejs closed 3 years ago

mmokrejs commented 3 years ago

Hi, luckily I realized, by comparing checksums, that the output file is exactly the same as my input. It appears that the -k values were ignored; the log mentions k-mer sizes 21 and 59 only, which I assume is spurious:

fmlrc2 -t 16 -C 10 -k=21,33,41,59,79 comp_msbwt.npy input.fastq.gz output.fasta
[2021-10-13T20:22:51Z INFO  fmlrc2] Input parameters (required):
[2021-10-13T20:22:51Z INFO  fmlrc2]     BWT: "comp_msbwt.npy"
[2021-10-13T20:22:51Z INFO  fmlrc2]     Input reads: "input.fastq.gz"
[2021-10-13T20:22:51Z INFO  fmlrc2]     Output corrected reads: "output.fasta"
[2021-10-13T20:22:51Z INFO  fmlrc2] Execution Parameters:
[2021-10-13T20:22:51Z INFO  fmlrc2]     verbose: false
[2021-10-13T20:22:51Z INFO  fmlrc2]     threads: 16
[2021-10-13T20:22:51Z INFO  fmlrc2]     cache size: 10
[2021-10-13T20:22:51Z INFO  fmlrc2] Correction Parameters:
[2021-10-13T20:22:51Z INFO  fmlrc2]     reads to correct: [0, 18446744073709551615)
[2021-10-13T20:22:51Z INFO  fmlrc2]     k-mer sizes: [21, 59]
[2021-10-13T20:22:51Z INFO  fmlrc2]     abs. mininimum count: 5
[2021-10-13T20:22:51Z INFO  fmlrc2]     dyn. minimimum fraction: 0.1
[2021-10-13T20:22:51Z INFO  fmlrc2]     branching factor: 4
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Loading BWT with 1 compressed values
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Loaded BWT with symbol counts: [1, 0, 0, 0, 0, 0]
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Allocating binary vectors...
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Calculating binary vectors...
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Constructing FM-indices...
[2021-10-13T20:22:51Z INFO  fmlrc::bv_bwt] Building 10-mer cache...
[2021-10-13T20:22:52Z INFO  fmlrc::bv_bwt] Finished BWT initialization.
[2021-10-13T20:22:52Z INFO  fmlrc2] Starting read correction processes...
[2021-10-13T20:23:04Z INFO  fmlrc2] Processed 10000 reads...

By reading https://github.com/HudsonAlpha/rust-fmlrc/issues/7#issuecomment-761107717 I see the syntax should be different. Please improve the parsing or, at the very least, improve the README.md and give an example of how to use multiple k-mer sizes.

Could fmlrc2 output a summary of how many corrections it made per dataset?
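A rough count of changed reads can be obtained outside the tool. The sketch below is not part of fmlrc2: `count_changed_reads` is a hypothetical helper, and it assumes the output FASTA keeps the reads in input order with one unwrapped sequence line per record.

```shell
# Hypothetical helper, not an fmlrc2 feature.
# Usage: count_changed_reads input.fastq.gz output.fasta
# Prints the number of reads whose sequence differs between the two files.
count_changed_reads() {
    fastq_gz=$1
    fasta=$2
    tmp_in=$(mktemp) || return 1
    tmp_out=$(mktemp) || return 1

    # FASTQ: the sequence is line 2 of every 4-line record.
    gzip -dc "$fastq_gz" | awk 'NR % 4 == 2' > "$tmp_in"

    # FASTA: drop header lines (assumes sequences are not line-wrapped).
    grep -v '^>' "$fasta" > "$tmp_out"

    # Pair the sequences line by line and count the pairs that differ.
    paste "$tmp_in" "$tmp_out" |
        awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'

    rm -f "$tmp_in" "$tmp_out"
}
```

A result of 0 would flag the situation described above, where the output is identical to the input.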

mmokrejs commented 3 years ago

Well, the primary error may be that the index creation failed:

+ grep -v '^>' L319_301_S9_L003.trimmomatic.tadpole.k62.shave.rinse.pairs.fasta
+ sort --parallel=16
+ ropebwt2 -LR
+ tr NT TN
+ tr NT TN
+ fmlrc2-convert comp_msbwt.npy
[2021-10-13T16:24:00Z INFO  fmlrc2_convert] Input parameters (required):
[2021-10-13T16:24:00Z INFO  fmlrc2_convert]     Input BWT: "stdin"
[2021-10-13T16:24:00Z INFO  fmlrc2_convert]     Output BWT: "comp_msbwt.npy"
sort: write failed: /tmp/sortJzvz0h: No space left on device
[M::main_ropebwt2] inserted 1 symbols in 0.002 sec, 0.001 CPU sec
[M::main_ropebwt2] constructed FM-index in 1940.004 sec, 0.001 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1, 0, 0, 0, 0, 0)
[M::main] Version: r187
[M::main] CMD: ropebwt2 -LR
[M::main] Real time: 1940.007 sec; CPU: 0.006 sec
[2021-10-13T16:56:20Z INFO  fmlrc::bwt_converter] Converted BWT with symbol counts: [1, 0, 0, 0, 0, 0]
[2021-10-13T16:56:20Z INFO  fmlrc::bwt_converter] RLE-BWT byte length: 1
[2021-10-13T16:56:20Z INFO  fmlrc2_convert] RLE-BWT conversion complete.
grep: write error: Broken pipe

This gave me a comp_msbwt.npy that is only 97 bytes long.

Still, I think a command line that cannot be parsed should raise an error.
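The trace above shows `sort` running out of temporary space while the rest of the pipeline carried on and wrote a near-empty BWT. By default a shell pipeline reports only the exit status of its last command, so such a failure goes unnoticed. The bash sketch below illustrates the difference `set -o pipefail` makes and adds a size check on the result. `failing_stage` and `file_at_least` are hypothetical names, and any size threshold is an assumption to be tuned per dataset.

```shell
#!/usr/bin/env bash
# Stand-in for an upstream stage that dies part-way through, like `sort`
# running out of temporary space. Hypothetical; not part of fmlrc2.
failing_stage() { echo "partial output"; return 1; }

# Exit status of the pipeline with pipefail off (the shell default):
# only the last command (`cat`) counts, so the failure is hidden.
status_default() {
    ( set +o pipefail; rc=0; failing_stage | cat > /dev/null || rc=$?; echo "$rc" )
}

# Same pipeline with pipefail on: the upstream failure is reported.
status_pipefail() {
    ( set -o pipefail; rc=0; failing_stage | cat > /dev/null || rc=$?; echo "$rc" )
}

# Succeed only if file $1 exists and is at least $2 bytes long.
file_at_least() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1")
    [ "$((size))" -ge "$2" ]
}
```

With `set -o pipefail` at the top of the index-building script, followed by something like `file_at_least comp_msbwt.npy 1024 || exit 1`, a 97-byte index would stop the run before correction starts. GNU `sort` also accepts `-T DIR` to place its temporary files on a larger volume.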

mmokrejs commented 3 years ago

fmlrc2 -t 16 -C 10 -k 21 33 41 59 79 comp_msbwt.npy input.fastq.gz output.fasta
...
error: The following required arguments were not provided:
    <COMP_MSBWT.NPY>
    <LONG_READS.FA>
    <CORRECTED_READS.FA>

holtjma commented 3 years ago

Yes, as stated in the help menu, the default method would be:

-k, --K <kmer_sizes>...                k-mer sizes for correction, can be specified multiple times (default: "-k 21 59")

rjsorr commented 2 years ago

Please provide an example other than the default. For example, -k 21 59 79 127 does not work, even though, following the instructions, it should.
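The help text quoted above says -k "can be specified multiple times", which may mean that repeating the flag once per value is accepted. That reading is an assumption and has not been verified against fmlrc2. The sketch below is a hypothetical wrapper, `kmer_flags`, that expands a comma-separated list into repeated `-k` flags and, unlike the behaviour reported in this issue, fails loudly when a value cannot be parsed.

```shell
# Hypothetical helper, not part of fmlrc2.
# Turns "21,33,41" into "-k 21 -k 33 -k 41".
# Returns non-zero if the list is empty or any item is not a string of digits.
kmer_flags() {
    flags=""
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas into positional parameters
    IFS=$old_ifs

    if [ "$#" -eq 0 ]; then
        echo "kmer_flags: no k-mer sizes given" >&2
        return 1
    fi

    for k in "$@"; do
        case $k in
            ''|*[!0-9]*)
                echo "kmer_flags: invalid k-mer size '$k'" >&2
                return 1
                ;;
        esac
        flags="$flags -k $k"
    done

    echo "${flags# }"    # drop the leading space
}
```

If repeated flags are indeed accepted, an invocation might look like `fmlrc2 -t 16 -C 10 $(kmer_flags 21,33,41,59,79) comp_msbwt.npy input.fastq.gz output.fasta`; checking that the logged "k-mer sizes" line lists all five values would confirm it.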