HudsonAlpha / fmlrc2

Apache License 2.0

Invalid record while parsing long read file #11

Closed maepz closed 3 years ago

maepz commented 3 years ago

Hi, I have encountered this parsing error while running fmlrc2 on Nanopore libraries. Any ideas as to what might be causing it?

For improved speed, add 'usejni=t' to the command line of BBMap tools which support the use of the compiled jni C code.
[2021-05-05T18:35:57Z INFO fmlrc2_convert] Input parameters (required):
[2021-05-05T18:35:57Z INFO fmlrc2_convert] Input BWT: "stdin"
[2021-05-05T18:35:57Z INFO fmlrc2_convert] Output BWT: "comp_msbwt.npy"
[M::main_ropebwt2] inserted 10415295803 symbols in 545.050 sec, 1374.158 CPU sec
[M::main_ropebwt2] inserted 10415295775 symbols in 628.359 sec, 1677.699 CPU sec
[M::main_ropebwt2] inserted 10415295782 symbols in 727.891 sec, 1939.978 CPU sec
[M::main_ropebwt2] inserted 10415295802 symbols in 736.565 sec, 2174.521 CPU sec
[M::main_ropebwt2] inserted 10415295738 symbols in 767.744 sec, 2288.696 CPU sec
[M::main_ropebwt2] inserted 10415295753 symbols in 771.309 sec, 2371.800 CPU sec
[M::main_ropebwt2] inserted 10415295755 symbols in 823.817 sec, 2400.644 CPU sec
[M::main_ropebwt2] inserted 10415295759 symbols in 849.193 sec, 2514.707 CPU sec
[M::main_ropebwt2] inserted 4322128096 symbols in 440.669 sec, 1265.274 CPU sec
[M::main_ropebwt2] constructed FM-index in 11877.718 sec, 18296.507 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (583908828, 24996888600, 18626444702, 19128287424, 1463334, 24307501375)
[M::main] Version: r187
[M::main] CMD: ropebwt2 -LR
[M::main] Real time: 14595.897 sec; CPU: 19792.167 sec
[2021-05-05T22:39:13Z INFO fmlrc::bwt_converter] Converted BWT with symbol counts: [583908828, 24996888600, 18626444702, 19128287424, 1463334, 24307501375]
[2021-05-05T22:39:13Z INFO fmlrc::bwt_converter] RLE-BWT byte length: 8898824487
[2021-05-05T22:40:46Z INFO fmlrc2_convert] RLE-BWT conversion complete.
[2021-05-05T22:40:46Z INFO fmlrc2] Input parameters (required):
[2021-05-05T22:40:46Z INFO fmlrc2] BWT: "comp_msbwt.npy"
[2021-05-05T22:40:46Z INFO fmlrc2] Input reads: "unmapped_R07B-5-V_N2f.fq.gz"
[2021-05-05T22:40:46Z INFO fmlrc2] Output corrected reads: "tmp.nucl_R07B-5.fmlrc_out.fasta"
[2021-05-05T22:40:46Z INFO fmlrc2] Execution Parameters:
[2021-05-05T22:40:46Z INFO fmlrc2] verbose: false
[2021-05-05T22:40:46Z INFO fmlrc2] threads: 16
[2021-05-05T22:40:46Z INFO fmlrc2] cache size: 8
[2021-05-05T22:40:46Z INFO fmlrc2] Correction Parameters:
[2021-05-05T22:40:46Z INFO fmlrc2] reads to correct: [0, 18446744073709551615)
[2021-05-05T22:40:46Z INFO fmlrc2] k-mer sizes: [21, 59]
[2021-05-05T22:40:46Z INFO fmlrc2] abs. mininimum count: 5
[2021-05-05T22:40:46Z INFO fmlrc2] dyn. minimimum fraction: 0.1
[2021-05-05T22:40:46Z INFO fmlrc2] branching factor: 4
[2021-05-05T22:40:50Z INFO fmlrc::bv_bwt] Loading BWT with 8898824487 compressed values
[2021-05-05T22:41:03Z INFO fmlrc::bv_bwt] Loaded BWT with symbol counts: [583908828, 24996888600, 18626444702, 19128287424, 1463334, 24307501375]
[2021-05-05T22:41:03Z INFO fmlrc::bv_bwt] Allocating binary vectors...
[2021-05-05T22:41:43Z INFO fmlrc::bv_bwt] Calculating binary vectors...
[2021-05-05T22:46:52Z INFO fmlrc::bv_bwt] Constructing FM-indices...
[2021-05-05T22:47:26Z INFO fmlrc::bv_bwt] Building 8-mer cache...
[2021-05-05T22:47:26Z INFO fmlrc::bv_bwt] Finished BWT initialization.
[2021-05-05T22:47:26Z INFO fmlrc2] Starting read correction processes...
[2021-05-05T22:48:34Z INFO fmlrc2] Processed 10000 reads...
[2021-05-05T22:50:39Z INFO fmlrc2] Processed 20000 reads...
[2021-05-05T22:52:51Z INFO fmlrc2] Processed 30000 reads...
[2021-05-05T22:55:13Z INFO fmlrc2] Processed 40000 reads...
[2021-05-05T22:57:32Z INFO fmlrc2] Processed 50000 reads...
[2021-05-05T22:59:52Z INFO fmlrc2] Processed 60000 reads...
[2021-05-05T23:02:16Z INFO fmlrc2] Processed 70000 reads...
[2021-05-05T23:04:36Z INFO fmlrc2] Processed 80000 reads...
[2021-05-05T23:06:56Z INFO fmlrc2] Processed 90000 reads...
[2021-05-05T23:08:46Z INFO fmlrc2] Processed 100000 reads...
[2021-05-05T23:11:01Z INFO fmlrc2] Processed 110000 reads...
[2021-05-05T23:13:13Z INFO fmlrc2] Processed 120000 reads...
[2021-05-05T23:15:28Z INFO fmlrc2] Processed 130000 reads...
[2021-05-05T23:17:46Z INFO fmlrc2] Processed 140000 reads...
[2021-05-05T23:20:01Z INFO fmlrc2] Processed 150000 reads...
[2021-05-05T23:22:13Z INFO fmlrc2] Processed 160000 reads...
[2021-05-05T23:24:25Z INFO fmlrc2] Processed 170000 reads...
[2021-05-05T23:26:20Z INFO fmlrc2] Processed 180000 reads...
[2021-05-05T23:28:33Z INFO fmlrc2] Processed 190000 reads...
[2021-05-05T23:30:44Z INFO fmlrc2] Processed 200000 reads...
[2021-05-05T23:32:58Z INFO fmlrc2] Processed 210000 reads...
[2021-05-05T23:35:13Z INFO fmlrc2] Processed 220000 reads...
[2021-05-05T23:37:34Z INFO fmlrc2] Processed 230000 reads...
[2021-05-05T23:39:57Z INFO fmlrc2] Processed 240000 reads...
[2021-05-05T23:42:26Z INFO fmlrc2] Processed 250000 reads...
[2021-05-05T23:44:55Z INFO fmlrc2] Processed 260000 reads...
[2021-05-05T23:46:56Z INFO fmlrc2] Processed 270000 reads...
[2021-05-05T23:49:21Z INFO fmlrc2] Processed 280000 reads...
[2021-05-05T23:51:51Z INFO fmlrc2] Processed 290000 reads...
[2021-05-05T23:54:17Z INFO fmlrc2] Processed 300000 reads...
[2021-05-05T23:56:45Z INFO fmlrc2] Processed 310000 reads...
[2021-05-05T23:59:15Z INFO fmlrc2] Processed 320000 reads...
[2021-05-06T00:01:42Z INFO fmlrc2] Processed 330000 reads...
[2021-05-06T00:04:13Z INFO fmlrc2] Processed 340000 reads...
[2021-05-06T00:06:43Z INFO fmlrc2] Processed 350000 reads...
[2021-05-06T00:08:49Z INFO fmlrc2] Processed 360000 reads...
[2021-05-06T00:11:10Z INFO fmlrc2] Processed 370000 reads...
[2021-05-06T00:13:28Z INFO fmlrc2] Processed 380000 reads...
[2021-05-06T00:15:58Z INFO fmlrc2] Processed 390000 reads...
[2021-05-06T00:18:32Z INFO fmlrc2] Processed 400000 reads...
[2021-05-06T00:20:57Z INFO fmlrc2] Processed 410000 reads...
[2021-05-06T00:23:22Z INFO fmlrc2] Processed 420000 reads...
[2021-05-06T00:25:50Z INFO fmlrc2] Processed 430000 reads...
[2021-05-06T00:28:01Z INFO fmlrc2] Processed 440000 reads...
[2021-05-06T00:30:24Z INFO fmlrc2] Processed 450000 reads...
[2021-05-06T00:32:53Z INFO fmlrc2] Processed 460000 reads...
[2021-05-06T00:35:22Z INFO fmlrc2] Processed 470000 reads...
[2021-05-06T00:37:59Z INFO fmlrc2] Processed 480000 reads...
[2021-05-06T00:40:39Z INFO fmlrc2] Processed 490000 reads...
[2021-05-06T00:43:18Z INFO fmlrc2] Processed 500000 reads...
[2021-05-06T00:46:00Z INFO fmlrc2] Processed 510000 reads...
[2021-05-06T00:48:24Z INFO fmlrc2] Processed 520000 reads...
[2021-05-06T00:50:58Z INFO fmlrc2] Processed 530000 reads...
[2021-05-06T00:53:36Z INFO fmlrc2] Processed 540000 reads...
[2021-05-06T00:56:11Z INFO fmlrc2] Processed 550000 reads...
[2021-05-06T00:58:45Z INFO fmlrc2] Processed 560000 reads...
[2021-05-06T01:01:20Z INFO fmlrc2] Processed 570000 reads...
[2021-05-06T01:04:01Z INFO fmlrc2] Processed 580000 reads...
[2021-05-06T01:06:32Z INFO fmlrc2] Processed 590000 reads...
[2021-05-06T01:09:07Z INFO fmlrc2] Processed 600000 reads...
[2021-05-06T01:11:32Z INFO fmlrc2] Processed 610000 reads...
[2021-05-06T01:14:05Z INFO fmlrc2] Processed 620000 reads...
[2021-05-06T01:16:39Z INFO fmlrc2] Processed 630000 reads...
[2021-05-06T01:19:10Z INFO fmlrc2] Processed 640000 reads...
[2021-05-06T01:21:38Z INFO fmlrc2] Processed 650000 reads...
[2021-05-06T01:24:08Z INFO fmlrc2] Processed 660000 reads...
[2021-05-06T01:26:43Z INFO fmlrc2] Processed 670000 reads...
[2021-05-06T01:29:17Z INFO fmlrc2] Processed 680000 reads...
[2021-05-06T01:31:38Z INFO fmlrc2] Processed 690000 reads...
[2021-05-06T01:34:17Z INFO fmlrc2] Processed 700000 reads...
[2021-05-06T01:36:56Z INFO fmlrc2] Processed 710000 reads...
[2021-05-06T01:39:32Z INFO fmlrc2] Processed 720000 reads...
[2021-05-06T01:42:08Z INFO fmlrc2] Processed 730000 reads...
[2021-05-06T01:44:40Z INFO fmlrc2] Processed 740000 reads...
[2021-05-06T01:47:16Z INFO fmlrc2] Processed 750000 reads...
[2021-05-06T01:49:45Z INFO fmlrc2] Processed 760000 reads...
[2021-05-06T01:52:10Z INFO fmlrc2] Processed 770000 reads...
[2021-05-06T01:54:44Z INFO fmlrc2] Processed 780000 reads...
[2021-05-06T01:57:18Z INFO fmlrc2] Processed 790000 reads...
[2021-05-06T01:59:50Z INFO fmlrc2] Processed 800000 reads...
[2021-05-06T02:02:23Z INFO fmlrc2] Processed 810000 reads...
[2021-05-06T02:04:57Z INFO fmlrc2] Processed 820000 reads...
[2021-05-06T02:07:32Z INFO fmlrc2] Processed 830000 reads...
[2021-05-06T02:09:49Z INFO fmlrc2] Processed 840000 reads...
[2021-05-06T02:12:19Z INFO fmlrc2] Processed 850000 reads...
[2021-05-06T02:14:47Z INFO fmlrc2] Processed 860000 reads...
[2021-05-06T02:16:49Z INFO fmlrc2] Processed 870000 reads...
[2021-05-06T02:18:56Z INFO fmlrc2] Processed 880000 reads...
[2021-05-06T02:21:10Z INFO fmlrc2] Processed 890000 reads...
[2021-05-06T02:23:31Z INFO fmlrc2] Processed 900000 reads...
[2021-05-06T02:25:54Z INFO fmlrc2] Processed 910000 reads...
[2021-05-06T02:28:16Z INFO fmlrc2] Processed 920000 reads...
[2021-05-06T02:30:38Z INFO fmlrc2] Processed 930000 reads...
[2021-05-06T02:33:07Z INFO fmlrc2] Processed 940000 reads...
[2021-05-06T02:35:08Z INFO fmlrc2] Processed 950000 reads...
[2021-05-06T02:37:19Z INFO fmlrc2] Processed 960000 reads...
[2021-05-06T02:39:30Z INFO fmlrc2] Processed 970000 reads...
[2021-05-06T02:41:38Z INFO fmlrc2] Processed 980000 reads...
[2021-05-06T02:43:52Z INFO fmlrc2] Processed 990000 reads...
[2021-05-06T02:46:07Z INFO fmlrc2] Processed 1000000 reads...
[2021-05-06T02:48:21Z INFO fmlrc2] Processed 1010000 reads...
[2021-05-06T02:50:34Z INFO fmlrc2] Processed 1020000 reads...
[2021-05-06T02:52:53Z INFO fmlrc2] Processed 1030000 reads...
[2021-05-06T02:55:08Z INFO fmlrc2] Processed 1040000 reads...
[2021-05-06T02:57:02Z INFO fmlrc2] Processed 1050000 reads...
[2021-05-06T02:59:17Z INFO fmlrc2] Processed 1060000 reads...
[2021-05-06T03:01:32Z INFO fmlrc2] Processed 1070000 reads...
[2021-05-06T03:03:48Z INFO fmlrc2] Processed 1080000 reads...
[2021-05-06T03:06:05Z INFO fmlrc2] Processed 1090000 reads...
[2021-05-06T03:08:21Z INFO fmlrc2] Processed 1100000 reads...
[2021-05-06T03:10:38Z INFO fmlrc2] Processed 1110000 reads...
[2021-05-06T03:12:52Z INFO fmlrc2] Processed 1120000 reads...
[2021-05-06T03:15:12Z INFO fmlrc2] Processed 1130000 reads...
[2021-05-06T03:17:12Z INFO fmlrc2] Processed 1140000 reads...
[2021-05-06T03:19:12Z INFO fmlrc2] Processed 1150000 reads...
[2021-05-06T03:21:23Z INFO fmlrc2] Processed 1160000 reads...
[2021-05-06T03:23:36Z INFO fmlrc2] Processed 1170000 reads...
[2021-05-06T03:25:52Z INFO fmlrc2] Processed 1180000 reads...
[2021-05-06T03:28:06Z INFO fmlrc2] Processed 1190000 reads...
[2021-05-06T03:30:13Z INFO fmlrc2] Processed 1200000 reads...
[2021-05-06T03:32:24Z INFO fmlrc2] Processed 1210000 reads...
[2021-05-06T03:34:38Z INFO fmlrc2] Processed 1220000 reads...
[2021-05-06T03:36:54Z INFO fmlrc2] Processed 1230000 reads...
[2021-05-06T03:38:53Z INFO fmlrc2] Processed 1240000 reads...
[2021-05-06T03:40:56Z INFO fmlrc2] Processed 1250000 reads...
[2021-05-06T03:43:01Z INFO fmlrc2] Processed 1260000 reads...
[2021-05-06T03:45:03Z INFO fmlrc2] Processed 1270000 reads...
[2021-05-06T03:47:11Z INFO fmlrc2] Processed 1280000 reads...
[2021-05-06T03:49:15Z INFO fmlrc2] Processed 1290000 reads...
[2021-05-06T03:51:19Z INFO fmlrc2] Processed 1300000 reads...
[2021-05-06T03:53:20Z INFO fmlrc2] Processed 1310000 reads...
[2021-05-06T03:55:26Z INFO fmlrc2] Processed 1320000 reads...
[2021-05-06T03:57:28Z INFO fmlrc2] Processed 1330000 reads...
[2021-05-06T03:59:24Z INFO fmlrc2] Processed 1340000 reads...
[2021-05-06T04:01:32Z INFO fmlrc2] Processed 1350000 reads...
[2021-05-06T04:03:38Z INFO fmlrc2] Processed 1360000 reads...
[2021-05-06T04:05:47Z INFO fmlrc2] Processed 1370000 reads...
[2021-05-06T04:07:50Z INFO fmlrc2] Processed 1380000 reads...
[2021-05-06T04:09:51Z INFO fmlrc2] Processed 1390000 reads...
[2021-05-06T04:12:04Z INFO fmlrc2] Processed 1400000 reads...
[2021-05-06T04:14:09Z INFO fmlrc2] Processed 1410000 reads...
[2021-05-06T04:16:17Z INFO fmlrc2] Processed 1420000 reads...
[2021-05-06T04:18:21Z INFO fmlrc2] Processed 1430000 reads...
[2021-05-06T04:20:12Z INFO fmlrc2] Processed 1440000 reads...
[2021-05-06T04:22:10Z INFO fmlrc2] Processed 1450000 reads...
[2021-05-06T04:24:16Z INFO fmlrc2] Processed 1460000 reads...
[2021-05-06T04:26:15Z INFO fmlrc2] Processed 1470000 reads...
[2021-05-06T04:28:17Z INFO fmlrc2] Processed 1480000 reads...
[2021-05-06T04:30:14Z INFO fmlrc2] Processed 1490000 reads...
[2021-05-06T04:32:12Z INFO fmlrc2] Processed 1500000 reads...
[2021-05-06T04:34:08Z INFO fmlrc2] Processed 1510000 reads...
[2021-05-06T04:36:06Z INFO fmlrc2] Processed 1520000 reads...
[2021-05-06T04:38:04Z INFO fmlrc2] Processed 1530000 reads...
[2021-05-06T04:39:54Z INFO fmlrc2] Processed 1540000 reads...
[2021-05-06T04:40:18Z ERROR fmlrc2] Invalid record while parsing long read file: ParseError { msg: "corrupt deflate stream", kind: Io, position: ErrorPosition { line: 0, id: None }, format: None }
java -ea -Xmx800m -Xms800m -cp /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/bbmap/38.86/current/ jgi.ReformatReads -Xmx800m usejni=t ow=t qin=33 reads= in=unmapped_R07B-5-V_N2f.fq.gz out=tmp.R07B-5_raw.fq
Executing jgi.ReformatReads [-Xmx800m, usejni=t, ow=t, qin=33, reads=, in=unmapped_R07B-5-V_N2f.fq.gz, out=tmp.R07B-5_raw.fq]

crisG: Warning - created a read stream for 0 reads.
Exception in thread "main" java.lang.AssertionError
at stream.ConcurrentGenericReadInputStream.(ConcurrentGenericReadInputStream.java:136)
at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:123)
at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:69)
at jgi.ReformatReads.process(ReformatReads.java:480)
at jgi.ReformatReads.main(ReformatReads.java:52)

holtjma commented 3 years ago

Hello,

Previously, when I've seen the "corrupt deflate stream" error, it was caused by a malformed gzipped FASTQ input file. For example, if you take a gzipped FASTQ file and cut it at an arbitrary location, the result is an invalid gzip file.
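To illustrate the truncation case, here is a minimal Python sketch. It is an assumption-laden demonstration, not fmlrc2's own code: the helper names `make_fastq_gz` and `decompresses_cleanly` and the synthetic reads are made up for the example, and the exact error type and message depend on the decompression library in use.

```python
import gzip
import io
import zlib


def make_fastq_gz(n_records: int = 200) -> bytes:
    """Build a small gzipped FASTQ payload in memory from synthetic reads."""
    bases = "ACGT"
    records = []
    for i in range(n_records):
        # Vary the sequence per record so the payload is not one repeated line.
        seq = "".join(bases[(i + j * j) % 4] for j in range(50))
        records.append(f"@read{i}\n{seq}\n+\n{'I' * 50}\n")
    return gzip.compress("".join(records).encode("ascii"))


def decompresses_cleanly(payload: bytes) -> bool:
    """Return True if the whole gzip payload can be read to the end."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(payload)) as handle:
            while handle.read(1 << 16):
                pass
    except (EOFError, OSError, zlib.error):
        # Truncated or corrupted streams surface as one of these in Python.
        return False
    return True


if __name__ == "__main__":
    full = make_fastq_gz()
    cut = full[: len(full) // 2]  # simulate a file cut at an arbitrary location
    print("full file ok:", decompresses_cleanly(full))
    print("truncated file ok:", decompresses_cleanly(cut))
```

Note that a reader can consume many valid records from the start of such a file before it reaches the damaged part, so the failure may only appear late in a run.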

I'm not exactly sure what command you're using to generate your FASTQ reads, but they seem to be coming in via a stdin pipe, presumably from whatever program is generating this error at the end:

crisG: Warning - created a read stream for 0 reads.
Exception in thread "main" java.lang.AssertionError
at stream.ConcurrentGenericReadInputStream.(ConcurrentGenericReadInputStream.java:136)
at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:123)
at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:69)
at jgi.ReformatReads.process(ReformatReads.java:480)
at jgi.ReformatReads.main(ReformatReads.java:52)

I suspect that your read-generating program is throwing the error for some reason. I suggest running that component separately and saving its output to a local file. Then you can check the gzip integrity with gzip -t {filename} to see if there are any issues with the compression format.
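If gzip -t reports a problem, it can also help to know roughly how far into the file the stream stays readable. The following is a small Python sketch of that check, offered as an assumption rather than as part of fmlrc2 or BBMap: `check_fastq_gz` is a hypothetical helper that streams a gzipped file line by line and reports how many lines it read before any failure.

```python
import gzip
import zlib


def check_fastq_gz(path):
    """Stream through a gzipped file.

    Returns (intact, lines_read, error_text), where lines_read counts the
    lines decoded before the end of the file or the first decompression error.
    """
    lines_read = 0
    try:
        with gzip.open(path, "rb") as handle:
            for _ in handle:
                lines_read += 1
    except (EOFError, OSError, zlib.error) as exc:
        return False, lines_read, f"{type(exc).__name__}: {exc}"
    return True, lines_read, ""


if __name__ == "__main__":
    import sys

    intact, lines_read, error_text = check_fastq_gz(sys.argv[1])
    # A FASTQ record normally spans 4 lines, so lines_read // 4 approximates
    # the number of complete records that were readable.
    print(f"intact={intact} lines={lines_read} records~{lines_read // 4}")
    if error_text:
        print(error_text)
```

Run it as a script with the path to the gzipped FASTQ as the only argument. If the file is not intact, regenerating or re-transferring it is usually the next step.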

holtjma commented 3 years ago

Closing due to inactivity