WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

Invalid pickle file generated: "ValueError: binary data truncated (1)" #50

Closed EmilStenstrom closed 5 years ago

EmilStenstrom commented 7 years ago

I've managed to create an automaton, and then pickle that automaton to a 286 MB pickle file. The problem is, when I try to unpickle it, I get this error:

$ python -m pickle wikidata-automation.pickle 
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pickle.py", line 1605, in <module>
    obj = load(f)
ValueError: binary data truncated (1)

The source of that error is here: https://github.com/WojciechMula/pyahocorasick/blob/master/Automaton_pickle.c#L309

Would you mind helping me troubleshoot this? Any ideas? I don't think I can send files this big to you?

Update: This is how I build the pickle file:

import pickle
import ahocorasick

automaton = ahocorasick.Automaton()
for i, (label, id_) in enumerate(generator):
    automaton.add_word(label, id_)

automaton.make_automaton()

with open(filename_out, "wb") as f:
    pickle.dump(automaton, f)

Where generator just yields tuples like ("Belgium", "Q31").
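For reference, this is the full round trip the script relies on, in minimal form (a sketch; the Sweden/Q34 pair is just added for illustration):

import pickle
import ahocorasick

# Build a tiny automaton from hand-written (label, id) pairs, pickle it in
# memory, and load it back -- the same steps the script does at wikidata scale.
automaton = ahocorasick.Automaton()
for label, id_ in [("Belgium", "Q31"), ("Sweden", "Q34")]:
    automaton.add_word(label, id_)
automaton.make_automaton()

restored = pickle.loads(pickle.dumps(automaton))
print(list(restored.iter("Belgium and Sweden")))  # expected: [(6, 'Q31'), (17, 'Q34')]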

EmilStenstrom commented 7 years ago

I understand that this is something that is tricky to reproduce. Therefore I've created a new repository with my code, and invited you to that repository. I've added documentation on how to run the code there.

Beware that running through the full wikidata dump with 24 million entries takes several hours. After all the building is done you can quickly run the example and see it fail.

Somehow the check UNLIKELY(size < count*(sizeof(TrieNode) - sizeof(TrieNode*))) returns true, preventing the pickle file from being read.
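In Python terms, that check is roughly the following sanity test on the pickled state (a paraphrase of the C code, not the actual implementation; the struct sizes are the ones a 64-bit build reports later in this thread):

# Rough Python paraphrase of the check behind "binary data truncated (1)".
# `blob` is the raw bytes object stored in the pickle and `count` is the node
# count stored next to it; the blob must be big enough to hold `count` nodes.
SIZEOF_TRIENODE = 32      # per the debug output later in the thread (64-bit build)
SIZEOF_TRIENODE_PTR = 8

def looks_truncated(blob, count):
    return len(blob) < count * (SIZEOF_TRIENODE - SIZEOF_TRIENODE_PTR)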

Let me know if there is anything I can do to help troubleshoot this.

WojciechMula commented 7 years ago

@EmilStenstrom Thanks a lot for your effort! I'll try to reproduce the bug.

WojciechMula commented 7 years ago

@EmilStenstrom I was able to build wikidata-reduced.json; it was really time-consuming. :) Now I can debug, thank you.

EmilStenstrom commented 7 years ago

@WojciechMula Phew. Now you have some more waiting to do as you build the automaton, and then try to search it. Building the automaton works. The crash occurs when you try to load it using the last command.

Sorry about the long waits :)

WojciechMula commented 7 years ago

@EmilStenstrom I'm working on this issue now and managed to fix an ugly memory leak. It's not a fix for this bug yet. :)

EmilStenstrom commented 7 years ago

@WojciechMula That sounds fantastic! :) I'm happy all that processing power didn't go to waste.

WojciechMula commented 7 years ago

@EmilStenstrom I'm still trying to reproduce the bug. Unfortunately, my laptop has too little memory and your app gets killed after eating all 4 GB. I tried to split the input and then build/pickle/unpickle smaller chunks, but nothing has gone wrong so far. I suspected there were some unicode-related problems (like #53), but it seems that's not the case. Just writing to give you feedback.

EmilStenstrom commented 7 years ago

@WojciechMula I'm thinking of different ways of helping out. Would it help if I sent you the pickle file? It is 88 MB if I zip it, so I think I can give you a Dropbox link? What e-mail should I send the link to?

EmilStenstrom commented 7 years ago

I had an idea that maybe the whole file was truncated. So I found that you can inspect a pickle file with pickletools from the standard library. But it seems it ends in the expected way:

$ python -m pickletools wikidata_automation.pickle | tail
268149006: r            LONG_BINPUT 13858041
268149011: X            BINUNICODE 'Q27876039'
268149025: r            LONG_BINPUT 13858042
268149030: e            APPENDS    (MARK at 268141046)
268149031: t        TUPLE      (MARK at 27)
268149032: r    LONG_BINPUT 13858043
268149037: R    REDUCE
268149038: r    LONG_BINPUT 13858044
268149043: .    STOP
highest protocol among opcodes = 3
WojciechMula commented 7 years ago

@EmilStenstrom If you can, please send me the pickle file directly. My e-mail: wojciech_mula@poczta.onet.pl

EmilStenstrom commented 7 years ago

Sent the link to your e-mail!

I also pushed some updates to the script that creates the wikidata-reduced.json file (I sent you the old file, not the updated one, to make sure you can reproduce). The updated file now excludes lots of entities I'm not interested in anyway. It should be about half the size. Maybe that makes it possible to create the automaton on 4 GB? I'm on a MacBook Pro from work with 16 GB RAM, so I can deal with huge files.

WojciechMula commented 7 years ago

@EmilStenstrom It just clicked what's wrong: if your automaton takes several gigabytes, it's almost impossible that the pickle file would be several times smaller.

EmilStenstrom commented 7 years ago

@WojciechMula: So something is wrong with how I create the pickle file?

WojciechMula commented 7 years ago

@EmilStenstrom You're doing everything perfectly right; there's some bug in the pickling code. I just created an automaton with 1,000,000 words and the pickled file is 350 MB.

WojciechMula commented 7 years ago

@EmilStenstrom You've shown the tail of the pickled file, but could you please show the beginning of the file? On my system I have:

$ python3 -m pickletools ref.pickle
    0: \x80 PROTO      3
    2: c    GLOBAL     'ahocorasick Automaton'
   25: q    BINPUT     0
   27: (    MARK
   28: J        BININT     168760016
   33: C        SHORT_BINBYTES b''
   35: q        BINPUT     1
   37: K        BININT1    2
   39: K        BININT1    2
   41: J        BININT     16182875
   46: J        BININT     16182874
   51: M        BININT2    310
   54: ]        EMPTY_LIST
   55: q        BINPUT     2
   57: (        MARK

The file is definitely corrupted. At offset 33 there is an empty bytes object where there should be a large blob of data, and the field at offset 37 should be 20. For now I have no idea what's wrong; of course I will continue working on this.

Is everything OK when you build smaller automatons? Did you compile Python yourself, or does it come from a precompiled package?
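One way to answer the first question without waiting hours per run would be a probe along these lines (a sketch, not code from the repo): grow the word list and report the first count at which an in-memory pickle round trip fails.

import pickle
import ahocorasick

def first_failing_count(pairs, start=100000, factor=2):
    # `pairs` is assumed to be a list of (label, id) tuples from the reduced dump.
    count = start
    while count <= len(pairs):
        automaton = ahocorasick.Automaton()
        for label, id_ in pairs[:count]:
            automaton.add_word(label, id_)
        automaton.make_automaton()
        try:
            pickle.loads(pickle.dumps(automaton))
        except ValueError as error:
            return count, error   # first size at which the round trip breaks
        count *= factor
    return None, None             # no failure within the given data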

EmilStenstrom commented 7 years ago

Here are the first 20 lines of my file:

$ python -m pickletools wikidata_automation.pickle | head -n20
    0: \x80 PROTO      3
    2: c    GLOBAL     'ahocorasick Automaton'
   25: q    BINPUT     0
   27: (    MARK
   28: J        BININT     168760016
   33: C        SHORT_BINBYTES b''
   35: q        BINPUT     1
   37: K        BININT1    2
   39: K        BININT1    2
   41: J        BININT     16182875
   46: J        BININT     16182874
   51: M        BININT2    310
   54: ]        EMPTY_LIST
   55: q        BINPUT     2
   57: (        MARK
   58: X            BINUNICODE 'Q23600353'
   72: q            BINPUT     3
   74: X            BINUNICODE 'Q14877373'
   88: q            BINPUT     4
   90: X            BINUNICODE 'Q26446664'

Looks very similar to yours.

I'm using the latest stable version of Python (3.5.2) as distributed with Homebrew (the most popular package manager for macOS).

$ brew info python3
python3: stable 3.5.2 (bottled), devel 3.6.0rc1, HEAD
Interpreted, interactive, object-oriented programming language
https://www.python.org/
/usr/local/Cellar/python3/3.5.2 (3,664 files, 55.0M) *
  Poured from bottle on 2016-07-07 at 20:30:19
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/python3.rb
$ python3.5 
Python 3.5.2 (default, Jun 29 2016, 13:43:58) 
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
EmilStenstrom commented 7 years ago

I've now tried with a couple of different files. First, a new one generated with the updated script, which removes all empty labels:

$ python run_wikidata_search.py wikidata_automation_noempty.pickle "Belgium, Sweden and Poland are three fine countries"
Traceback (most recent call last):
  File "run_wikidata_search.py", line 17, in <module>
    main(filename_in, text)
  File "run_wikidata_search.py", line 11, in main
    automation = pickle.load(f)
ValueError: binary data truncated (3)

Same error, but with a (3) at the end instead of a (1) as before. When I try to run pickletools on this file I get:

$ python -m pickletools wikidata_automation_noempty.pickle
    0: \x80 PROTO      3
    2: c    GLOBAL     'ahocorasick Automaton'
   25: q    BINPUT     0
   27: (    MARK
   28: J        BININT     144852559
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pickletools.py", line 2833, in <module>
    args.indentlevel, annotate)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pickletools.py", line 2475, in dis
    print(line, file=out)
OSError: [Errno 22] Invalid argument

And it hangs a LONG time before outputting the OSError, which I think confirms that this file does contain the large blob of data that should be there.

I've also tried with a much smaller wikidata-reduced file (only 10 lines) and everything works fine there. Inspecting that file with pickletools yields the expected results:

$ python -m pickletools wikidata_automation_mini.pickle
    0: \x80 PROTO      3
    2: c    GLOBAL     'ahocorasick Automaton'
   25: q    BINPUT     0
   27: (    MARK
   28: M        BININT2    302
   31: B        BINBYTES   b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00"\x00\x00\x00\x00\x00\x00\x00$\x00\x00\x00\x00\x00\x00\x005\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00E\x00\x00\x00\x00\x00\x00\x00P\x00\x00\x00\x00\x00\x00\x00\\\x00\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x00\x00\x00\x00j\x00\x00\x00\x00...
WojciechMula commented 7 years ago

@EmilStenstrom Thank you very much for checking this. I have some vague ideas about the source of the error, but I need to verify them. I haven't replicated your problem yet.

WojciechMula commented 7 years ago

@EmilStenstrom Sorry for a stupid question, but: is your MacOS 64-bit?

EmilStenstrom commented 7 years ago

@WojciechMula Yes. The processor is an "Intel Core i7", which is 64-bit, and the macOS version is Sierra, which runs in 64-bit mode. Also, my Python reports 64-bit:

$ python -c "import platform; print(platform.architecture())"
('64bit', '')
WojciechMula commented 7 years ago

@EmilStenstrom Thank you; I suspected it might be somehow related to integer overflows. At the moment I have no idea how to reproduce the error or what its cause might be.

Could you recompile the module with -fsanitize=address and -fsanitize=undefined? I think setting CFLAGS is sufficient, i.e.:

export CFLAGS="-fsanitize=address -fsanitize=undefined"
WojciechMula commented 7 years ago

@EmilStenstrom I haven't forgotten about the problem, I've just run out of ideas.

EmilStenstrom commented 7 years ago

Hi! I'm still planning to try the compile flags you suggested above, I just haven't had time! Maybe next week!

EmilStenstrom commented 7 years ago

Here's the output after running with the CFLAGS you suggested:

$ python run_wikidata_build_automation.py wikidata-reduced.json wikidata_automation.pickle
==57697==ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen). Please launch the executable with:
DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/8.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib
==57697==AddressSanitizer CHECK failed: /Library/Caches/com.apple.xbs/Sources/clang_compiler_rt/clang-800.0.42.1/src/projects/compiler-rt/lib/sanitizer_common/sanitizer_mac.cc:690 "(("interceptors not installed" && 0)) != (0)" (0x0, 0x0)
    <empty stack>

Abort trap: 6
$ export DYLD_INSERT_LIBRARIES=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/8.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib
$ python run_wikidata_build_automation.py wikidata-reduced.json wikidata_automation.pickle
Building automaton...
Building automaton, step 0...
Building automaton, step 100000...
Building automaton, step 200000...
...
Building automaton, step 16800000...
Time to make it...
Killed: 9

It takes all my RAM (16 GB) for about an hour, and then gets killed.

I guess we won't get any further from here. I think I should try to solve my problem in another way. Instead of trying to build the trie in memory, I should persist it to disk in some sort of database optimized for this use case. Thank you for all your hard work!

WojciechMula commented 7 years ago

@EmilStenstrom Thank you very much for your time and effort. I really want to fix this bug, but so far I haven't been able to. :(

As far as I understand your problem, you could try n-gram indexes. They let you narrow the searched space significantly and are not too complicated. I did some experiments with full-text search and the results were impressive.
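For example, a very small character-trigram index along those lines might look like this (a sketch, not pyahocorasick code; exact-title lookup is assumed):

from collections import defaultdict

def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(titles):
    # Map every trigram to the set of titles containing it.
    index = defaultdict(set)
    for title in titles:
        for gram in trigrams(title):
            index[gram].add(title)
    return index

def candidates(index, query):
    # Intersect the per-trigram sets to narrow the candidate titles
    # before doing an exact (or fuzzy) comparison.
    grams = trigrams(query)
    if not grams:
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))

index = build_index(["Belgium", "Sweden", "Poland"])
print(candidates(index, "belgium"))  # {'Belgium'}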

woakesd commented 7 years ago

Hi

I'm hitting something like this with Python 3.5 on Windows. It seems related to Python 3.5, as I can read the same pickle in 3.4 without error and use the automaton.

I'll email you details of the files and upload them to Dropbox for you to download. The dataset is much smaller than the one mentioned here (the pickle is only 17 MB).

It doesn't seem to matter if the pickle is created in 3.4 or 3.5. The read issue only happens on Windows though.

Reading it on Linux returns an automaton with no words! Guess that is too much to hope for!

WojciechMula commented 7 years ago

David, thank you very much, I will look closer at this. I've already downloaded the file.

woakesd commented 7 years ago

Tested on Windows with Python 3.6 and got no error, so it looks like it's a Windows Python 3.5 issue only.

WojciechMula commented 7 years ago

That's great news. Thank you for checking this.

WojciechMula commented 7 years ago

And thank you for the regression test.

WojciechMula commented 7 years ago

@woakesd I installed all the official versions: 3.5.0, 3.5.1, 3.5.2 and 3.5.3 -- and I was able to load the pickle file you shared with me. I also tested 3.4.4 and 3.6.0. The regression test passes as well. Strange.

Which specific version of Python do you use?

woakesd commented 7 years ago

I removed 3.5.3 and reinstalled everything, including version 1.1.5.dev1 built locally against 3.5.2.

The regression test still failed.

I uninstalled pyahocorasick, rebuilt against the installed version of Python 3.5.3, ran the test again, and it works.

I still can't load the automaton.pickle file!

I've split the test into two files. Could you see if this still works for you? It doesn't here.

WojciechMula commented 7 years ago

It works for me. I use MSVC 2015 to compile the extension and I'm on the repo's head (1.1.5.dev1 doesn't compile on Windows).

WojciechMula commented 7 years ago

@woakesd I just committed some debug code in c79bd66246b07c6d120cce5f7817af8eb1f3817c, could you please check it out? On my machine I get the following output:

unpickle: 7 nodes

unpickle: node #1 at offset 0
unpickle: node #1.fail   = 0
unpickle: node #1.letter = 0
unpickle: node #1.eow    = 0
unpickle: node #1.n      = 2
unpickle: node #1.next[0] = 2
unpickle: node #1.next[1] = 5

unpickle: node #2 at offset 40
unpickle: node #2.fail   = 0
unpickle: node #2.letter = 97
unpickle: node #2.eow    = 0
unpickle: node #2.n      = 1
unpickle: node #2.next[0] = 3

unpickle: node #3 at offset 72
unpickle: node #3.fail   = 0
unpickle: node #3.letter = 98
unpickle: node #3.eow    = 0
unpickle: node #3.n      = 1
unpickle: node #3.next[0] = 4

unpickle: node #4 at offset 104
unpickle: node #4.fail   = 0
unpickle: node #4.letter = 99
unpickle: node #4.eow    = 1
unpickle: node #4.n      = 0

unpickle: node #5 at offset 128
unpickle: node #5.fail   = 0
unpickle: node #5.letter = 100
unpickle: node #5.eow    = 0
unpickle: node #5.n      = 1
unpickle: node #5.next[0] = 6

unpickle: node #6 at offset 160
unpickle: node #6.fail   = 0
unpickle: node #6.letter = 101
unpickle: node #6.eow    = 0
unpickle: node #6.n      = 1
unpickle: node #6.next[0] = 7

unpickle: node #7 at offset 192
unpickle: node #7.fail   = 0
unpickle: node #7.letter = 102
unpickle: node #7.eow    = 1
unpickle: node #7.n      = 0
woakesd commented 7 years ago

With the repo head in 3.5.3 I get no output apart from the error. I added one extra trace just to be sure, but it doesn't look like it gets as far as automaton_unpickle.

WojciechMula commented 7 years ago

What is the output from this script on your machine?

from ahocorasick import Automaton
auto = Automaton()
auto.add_word('abc', 'abc')
auto.add_word('def', 'def')

x = auto.__reduce__()
print(x)

I got this:

(<class 'ahocorasick.Automaton'>, (7, b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00
\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00a\x00\x03\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x0
0\x00\x00\x00b\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00c\x00\x00\x00\x00\x00\
x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00d\x00\x06
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x01\x00\x00\x00\x00\x00e\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0
0f\x00', 1, 30, 100, 2, 2, 3, ['abc', 'def']))
woakesd commented 7 years ago

I get the following:

(<class 'ahocorasick.Automaton'>, (7, b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x82\x01\x00\x00\x02\x00\x00\x00\x00\x00
\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x82\x01a\x00\x03\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x0
0\x00\x82\x01b\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x83\x01c\x00\x00\x00\x00\x00\
x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x82\x01d\x00\x06
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x01\x00\x00\x00\x82\x01e\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x83\x0
1f\x00', 1, 30, 100, 2, 2, 3, ['abc', 'def']))

I deleted the comment where I thought it was working, because I had tested with 3.6, not 3.5.3. Sigh.

woakesd commented 7 years ago

I added another trace which produces the following output:

size 216, count 7, sizeof(TrieNode) 32, sizeof(TrieNode*) 8
woakesd commented 7 years ago

Could you send me a wheel for 64-bit 3.5.3?

I'm exploring the idea that there is a build config issue with my laptop.

I'm installing Visual Studio 2015 on another laptop just now to try it out on another machine.

woakesd commented 7 years ago

Found out how to break it like this, and why I think it works for you!

Using 32-bit Python 3.5.3 on Windows I can create and load the pickle with no problem.

The 64-bit version is where the issue lies.

The pickle for 64-bit Windows is larger: 294 bytes instead of 214 bytes.

WojciechMula commented 7 years ago

@woakesd I'm not on Windows right now, but I'm pretty sure that I have 64-bit versions of Python and that compilation also produces 64-bit binaries. But it might be a good hint, thank you for checking it.

I will send you my compiled modules tomorrow.

pombredanne commented 6 years ago

@EmilStenstrom do you mind trying with the latest release?

EmilStenstrom commented 6 years ago

@pombredanne Sorry, I don't have any of the code I used for this left. Since my use case was too big for RAM, I just decided to go another route...

Dobatymo commented 6 years ago

I have the same problem. I create a large automaton (several gigabytes in memory), pickle it, and load it from disk: ValueError: binary data truncated (1). This is Python 3.6.6 x64 on Windows 10, installed with pip install pyahocorasick.

I cannot test whether 32-bit Python works, because the dataset is too large and I get a memory error (SystemError: <built-in method add_word of ahocorasick.Automaton object at 0x080305E0> returned NULL without setting an error)

WojciechMula commented 6 years ago

@Dobatymo is it possible to somehow get the dataset you use? I'd love to finally fix the bug, but I'm not able to reproduce it on my own.

Dobatymo commented 6 years ago

I use this one: https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz Tomorrow I can check exactly which version/date of the dump it is and give you the code to reproduce it.

WojciechMula commented 6 years ago

Great! Thank you

EmilStenstrom commented 6 years ago

Hah, nice! That’s the original dataset I used too. But I used the Swedish version of Wikipedia, not the English one. The idea was to quickly find all Wikipedia articles from a span of text.

Some thoughts:

1. Does the automaton get bigger than RAM?
2. Are there very long strings in Wikipedia that somehow throw this off?
3. Are there Unicode codepoints that mess things up? (A sketch for checking 2 and 3 follows below.)
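A quick way to check questions 2 and 3 directly against the titles dump (a sketch; the filename matches the dump linked above):

import gzip

# Scan the titles and report the longest one plus how many contain codepoints
# outside the Basic Multilingual Plane (a rough proxy for "unusual" Unicode).
longest = ""
non_bmp = 0
with gzip.open("enwiki-latest-all-titles-in-ns0.gz", "rt", encoding="utf-8") as fr:
    for line in fr:
        title = line.strip()
        if len(title) > len(longest):
            longest = title
        if any(ord(ch) > 0xFFFF for ch in title):
            non_bmp += 1

print("longest title:", len(longest), repr(longest[:80]))
print("titles with non-BMP codepoints:", non_bmp)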

Dobatymo commented 6 years ago

Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32

import gzip, pickle
import ahocorasick

def read(wiki_titles):
    with gzip.open(wiki_titles, "rt", encoding="utf-8") as fr:
        for line in fr:
            yield line.strip()

def create_automaton(wiki_titles):
    a = ahocorasick.Automaton()

    for i, line in enumerate(read(wiki_titles)):
        a.add_word(line.lower(), i)
    a.make_automaton()

    return a

if __name__ == "__main__":

    # https://dumps.wikimedia.org/enwiki/20180701/enwiki-20180701-all-titles-in-ns0.gz
    wiki_path = "enwiki-20180701-all-titles-in-ns0.gz"
    pickle_path = "enwiki.p"

    with open(pickle_path, "wb") as fw:
        a = create_automaton(wiki_path)
        pickle.dump(a, fw)
        del a

    with open(pickle_path, "rb") as fr:
        a = pickle.load(fr)
Traceback (most recent call last):
  File "...\test.py", line 30, in <module>
    a = pickle.load(fr)
ValueError: binary data truncated (1)

Memory usage maxes out at 10.75 GB (just small enough to work on my 16GB machine)

EmilStenstrom commented 5 years ago

@WojciechMula Maybe it's time to close this bug, until someone sees this problem with the latest version of the code, and Python 3.6+?