Closed by WojciechMula 5 years ago
@WojciechMula I do not have any pickling bugs on the Python 2.7 build of 1.1.4, and that is with thousands of users of scancode-toolkit on Linux, Windows and macOS.
> Moreover, last year was tough for me (I was ill, then I bought and renovated a flat, and finally recent changes at my ex-company forced me to look for a new job), and as a result I couldn't spend much time on side projects.
You owe no one anything, my friend! I hope your new job rocks!
That said, pickling is not a great protocol. I would be quite happy with a custom binary format and protocol that eschews pickling entirely and has a similar purpose and effect. So do not be stuck on pickling.
As an example, see https://github.com/RoaringBitmap/RoaringFormatSpec/ which is used to store compressed bitmaps in C, Java and Go.
Another example, showing the eventual complexity of pickling, here in pure Python for a trie structure: https://github.com/google/pygtrie/blob/master/pygtrie.py#L187 and https://github.com/google/pygtrie/blob/master/pygtrie.py#L261
So please, by all means, let go of pickle if this can make your life simpler!
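To make the custom-format idea above a bit more concrete, here is a minimal sketch of a pickle-free binary layout for a toy trie. Everything in it is an assumption made for illustration: the `TRIE` signature, the version field, the per-node layout and the function names `dump_trie` and `load_trie` are hypothetical and are not the module's format. It also recurses per node, so very long keys would need an iterative rewrite.

```python
import io
import struct

MAGIC = b"TRIE"   # hypothetical 4-byte file signature
VERSION = 1       # hypothetical format version


def dump_trie(words):
    """Serialize byte-string keys as a trie written in preorder."""
    root = {}
    for word in words:
        node = root
        for byte in word:
            node = node.setdefault(byte, {})
        node[None] = True  # end-of-key marker

    out = io.BytesIO()
    out.write(MAGIC + struct.pack("<H", VERSION))

    def write_node(node):
        children = sorted(k for k in node if k is not None)
        # Per node: 1-byte terminal flag, 2-byte child count
        out.write(struct.pack("<BH", int(None in node), len(children)))
        for byte in children:
            out.write(struct.pack("<B", byte))  # edge label
            write_node(node[byte])

    write_node(root)
    return out.getvalue()


def load_trie(data):
    """Rebuild the list of keys from bytes produced by dump_trie."""
    if data[:4] != MAGIC:
        raise ValueError("not a trie blob")
    (version,) = struct.unpack_from("<H", data, 4)
    if version != VERSION:
        raise ValueError("unsupported format version")

    words = []
    offset = 6  # 4 bytes of signature + 2 bytes of version

    def read_node(prefix):
        nonlocal offset
        terminal, count = struct.unpack_from("<BH", data, offset)
        offset += 3
        if terminal:
            words.append(bytes(prefix))
        for _ in range(count):
            byte = data[offset]
            offset += 1
            read_node(prefix + [byte])

    read_node([])
    return words
```

Because the layout is explicit, a reader in another language only needs the same few rules, and a version bump can be rejected cleanly instead of crashing the interpreter.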
@pombredanne It's good to hear that you don't have any problems with pickling, but unfortunately there are some. I feel really uncomfortable that the module, which people like and use, doesn't work well and users waste precious time. Unless I resign, I am responsible for the module.
Speaking of the pickle format, I think we must stick to the Python machinery, as the module uses both internal C structures and Python objects (i.e. values stored in the trie).
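One way to picture the split described above is a hybrid: the structural part goes into a flat byte blob, while the stored Python values are still handed to pickle. The sketch below uses a pure-Python stand-in, so `TinyAutomaton`, `node_ids` and `_rebuild` are hypothetical names and this is not how the module's C code does it. Only `__reduce__`, `pickle` and `struct` are real Python machinery here.

```python
import pickle
import struct


class TinyAutomaton:
    """Hypothetical stand-in: integer node ids plus arbitrary Python values."""

    def __init__(self):
        self.node_ids = []  # stands in for the C-level structure
        self.values = []    # arbitrary Python objects stored in the trie

    def add(self, node_id, value):
        self.node_ids.append(node_id)
        self.values.append(value)

    def __reduce__(self):
        # Pack the structure into one bytes blob; pickle only sees
        # that blob and the list of values.
        raw = struct.pack("<%dI" % len(self.node_ids), *self.node_ids)
        return (_rebuild, (raw, self.values))


def _rebuild(raw, values):
    """Module-level helper that pickle calls to reconstruct the object."""
    obj = TinyAutomaton()
    count = len(raw) // 4  # each node id is a 4-byte unsigned integer
    obj.node_ids = list(struct.unpack("<%dI" % count, raw))
    obj.values = values
    return obj
```

With this in a normal module, `pickle.loads(pickle.dumps(automaton))` would go through `__reduce__` and `_rebuild`, so the only thing pickle has to understand is a bytes object and a list of user values.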
@WojciechMula your call on continuing to use pickle... but that is not a feature IMHO. The feature is IMHO reasonably fast writing and reading of an automaton to and from disk (and thinking of it, using your own format would mean it could be memory mapped in the future .... yummy)
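As a rough illustration of the memory-mapping remark, the sketch below writes fixed-size records to a file and then reads a single record through `mmap` without parsing the rest. The `TBL0` signature, the header and record layouts, and the names `write_table` and `read_record` are all assumptions made up for this example; a real automaton format would need its own node layout.

```python
import mmap
import struct

HEADER = struct.Struct("<4sI")  # hypothetical: signature, record count
RECORD = struct.Struct("<II")   # hypothetical: (node id, payload) pair


def write_table(path, records):
    """Write a header followed by fixed-size records."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"TBL0", len(records)))
        for node_id, payload in records:
            f.write(RECORD.pack(node_id, payload))


def read_record(path, index):
    """Read one record via a read-only memory map of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            magic, count = HEADER.unpack_from(mm, 0)
            if magic != b"TBL0":
                raise ValueError("unexpected file signature")
            if not 0 <= index < count:
                raise IndexError(index)
            # Fixed-size records make the offset a simple multiplication
            return RECORD.unpack_from(mm, HEADER.size + index * RECORD.size)
```

The point of the fixed-size layout is that a lookup touches only the pages it needs, which is what would make loading a large automaton from disk cheap.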
Bugs related to pickling are recurrent and annoy everybody; sometimes a bug crashes the interpreter, which is completely unacceptable. I tried my best to track the problem(s) down, but I failed. Moreover, last year was tough for me (I was ill, then I bought and renovated a flat, and finally recent changes at my ex-company forced me to look for a new job), and as a result I couldn't spend much time on side projects.
This project is pretty popular, and it would be great if somebody helped with a pickling algorithm. IMO the best option is to trash the current one and start over.