Hi @maxnoel
Adding stream support makes sense! Thanks for the PR!
Really clear to read, but I'm a little concerned that a maliciously crafted BSON document could be problematic:

1. Forcing a call to `make([]byte, tailSize)` with a negative `tailSize`, causing a `makeslice` panic.
    * First 4 bytes: `0xFF, 0xFF, 0xFF, 0xFF`
2. Passing a malformed BSON document, causing the BSON decoder to panic (a behaviour I hate, but can't change for compatibility reasons).
3. Forcing a 2GB memory allocation from a 4-byte file.
    * First 4 bytes: `0xFF, 0xFF, 0xFF, 0x7F`

The sketch after this list shows how those 4-byte prefixes get interpreted.
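As a minimal illustration of why those prefixes are dangerous (the `tailSize` name follows the comment above; this is not the PR's actual code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A BSON document starts with its total size as a little-endian int32.
	prefixes := [][]byte{
		{0xFF, 0xFF, 0xFF, 0xFF}, // int32(-1): yields a negative tail size
		{0xFF, 0xFF, 0xFF, 0x7F}, // int32(2147483647): a ~2GiB tail size
	}
	for _, p := range prefixes {
		docLen := int32(binary.LittleEndian.Uint32(p))
		tailSize := int(docLen) - 4 // bytes remaining after the size prefix
		fmt.Printf("docLen=%d tailSize=%d\n", docLen, tailSize)
		// make([]byte, tailSize) panics with "makeslice: len out of range"
		// for the first prefix, and attempts a ~2GiB allocation for the second.
	}
}
```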
They're solvable though! The first just needs a bounds check, and the second could be mitigated by having a `recover()` panic handler that returns an exported `ErrCorrupt` or something (both mitigations are sketched below).
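A minimal sketch of both mitigations, assuming hypothetical names (`readDocument`, `ErrCorrupt`) rather than the PR's actual code; in the real decoder the recovery would also wrap the unmarshalling step:

```go
package bson

import (
	"encoding/binary"
	"errors"
	"io"
)

// ErrCorrupt would be the exported error returned for malformed input
// (hypothetical name).
var ErrCorrupt = errors.New("bson: corrupt document")

// readDocument reads one length-prefixed BSON document from r.
func readDocument(r io.Reader) (doc []byte, err error) {
	// Mitigation for (2): convert decoder panics into an error return.
	defer func() {
		if recover() != nil {
			doc, err = nil, ErrCorrupt
		}
	}()

	var prefix [4]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return nil, err
	}
	docLen := int32(binary.LittleEndian.Uint32(prefix[:]))

	// Mitigation for (1): bounds-check before allocating. The smallest valid
	// document is 5 bytes (size prefix + terminating null), so the tail must
	// be at least 1 byte.
	tailSize := int(docLen) - 4
	if tailSize < 1 {
		return nil, ErrCorrupt
	}

	doc = make([]byte, tailSize)
	if _, err := io.ReadFull(r, doc); err != nil {
		return nil, err
	}
	return doc, nil
}
```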
For the third, I think the only realistic option is to include a warning about the possibility in the documentation so at least it's not a surprise - something along the lines of "careful with untrusted data". I can imagine this being a problem for small, low-RAM cloud instances more than anything.
Would you mind including test cases for the above? Thanks again!
Dom
Please consider this: `bson` is a package under `mgo`, so we might just apply mgo/MongoDB restrictions to the `bson` package - namely, that no document can be larger than 16MiB. This would solve the problem with malformed files where the first four bytes encode a very large integer. Opinions?
Makes sense. I'll make the required modifications tomorrow, and restrict valid document sizes to [5B, 16MB]
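A sketch of that restriction; the thread later mentions two exported consts, but the names and exact spelling here are assumptions:

```go
package bson

// Assumed names; the thread only says two consts were exported.
const (
	MinDocumentSize = 5        // 4-byte size prefix + terminating null byte
	MaxDocumentSize = 16 << 20 // MongoDB's 16MiB maximum document size
)

// validDocumentSize reports whether a decoded size prefix is acceptable.
func validDocumentSize(docLen int32) bool {
	return docLen >= MinDocumentSize && docLen <= MaxDocumentSize
}
```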
Done! I reused the `handleErr` panic handler instead of writing my own, which means I can't return a specific `ErrCorruptDocument` error. Let me know if you'd rather I write my own.
Do you need anything else?
Hi @maxnoel
I'm getting through the backlog this afternoon - a quick glance looks great, should have a proper review in an hour or so 👍
Thanks very much!
Hi @maxnoel
Could you export the two consts referenced in the description and we can get this merged 👍
Great addition, it should make mgo easier to use for many people. And thanks for the slightly paranoid defensive programming ;)
Dom
Done!
Thanks @maxnoel - we really appreciate it!
No problems, glad I could contribute. Post-mortem analysis of Mongo dumps is something I do fairly frequently at work, and now that I'm doing it in Go instead of Python, copy/pasting my own decoder every time got old quickly ;)
These types are analogous to those found in `encoding/json` and yaml. They allow us to operate on `io.Reader`s/`io.Writer`s instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: `*.bson` files from MongoDB dumps, for example.
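For anyone landing here later, a usage sketch under the assumption that the new API mirrors `encoding/json` (`NewDecoder`/`Decode`, with `io.EOF` signalling a clean end of stream); the import path and those details are assumptions, so check them against the merged code:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/globalsign/mgo/bson" // import path assumed
)

func main() {
	// A *.bson file from mongodump: concatenated BSON documents.
	f, err := os.Open("dump.bson")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := bson.NewDecoder(f) // assumed constructor, by analogy with json
	for {
		var doc bson.M
		err := dec.Decode(&doc)
		if err == io.EOF { // assumed clean end-of-stream signal
			break
		}
		if err != nil {
			log.Fatal(err) // e.g. a corrupt or truncated document
		}
		fmt.Println(doc)
	}
}
```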