Hi @maxnoel
Adding stream support makes sense! Thanks for the PR!
Really clear to read, but I'm a little concerned that a maliciously crafted BSON document could be problematic:

1. Forcing a call to `make([]byte, tailSize)` with a negative `tailSize`, causing a `makeslice` panic.
    * First 4 bytes: `0xFF, 0xFF, 0xFF, 0xFF`
2. Passing a malformed BSON document, causing the BSON decoder to panic (a behaviour I hate, but can't change for compatibility reasons).
3. Forcing a 2GB memory allocation from a 4-byte file.
    * First 4 bytes: `0xFF, 0xFF, 0xFF, 0x7F`

The sketch after this list shows how those 4-byte prefixes get interpreted.
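As a minimal illustration of why those prefixes are dangerous (the `tailSize` name follows the comment above; this is not the PR's actual code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A BSON document starts with its total size as a little-endian int32.
	prefixes := [][]byte{
		{0xFF, 0xFF, 0xFF, 0xFF}, // int32(-1): yields a negative tail size
		{0xFF, 0xFF, 0xFF, 0x7F}, // int32(2147483647): a ~2GiB tail size
	}
	for _, p := range prefixes {
		docLen := int32(binary.LittleEndian.Uint32(p))
		tailSize := int(docLen) - 4 // bytes remaining after the size prefix
		fmt.Printf("docLen=%d tailSize=%d\n", docLen, tailSize)
		// make([]byte, tailSize) panics with "makeslice: len out of range"
		// for the first prefix, and attempts a ~2GiB allocation for the second.
	}
}
```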
They're solvable though! The first just needs a bounds check, and the second could be mitigated by having a `recover()` panic handler that returns an exported `ErrCorrupt` or something (both mitigations are sketched below).
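A minimal sketch of both mitigations, assuming hypothetical names (`readDocument`, `ErrCorrupt`) rather than the PR's actual code; in the real decoder the recovery would also wrap the unmarshalling step:

```go
package bson

import (
	"encoding/binary"
	"errors"
	"io"
)

// ErrCorrupt would be the exported error returned for malformed input
// (hypothetical name).
var ErrCorrupt = errors.New("bson: corrupt document")

// readDocument reads one length-prefixed BSON document from r.
func readDocument(r io.Reader) (doc []byte, err error) {
	// Mitigation for (2): convert decoder panics into an error return.
	defer func() {
		if recover() != nil {
			doc, err = nil, ErrCorrupt
		}
	}()

	var prefix [4]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return nil, err
	}
	docLen := int32(binary.LittleEndian.Uint32(prefix[:]))

	// Mitigation for (1): bounds-check before allocating. The smallest valid
	// document is 5 bytes (size prefix + terminating null), so the tail must
	// be at least 1 byte.
	tailSize := int(docLen) - 4
	if tailSize < 1 {
		return nil, ErrCorrupt
	}

	doc = make([]byte, tailSize)
	if _, err := io.ReadFull(r, doc); err != nil {
		return nil, err
	}
	return doc, nil
}
```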
For the third, I think the only realistic option is to include a warning about the possibility in the documentation so at least it's not a surprise - something along the lines of "careful with untrusted data". I can imagine this being a problem for small, low-RAM cloud instances more than anything.
Would you mind including test cases for the above? Thanks again!
Dom
Please consider this: `bson` is a package under `mgo`, so we might just apply mgo/MongoDB restrictions to the `bson` package - namely, that no document can be larger than 16MiB. This would solve the problem with malformed files where the first four bytes encode a very large integer. Opinions?
Makes sense. I'll make the required modifications tomorrow, and restrict valid document sizes to [5B, 16MB]
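A sketch of that restriction; the thread later mentions two exported consts, but the names and exact spelling here are assumptions:

```go
package bson

// Assumed names; the thread only says two consts were exported.
const (
	MinDocumentSize = 5        // 4-byte size prefix + terminating null byte
	MaxDocumentSize = 16 << 20 // MongoDB's 16MiB maximum document size
)

// validDocumentSize reports whether a decoded size prefix is acceptable.
func validDocumentSize(docLen int32) bool {
	return docLen >= MinDocumentSize && docLen <= MaxDocumentSize
}
```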
Done! I reused the `handleErr` panic handler instead of writing my own, which means I can't return a specific `ErrCorruptDocument` error. Let me know if you'd rather I write my own.
Do you need anything else?
Hi @maxnoel
I'm getting through the backlog this afternoon - a quick glance looks great, should have a proper review in an hour or so 👍
Thanks very much!
Hi @maxnoel
Could you export the two consts referenced in the description and we can get this merged 👍
Great addition, it should make mgo easier to use for many people. And thanks for the slightly paranoid defensive programming ;)
Dom
Done!
Thanks @maxnoel - we really appreciate it!
No problems, glad I could contribute. Post-mortem analysis of Mongo dumps is something I do fairly frequently at work, and now that I'm doing it in Go instead of Python, copy/pasting my own decoder every time got old quickly ;)
These types are analogous to those found in `encoding/json` and yaml. They allow us to operate on `io.Reader`s/`io.Writer`s instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: `*.bson` files from MongoDB dumps, for example.
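For anyone landing here later, a usage sketch under the assumption that the new API mirrors `encoding/json` (`NewDecoder`/`Decode`, with `io.EOF` signalling a clean end of stream); the import path and those details are assumptions, so check them against the merged code:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/globalsign/mgo/bson" // import path assumed
)

func main() {
	// A *.bson file from mongodump: concatenated BSON documents.
	f, err := os.Open("dump.bson")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := bson.NewDecoder(f) // assumed constructor, by analogy with json
	for {
		var doc bson.M
		err := dec.Decode(&doc)
		if err == io.EOF { // assumed clean end-of-stream signal
			break
		}
		if err != nil {
			log.Fatal(err) // e.g. a corrupt or truncated document
		}
		fmt.Println(doc)
	}
}
```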