betamaxpy / betamax

A VCR imitation designed only for python-requests.
https://betamax.readthedocs.io/en/latest/
Other
565 stars 62 forks source link

Content-Length header leaks original length of placeholder values #137

Open smallnamespace opened 7 years ago

smallnamespace commented 7 years ago

Since we record all the headers, even when we replace sensitive tokens with their placeholders, the original total length of all tokens is preserved.

A few possible options:

  1. Fake the content-length by incrementing by placeholder length - original length when using placeholders bidirectionally
  2. Simply drop Content-Length when placeholders are used?
  3. Keep current behavior but note the risk in the docs
--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/44967906-content-length-header-leaks-original-length-of-placeholder-values?utm_campaign=plugin&utm_content=tracker%2F198445&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F198445&utm_medium=issues&utm_source=github).
sigmavirus24 commented 7 years ago

Thanks for reporting this!

Sorry for the delay. The last few months have been hectic for me. This gets tricky but can be done. Would you be willing to work on a PR for this?

hroncok commented 6 years ago

Rather than trying to be too clever, I think 3. is the correct approach here. We could mention that in the docs and point the reader to a way to implement a protection against this (such as 2.) in a hook.

@smallnamespace Still interested?

smallnamespace commented 6 years ago

@hroncok Sorry for dropping this earlier -- I can take a look, but will take me awhile to get around to.

hroncok commented 6 years ago

Thanks! This has been sitting here for a couple of months, so don't worry if it's not fixed this week :)

sigmavirus24 commented 6 years ago

Yeah, I'm honestly a bit torn on this, and I should have explained sooner.

The main reason why it's tricky (which is likely solvable):

Betamax does its best to preserve the Content-Encoding if it is present. If we receive a compressed response, we try to preserve that, the headers, and that includes Content-Length. Perhaps we should always decrypt it and alter those headers. If we change that behaviour, though, we probably want a way to re-enable it for more advanced users who may be checking those attributes.

After we figure that out, we need to figure out what to store in Content-Length. Do we always re-calculate it on replay? Do we never store the real content-length in the cassette? Do we only use a placeholder for it when we've replaced something else?