abhinavsingh / proxy.py

💫 Ngrok FRP Alternative • ⚡ Fast • 🪶 Lightweight • 0️⃣ Dependency • 🔌 Pluggable • 😈 TLS interception • 🔒 DNS-over-HTTPS • 🔥 Poor Man's VPN • ⏪ Reverse & ⏩ Forward • 👮🏿 "Proxy Server" framework • 🌐 "Web Server" framework • ➵ ➶ ➷ ➠ "PubSub" framework • 👷 "Work" acceptor & executor framework
https://abhinavsingh.com/proxy-py-a-lightweight-single-file-http-proxy-server-in-python/
BSD 3-Clause "New" or "Revised" License

Replay from response cache #1244

Open rthill91 opened 2 years ago

rthill91 commented 2 years ago

This is essentially a follow-up to #319

A solution was merged into cache-server but, so far as I can tell, hasn't ever made its way into develop or a release. Is this a dead/forgotten feature or are there still plans to see that merged?

Thanks

abhinavsingh commented 2 years ago

There are 2 separate things:

1) Cache server for production usage
2) Cache replay only during tests

Afaik, 2) can be achieved and already exists in some form. Sorry, I haven't tested or used this myself in a long time. See https://github.com/abhinavsingh/proxy.py/blob/85ad44b46eb81db64dc06fbf289151180bd68aad/proxy/testing/test_case.py#L78 and https://github.com/abhinavsingh/proxy.py/blob/85ad44b46eb81db64dc06fbf289151180bd68aad/tests/testing/test_embed.py#L50
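To make the "cache replay only during tests" idea concrete, here is a minimal stdlib-only sketch of the record/replay pattern. This is illustrative only; the class and method names are hypothetical and are not proxy.py's TestCase API (see the linked test_case.py/test_embed.py for the real facility):

```python
from typing import Dict


class ResponseCache:
    """Record raw responses during a live run, replay them offline."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def record(self, url: str, raw_response: bytes) -> None:
        # Called during the recording pass, while the network is available.
        self._store[url] = raw_response

    def replay(self, url: str) -> bytes:
        # During replay runs the network is off, so a miss is an error
        # (KeyError) rather than a fallthrough to a live request.
        return self._store[url]
```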

You might also be interested in the cache responses plugin, which has the capability to cache everything it sees passing through it (e.g. txt, mp4, json, xml, jpg etc)

Finally, using a cache server in production is another story altogether. The cache-server branch contains code for a production cache server. Imagine Squid. The issue with a full-fledged cache server is cache header management. If not done correctly, browsers/clients will behave unexpectedly. This has also been confirmed by other users on the cache threads.
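To illustrate why cache header management is tricky, here is a stdlib-only sketch (my own, not code from the cache-server branch) of the freshness decision a correct cache must make per RFC 9111 before serving a stored response:

```python
import datetime
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, now: datetime.datetime) -> bool:
    """Decide whether a cached response may be served without revalidation.

    A deliberately simplified sketch: real caches must also handle Age,
    Expires, s-maxage, heuristic freshness, Vary, and more.
    """
    cc = headers.get("Cache-Control", "")
    directives = [d.strip() for d in cc.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = None
    for d in directives:
        if d.startswith("max-age="):
            max_age = int(d.split("=", 1)[1])
    if max_age is None:
        # Without an explicit lifetime, conservatively treat as stale.
        return False
    response_date = parsedate_to_datetime(headers["Date"])
    age = (now - response_date).total_seconds()
    return age < max_age
```

Get any one of these rules wrong and clients see stale pages or endless revalidation, which is the "browsers/clients will behave unexpectedly" failure mode described above.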

rthill91 commented 2 years ago

My hope was to use proxy.py to cache all responses during an end-to-end testing session, and then to be able to re-use those responses for subsequent test runs.

Essentially what vcrpy does, except I have requests coming from multiple processes. As near as I can tell, the cache responses plugin records the responses but has no way of replaying them later. The cache-server branch does, but hasn't been touched in some time.

It's also entirely possible I've just misunderstood how something works, but my basic setup was:

```shell
proxy --plugins proxy.plugin.CacheResponsesPlugin
curl -x "http://localhost:8899" "https://www.google.com"
# disconnect network
curl -x "http://localhost:8899" "https://www.google.com"
# curl hangs; I would expect proxy.py to replay the cached response here
```
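For reference, the missing half could be as small as reading back the recorded bytes instead of connecting upstream. A minimal sketch, assuming the cache stores one raw HTTP response per request as a file on disk (an assumption on my part, not CacheResponsesPlugin's documented format):

```python
from pathlib import Path


def replay_cached_response(cache_file: Path) -> bytes:
    """Return raw response bytes exactly as they were recorded.

    In a proxy context these bytes would be queued to the client
    verbatim, bypassing the upstream connection entirely.
    """
    raw = cache_file.read_bytes()
    if not raw.startswith(b"HTTP/"):
        raise ValueError(f"not a raw HTTP response: {cache_file}")
    return raw
```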
abhinavsingh commented 2 years ago

@rthill91 You are correct. Looking at the code here, I realize the VCR facility only enables the cache responses plugin, which by itself is not responsible for replaying the responses.

Ref https://github.com/abhinavsingh/proxy.py/blob/85ad44b46eb81db64dc06fbf289151180bd68aad/proxy/testing/test_case.py#L77-L83

Can you point to the code in the cache-server branch that is responsible for replaying the cache? Sorry, I haven't visited that branch myself in a while. It should be relatively easy to pull just this use-case out of the cache-server branch.

Irrespective of how we achieve it, here is what needs to be done:

1) The CacheResponses plugin must surface the path to the cache file per request.
2) This can easily be done using the logging context hook that proxy plugins can optionally implement. For example, see how the ProgramName plugin exposes its result here: https://github.com/abhinavsingh/proxy.py/blob/85ad44b46eb81db64dc06fbf289151180bd68aad/proxy/plugin/program_name.py#L64-L66
3) Once the CacheResponses plugin has surfaced these artifacts, our TestCase can simply read/replay responses out of the cached files.
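Steps 1) and 2) above could look roughly like this. The hook name `on_access_log` mirrors proxy.py's plugin API as used by the ProgramName plugin linked above; the attribute and context-key names here are hypothetical:

```python
from typing import Any, Dict, Optional


class CachePathSurfacingSketch:
    """Sketch of a plugin surfacing its cache artifact via the log context."""

    def __init__(self) -> None:
        # Would be set by the plugin when it finishes writing the cache file.
        self.cache_file_path: Optional[str] = None

    def on_access_log(self, context: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Surface the per-request cache artifact so that a TestCase (or any
        # log consumer) can pick it up and later replay the stored response.
        if self.cache_file_path is not None:
            context.update({'cache_file_path': self.cache_file_path})
        return context
```

A TestCase would then collect `cache_file_path` entries from the access-log context during the recording run and feed them to the replay side in step 3).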

This will still require some code to pull off end-to-end. Let me know if these pointers help. Try this strategy locally and see if it works; if it does, I'll be happy to include the necessary bits in the library itself.