browsh-org / browsh

A fully-modern text-based browser, rendering to TTY and browsers
https://www.brow.sh
GNU Lesser General Public License v2.1
16.92k stars, 411 forks

Implement a client side caching mechanism #48

Open tobimensch opened 6 years ago

tobimensch commented 6 years ago

Generate an md5sum for every full page of text that is intended to be sent to the client. Send the md5sum together with the full page text/data, with the md5sum at the front. When the client detects that it has already displayed text with the same md5sum, because it is in its cache, it can cancel loading the rest of the text/data and so lower the total bandwidth required. The client can then simply display the text/data from its cache. There may be more elegant ways to implement a mechanism like this; this is just food for thought.
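A minimal sketch of the digest-first framing described above, assuming the page arrives as a byte stream. The names `frame_page`, `receive_page`, and the in-memory `cache` dictionary are hypothetical and not part of Browsh. MD5 is used only because the proposal names it; any stable digest would serve the same purpose.

```python
import hashlib
import io
from typing import Dict, Tuple

DIGEST_LEN = 32  # length of an MD5 hex digest in characters


def frame_page(text: str) -> bytes:
    """Server side (hypothetical): prefix the page body with its hex MD5."""
    body = text.encode("utf-8")
    digest = hashlib.md5(body).hexdigest().encode("ascii")
    return digest + body


def receive_page(stream, cache: Dict[str, str]) -> Tuple[str, bool]:
    """Client side (hypothetical): read the digest first, stop on a cache hit.

    Returns (page_text, served_from_cache).
    """
    digest = stream.read(DIGEST_LEN).decode("ascii")
    if digest in cache:
        # Cache hit: the rest of the stream is never read.
        return cache[digest], True
    text = stream.read().decode("utf-8")
    cache[digest] = text
    return text, False


cache: Dict[str, str] = {}
frame = frame_page("hello from browsh")
first = receive_page(io.BytesIO(frame), cache)
second_stream = io.BytesIO(frame)
second = receive_page(second_stream, cache)
```

On the second call the client reads only the first 32 bytes of the stream. Over a real connection, how much bandwidth this saves would depend on how quickly the transfer can actually be cancelled.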

I'm addicted to multiple websites, and in reality they change a lot less often than I visit them.

And then there are webpages that don't change at all over long periods of time, like documentation for APIs and so forth.

The cache on the client side (be it the CLI or a future GUI) should be compressed to save space, of course.
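As a rough illustration of a compressed client-side cache, here is a sketch that stores each page under its checksum and compresses it with zlib. The class `CompressedPageCache` and its methods are hypothetical names chosen for this example; a real client would probably persist entries to disk and evict old ones.

```python
import hashlib
import zlib
from typing import Dict, Optional


class CompressedPageCache:
    """Hypothetical in-memory cache: checksum -> zlib-compressed page text."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, text: str) -> str:
        """Store a page and return the checksum it is filed under."""
        raw = text.encode("utf-8")
        digest = hashlib.md5(raw).hexdigest()
        self._entries[digest] = zlib.compress(raw)
        return digest

    def get(self, digest: str) -> Optional[str]:
        """Return the decompressed page, or None if it is not cached."""
        blob = self._entries.get(digest)
        if blob is None:
            return None
        return zlib.decompress(blob).decode("utf-8")

    def stored_bytes(self) -> int:
        """Total size of the compressed entries."""
        return sum(len(blob) for blob in self._entries.values())


page_cache = CompressedPageCache()
page = "API documentation line\n" * 500
key = page_cache.put(page)
```

Rendered text pages tend to be repetitive, so general-purpose compression should shrink them noticeably, though the exact ratio depends on the content.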

tobimensch commented 6 years ago

An advanced version of this could send a message to the client saying that the whole page simply hasn't changed (assuming the client has loaded the whole page from top to bottom before). The client could then operate completely from the cache and scroll up and down without any latency until a link is clicked or the server/webext sends another message saying that something on the page has changed.

Caching wouldn't only improve the user experience through drastically reduced load times, latency and bandwidth usage. It would also reduce the bandwidth required from the server/webext, and hopefully cut down on rendering times too, if it's possible to detect in the webext that a webpage is unchanged without needing to render it (for example by checking for changes in the HTML source).
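The idea of skipping rendering when the HTML source is unchanged could be sketched as follows. This is only an illustration: `page_update`, the message dictionaries, and the `render` callback are hypothetical and not part of Browsh. Comparing the HTML source alone would also miss pages whose visible content changes through JavaScript.

```python
import hashlib
from typing import Callable, Dict


def page_update(
    url: str,
    html_source: str,
    last_seen: Dict[str, str],
    render: Callable[[str], str],
) -> dict:
    """Return an 'unchanged' message if the source digest matches the last one.

    Only when the digest differs is the (expensive) render callback invoked.
    """
    digest = hashlib.md5(html_source.encode("utf-8")).hexdigest()
    if last_seen.get(url) == digest:
        return {"type": "unchanged", "url": url}
    last_seen[url] = digest
    return {"type": "page", "url": url, "text": render(html_source)}


render_calls = []


def fake_render(html: str) -> str:
    """Stand-in for real rendering; records each call for demonstration."""
    render_calls.append(html)
    return html.upper()


seen: Dict[str, str] = {}
msg1 = page_update("https://example.com", "<p>docs</p>", seen, fake_render)
msg2 = page_update("https://example.com", "<p>docs</p>", seen, fake_render)
msg3 = page_update("https://example.com", "<p>docs v2</p>", seen, fake_render)
```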

tombh commented 6 years ago

Are you talking about the HTTP Server or the TTY client? Because caching will very easily be achieved for the HTTP Server using Google's CDN.

tobimensch commented 6 years ago

I'm talking about all current (TTY) and future clients. I'm assuming the http-server will only be one of many ways to browsh.

In my opinion, browsers optimized specifically for browsh (ones that aren't traditional web/HTML browsers and are comparable to the current TTY client) will produce the best browsing experience possible.

tobimensch commented 6 years ago

I'm talking specifically about client-side caching in this issue; server-side caching is equally important, of course.

tombh commented 6 years ago

The trouble is that caching will be impossible for the TTY client. For example, what if you cache the pages of a site that people log into - you could end up loading your email page only to see the cached version of someone else's emails!

tobimensch commented 6 years ago

The obvious answer would be to have separate clients for separate users with separate configs/userdirs etc.

And I'm running it on my own server for myself, so there's no conflict with other users' data privacy in sight.

Maybe the tty client should even be completely separated from the webext/server. I know mosh has benefits with compression and allowing for a stable connection and so forth, but besides this there's no logical reason why the tty client shouldn't be able to run locally on a user's machine eventually, like say the irssi IRC client which I hold dear.

tombh commented 6 years ago

I've definitely thought about building a custom client for Browsh, taking the best bits of Mosh but adding Browsh-specific features. The trouble is that it's an extraordinary amount of work :/

But also, about this caching idea, I think there would still be caveats to caching based on URL. There are so many web pages with dynamic content that I think it'd be largely frustrating. Like getting a password reset reminder, but not being able to see it in your email until the cache had expired. I mean, I guess there could be an option to turn the cache on and off.

tobimensch commented 6 years ago

That's why I suggested that something like an md5sum should be sent to the client. The md5sum is generated from the content that is going to be displayed. If the client already has the same md5sum in its cache database, then it knows that it doesn't need to request/load the whole content and can pull and display it from the cache.

tombh commented 6 years ago

Oh! I didn't get that from the first reading, but yes that's a very doable idea.

andrew commented 6 years ago

Sounds a lot like ETag caching.
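For comparison, here is a small simulation of how HTTP ETag validation works: the server labels a response with an `ETag` header, the client sends that value back in `If-None-Match`, and the server answers `304 Not Modified` with no body when the tag still matches. The `respond` function is a hypothetical stand-in for a server handler, and deriving the tag from an MD5 of the body is an assumption made to mirror the proposal above; HTTP does not prescribe how the tag is computed.

```python
import hashlib
from typing import Dict, Tuple


def respond(page_text: str, request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Hypothetical handler returning (status, response_headers, body)."""
    body = page_text.encode("utf-8")
    # Entity tags are quoted strings; the MD5 derivation is an assumption.
    etag = '"' + hashlib.md5(body).hexdigest() + '"'
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is still valid: send no body at all.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First visit: no validator, so the full page comes back.
status1, headers1, body1 = respond("inbox: 3 messages", {})

# Revisit: the client presents the tag it stored alongside its cached copy.
status2, headers2, body2 = respond("inbox: 3 messages", {"If-None-Match": headers1["ETag"]})

# The page changed on the server, so the old tag no longer matches.
status3, headers3, body3 = respond("inbox: 4 messages", {"If-None-Match": headers1["ETag"]})
```

One difference from the proposal earlier in the thread: with ETags the client sends its known tag with the request, so the server never starts sending the body, whereas the md5sum-first approach has the server begin the transfer and the client cancel it.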