GraseHotspot / grase-www-portal

Grase WWW Portal (Main Package)

[POLL] HTTP tracking (and caching) #176

Open timwhite opened 6 years ago

timwhite commented 6 years ago

With the push towards a "secure" web (HTTPS everywhere), more and more sites are now only accessible over HTTPS. This means more and more sites are not showing in the Squid web logs.

It isn't possible to track which sites Hotspot users visit when those sites use HTTPS. Reverse lookup of the IP address works in some cases and is wildly inaccurate in others (think of a website behind a shared Cloudflare IP address). At no point is a Hotspot user going to modify their proxy settings so that we can monitor their HTTPS usage; that's just not good for the user experience.

HTTPS accounts for more than 66% of all page loads in Chrome across all platforms, and on most platforms more than 75% of page loads are over HTTPS. https://transparencyreport.google.com/https/overview?hl=en The encrypted web really is here, and here to stay. https://security.googleblog.com/2018/02/a-secure-web-is-here-to-stay.html

It's time to decide whether it's worth the maintenance effort, and the CPU cycles, to keep running the Squid transparent proxy and attempting to track users' browsing history. The report becomes less useful as more sites move to HTTPS.

If voting to keep HTTP tracking, please leave a comment below explaining why you want it kept.

tomas213 commented 6 years ago

Two questions:

  1. What about stats on which pages users have visited, for legal matters? Removing Squid will lose all that data. Maybe we can use AWStats.
  2. Will removing Squid and the cache have any effect on browsing speed?

timwhite commented 6 years ago

@tomas213

  1. Some countries require tracking; others require that we don't track. For countries that require tracking, it may be best to have a HowTo on setting up something like softflowd, which will log IP connections (IP, port, duration, bytes). It would then be up to the operator to cross-match the hotspot IPs against that data when they need to retrieve the logs for legal purposes.
  2. The cache only helps with HTTP sites already. As the number of HTTP sites falls, the cache's usefulness will fall significantly with it. If you have a site currently running, run a Squid log analyser over the logs and see how many cache hits you actually get. A cache miss means the file couldn't be served from the cache.
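
For anyone without a log analyser to hand, the cache-hit check in point 2 can be roughly approximated with a few lines of Python. This is only a sketch, not part of Grase: it assumes Squid's default native access.log format (where the fourth field looks like `TCP_MISS/200`), it treats any result code containing `HIT` as a cache hit, and the sample log lines are made up for illustration.

```python
from collections import Counter


def cache_hit_ratio(lines):
    """Return (hits, total, ratio) for Squid native-format access.log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # Fourth field is "<result code>/<HTTP status>", e.g. "TCP_MISS/200".
        result_code = fields[3].split("/", 1)[0]
        # Assumption: any result code containing "HIT" counts as a cache hit.
        counts["hit" if "HIT" in result_code else "other"] += 1
    total = counts["hit"] + counts["other"]
    ratio = counts["hit"] / total if total else 0.0
    return counts["hit"], total, ratio


# Made-up sample lines in the native log layout, for illustration only.
SAMPLE_LOG = """\
1520000000.123    45 10.1.0.15 TCP_MISS/200 5120 GET http://example.com/ - HIER_DIRECT/93.184.216.34 text/html
1520000001.456     2 10.1.0.15 TCP_HIT/200 2048 GET http://example.com/logo.png - HIER_NONE/- image/png
1520000002.789    60 10.1.0.22 TCP_MISS/200 900 GET http://example.org/ - HIER_DIRECT/93.184.216.34 text/html
1520000003.012     1 10.1.0.22 TCP_MEM_HIT/200 2048 GET http://example.com/logo.png - HIER_NONE/- image/png
"""

if __name__ == "__main__":
    hits, total, ratio = cache_hit_ratio(SAMPLE_LOG.splitlines())
    print(f"{hits} hits out of {total} requests ({ratio:.0%})")
```

To run it against a real log, pass an open file object instead of the sample lines; the log's location depends on the install. A custom `logformat` in squid.conf would need the field index adjusted.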

louis222 commented 6 years ago

I think it's a useful feature for countries that require it. However, I think there should be a way to easily turn it on and off, if that's possible.

tomas213 commented 6 years ago

Tim, if that's the case, then there can be a guide on tracking for the countries that need it. I voted to remove Squid!
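
As a starting point for such a guide, the cross-matching step described earlier in the thread might look something like the sketch below. Everything here is an assumption for illustration: the `Session` and `Flow` record layouts are hypothetical, and softflowd itself only exports NetFlow to a collector, so the flow list stands in for whatever that collector writes out. The key point is matching on both IP and time, since a hotspot IP gets reused by different users.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical hotspot session record (who held which IP, and when)."""
    username: str
    ip: str
    start: float  # Unix timestamp
    end: float    # Unix timestamp


@dataclass
class Flow:
    """Hypothetical flow record (IP, port, duration, bytes) from a collector."""
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float     # Unix timestamp
    duration: float  # seconds
    num_bytes: int


def flows_for_user(username, sessions, flows):
    """Return flows whose source IP and start time fall inside the user's sessions."""
    user_sessions = [s for s in sessions if s.username == username]
    return [
        f for f in flows
        if any(s.ip == f.src_ip and s.start <= f.start <= s.end
               for s in user_sessions)
    ]


# Made-up data: the same IP is held by two different users at different times.
SESSIONS = [
    Session("alice", "10.1.0.15", 1000.0, 2000.0),
    Session("bob", "10.1.0.15", 3000.0, 4000.0),
]
FLOWS = [
    Flow("10.1.0.15", "93.184.216.34", 443, 1500.0, 12.0, 48000),
    Flow("10.1.0.15", "93.184.216.34", 80, 3500.0, 3.0, 9000),
    Flow("10.1.0.22", "93.184.216.34", 443, 1500.0, 5.0, 1200),
]

if __name__ == "__main__":
    for flow in flows_for_user("alice", SESSIONS, FLOWS):
        print(flow)
```

Matching on IP alone would wrongly attribute the second flow to alice; the time window is what ties it to bob's session instead.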

joseborges commented 5 years ago

In my opinion, and as a user, I'd always rather have the option to do it or not. So if it's possible, Tim, consider adding this as a setting in the back office.

Control access (Squid)? [ ] Yes [ X ] No

And you could have it default to No.

(This would be the perfect solution for less tech-savvy users.)