Synzvato / decentraleyes

This repository has a new home: https://git.synz.io/Synzvato/decentraleyes
Mozilla Public License 2.0

Generalize the concept to any website #4

Open gadcam opened 8 years ago

gadcam commented 8 years ago

I find your add-on useful, but what about doing the same for any website? Have a look at this file: https://github.com/AliasIO/Wappalyzer/blob/master/src/apps.json. Wappalyzer uses it to uncover the technologies used on a website, sometimes along with their versions. I think this could be achieved using these regular expressions.

matthieuy commented 8 years ago

I am not sure it is possible. Wappalyzer parses the page after it has loaded and checks for a lot of frameworks, technologies, and so on. Decentraleyes tries to catch requests before they are sent, so it can replace them with local resources.

If a website renames jquery-2.1.4.min.js to jq.js, how do we detect the framework and version before loading it? And if the name is the same, how do we know the file is the same?

Synzvato commented 8 years ago

That's an interesting suggestion. I think this is something that should be assessed with the help of a basic prototype that uses relevant, version-specific regular expressions from Wappalyzer.

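However, I think we would need to ask ourselves:

what the performance impact of these regular expressions is versus CDN hostname checks;
if it's bad when a visited website or a small delivery network logs a single file request;
if we are free to use the regular expression list we will be pulling out of Wappalyzer.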

Thanks, @gadcam, for bringing this up. In terms of technical feasibility, as mentioned by @matthieuy: as long as the expressions can parse version numbers out of given links, it should at least be theoretically possible to get this done. Anyone interested in this approach, feel free to weigh in.
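To illustrate the feasibility point, here is a minimal Python sketch of parsing a version number out of a request URL. The pattern and the function name `parse_version` are assumptions written for this example; they are not taken from Wappalyzer's apps.json or from the Decentraleyes code base.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration only (not copied from Wappalyzer).
# It expects a path ending in jquery-<version>.js or jquery-<version>.min.js.
JQUERY_URL = re.compile(
    r"/jquery[.-](\d+(?:\.\d+)+)(?:\.min)?\.js(?:\?.*)?$",
    re.IGNORECASE,
)


def parse_version(url: str) -> Optional[str]:
    """Return the version embedded in the URL, or None if there is none."""
    match = JQUERY_URL.search(url)
    return match.group(1) if match else None
```

A renamed file such as jq.js yields `None` here, which is exactly the limitation raised above: the URL has to carry the name and version for this to work.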

gadcam commented 8 years ago

@matthieuy You're right: we won't be able to detect jq.js, and there is no way to do that. What I propose is to match against regular expressions instead of looking up a fixed mapping, and to keep everything else as it is.

@Synzvato

what the performance impact of these regular expressions is versus CDN hostname checks;

In my opinion, we won't notice the impact on performance: in Wappalyzer we use a lot more than 1000 regular expressions and we use them with more content than the request's addresses.

if it's bad when a visited website or a small delivery network logs a single file request;

My idea is more about improving performance than privacy, for small CDNs and for websites that do not use a CDN.

if we are free to use the regular expression list we will be pulling out of Wappalyzer.

See https://github.com/AliasIO/Wappalyzer/blob/master/LICENSE.

Synzvato commented 8 years ago

@gadcam

In my opinion, we won't notice the impact on performance: in Wappalyzer we use a lot more than 1000 regular expressions and we use them with more content than the request's addresses.

I still believe that subjecting each and every request to a fairly large amount of complex regular expressions before it can be sent out is overkill. Especially since this will not serve the main purpose of the add-on: to protect people from large, centralized, Content Delivery Networks.

However, it's interesting and if someone is willing to look into this, I think we should give it a shot. As soon as there's a proper implementation, Pull Requests are welcome, let's first introduce this as an experimental feature (that's disabled by default) to see where it goes. What do you think?

See https://github.com/AliasIO/Wappalyzer/blob/master/LICENSE

From what I understand you cannot include parts of Wappalyzer's GPL(v3) licensed code in a larger project like Decentraleyes, that's licensed under MPL(2.0). I could be missing something?

gadcam commented 8 years ago

to a fairly large amount of complex regular expressions

We don't need the whole content of the apps.json file, to be more accurate I think we would need between 1 and 3 regular expressions per technology.

this will not serve the main purpose of the add-on

You are absolutely right; however, I think you should do it to promote your tool, as it will enhance privacy without a bad impact on performance on a lot of websites.

However, it's interesting and if someone is willing to look into this, I think we should give it a shot. As soon as there's a proper implementation, Pull Requests are welcome, let's first introduce this as an experimental feature (that's disabled by default) to see where it goes. What do you think?

I think you're right, I will come back to you if I manage to do a proper implementation of it.

From what I understand you cannot include parts of Wappalyzer's GPL(v3) licensed code in a larger project like Decentraleyes, that's licensed under MPL(2.0). I could be missing something?

The StackExchange posts seem quite clear. I think we would either have to ask the owner if he could make an exception or change the license of Decentraleyes. In my opinion, we should first make it work and then find a workaround for the legal part.

Synzvato commented 8 years ago

We don't need the whole content of the apps.json file, to be more accurate I think we would need between 1 and 3 regular expressions per technology.

True, that's workable. I think that once custom resource bundles are introduced along with support for other types of resources (such as styles and fonts), the expressions might start piling up. But since it's an advanced feature, we should be fine if we let users specifically enable it for individual bundles.
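As a rough illustration of per-bundle opt-in, the Python sketch below only evaluates a bundle's expressions when the user has enabled the feature for that bundle. The `Bundle` class, its field names, and `match_bundle` are hypothetical; nothing here reflects the actual Decentraleyes data model.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bundle:
    """Hypothetical resource bundle; field names are illustrative only."""
    name: str
    patterns: List[re.Pattern] = field(default_factory=list)
    pattern_matching_enabled: bool = False  # experimental, off by default


def match_bundle(url: str, bundles: List[Bundle]) -> Optional[str]:
    """Return the name of the first enabled bundle whose patterns match."""
    for bundle in bundles:
        if not bundle.pattern_matching_enabled:
            continue  # disabled bundles never run their expressions
        if any(pattern.search(url) for pattern in bundle.patterns):
            return bundle.name
    return None
```

With the flag left at its default, no regular expression is evaluated at all, so the cost only applies to bundles a user has explicitly switched on.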

You are absolutely right; however, I think you should do it to promote your tool, as it will enhance privacy without a bad impact on performance on a lot of websites.

I fully agree with you there. It's also great that this can be implemented with practically no downsides for people who have no need for it (as it uses hardly any disk space, and is truly idle when disabled).

I think you're right, I will come back to you if I manage to do a proper implementation of it.

Awesome! As a clear and concise name for the feature, what do you think of Border Patrol? I think it illustrates the concept quite well. It's slightly more resource intensive, but stops additional requests from leaving your machine. What do you think, would that work?

The StackExchange posts seem quite clear. I think we would either have to ask the owner if he could make an exception or change the license of Decentraleyes. In my opinion, we should first make it work and then find a workaround for the legal part.

Absolutely. Should all else fail, we could write comparable regular expressions.

stewie commented 8 years ago

If a website renames jquery-2.1.4.min.js to jq.js, how do we detect the framework and version before loading it? And if the name is the same, how do we know the file is the same?

Attempting to "detect framework and" seems beyond the scope of this extension. Yes, I would hope to match, by filename, any prospective request for "jquery-2.1.4.min.js" (regardless of whether the URL reflects CDN hosting).

If a webpage author is so obtuse (or so devious) as to apply that filename to his custom script, my outlook is "so sad, too bad. Gonna intercept and use the permacached copy of jquery-2.1.4.min.js"
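The filename-only matching described here could look roughly like the Python sketch below, which ignores the host entirely and looks up the last path segment in an index of bundled files. The index contents, the local path, and the name `local_replacement` are made up for illustration.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical index of bundled files; the local path is invented.
LOCAL_RESOURCES = {
    "jquery-2.1.4.min.js": "resources/jquery/2.1.4/jquery.min.js",
}


def local_replacement(url: str) -> Optional[str]:
    """Return the bundled path for the requested filename, if there is one."""
    # urlsplit drops the query string, so only the path's last segment is used.
    filename = posixpath.basename(urlsplit(url).path)
    return LOCAL_RESOURCES.get(filename)
```

Because the host is never inspected, a custom script that happens to be named jquery-2.1.4.min.js would be replaced as well, which is the trade-off accepted in this comment.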