claustromaniac / detect-cloudflare-plus

True Sight Firefox extension.
https://addons.mozilla.org/firefox/addon/detect-cloudflare-plus/
GNU General Public License v3.0

requests for particular CDNs #5

Open claustromaniac opened 5 years ago

claustromaniac commented 5 years ago

Use this issue for requests regarding individual CDNs.


| Platform to investigate | Added in | Notes |
|---|---|---|
| 75CDN | | |
| Advanced Hosting | | |
| Akamai | 98ec768 | |
| Alibaba Cloud | 47c01bd | |
| Amazon Cloudfront | 98ec768 | |
| Amazon Shield | 9e347b6 | |
| Azion | :x: | Needs custom pragma in the request to get debug headers in the response |
| Azure | :x: | Can't be detected reliably via headers. |
| Baidu | 6cc3ae8 | |
| BelugaCDN | 6cc3ae8 | |
| BootCDN | | |
| BootstrapCDN | 6cc3ae8 | |
| BunnyCDN | 6cc3ae8 | |
| CacheFly CDN | | Offers custom CDNs and multi-CDN setups, seizing other popular CDNs (like Cloudflare) |
| CDN.net | | |
| CDN77 | e3112f7 | |
| CDNetworks | a149a10 | Uses Zenedge internally |
| cdnlion | :x: | |
| ChinaCache | 9e347b6 | |
| Cloudflare | 1fa92cc | |
| Cloudflare AMP | | Already detected by CF filters |
| Cloudflare IPFS gateway | | Same as above |
| Edgecast | e3112f7 | |
| Fastly | 33f24e4 | |
| fly.io | 9e347b6 | |
| Flywheel | 9e347b6 | |
| G-CDN | a149a10 | |
| GitHub | 6cc3ae8 | |
| GoCache | e3112f7 | |
| Google AMP | | |
| Google Cloud | 33f24e4 | |
| Google Project Shield | 6e8c38d | |
| Huawei Cloud | :x: | |
| Highwinds | a149a10 | |
| IBM Cloud | :x: | CDN powered by Akamai. |
| ICSS | :x: | |
| Incapsula | 6e8c38d | |
| Instart Logic | 6fe4d17 | |
| IPFS | 6fe4d17 | Not a CDN, but a gateway. |
| jsDelivr | :x: | Uses StackPath, Cloudflare, Fastly, and Quantil. |
| KeyCDN | 6e8c38d | |
| Kinsta | 98ec768 | Hosting powered by Google Cloud, CDN powered by KeyCDN. |
| Leaseweb | 6fe4d17 | |
| Limelight | | |
| Link11 | :x: | |
| MaxCDN / StackPath | | |
| MyraCloud | 98ec768 | |
| NetDNA | 9e347b6 | |
| Netlify | 6fe4d17 | |
| NetScout | :x: | |
| Netskope | | |
| OVH | | |
| QiHU | 6cc3ae8 | |
| Qiniu | | |
| Quantil | e3112f7 | |
| section.io | a149a10 | |
| SingularCDN | 6fe4d17 | |
| Sucuri | 6e8c38d | |
| staticfile | | Open source CDN for open source libraries. Can detect by URL. |
| Tor2web | ad6de0c | Not a CDN, but a gateway. |
| TransparentCDN | 1ad2d8d | |
| Variti | 6cc3ae8 | |
| Zenedge | 6fe4d17 | |


Notes

ghost commented 5 years ago

Block Cloudflare MiTM Attack used to provide options to detect/inform about:

Thus far I haven't come across any info that AWS or Azure deploy SSL MitM techniques.

claustromaniac commented 5 years ago

I'm starting to think that I should make the extension detect as many CDNs as it can, for the following simple reasons:

  1. No matter what their individual practices are or what they want us to believe, reverse proxies pose risks to the users, period. It's a model based on trust, therefore it is flawed by design.
  2. Political neutrality. The extension should focus on providing information, and leave all decision-making to the users. Having a solid stance against Cloudflare but not against the other CDNs could in some cases mean giving users a false sense of security, which would be counter-productive.

What I'm asking myself now is... how should I go about this? If I were to make this the official philosophy, should I rename this to, say, Detect CDN, and make sure it treats all CDNs the same way?

Can I be arsed to do that? Oh so many questions...


EDIT: Welp. It's done. Detect Cloudflare+ is a thing of the past.

ghost commented 5 years ago

https://www.link11.com/en/cdn/ https://www.link11.com/en/ddos-protection/ -> requires private key for the domain's TLS certificate


https://myracloud.com/en/cdn-content-delivery-network/ https://myracloud.com/en/ddos-protection/ -> requires private key for the domain's TLS certificate

:cat2: : Added in v1.1.0


https://www.telekom-icss.com/business-areas/internet-content/cdn-solution https://www.telekom-icss.com/business-areas/internet-content/ddos-defense/backbone-security -> requires private key for the domain's TLS certificate


https://www.netscout.com/arbor-ddos -> requires private key for the domain's TLS certificate

claustromaniac commented 5 years ago

Well now, those will be a nice challenge. I don't see any immediately apparent way to detect them, except for the MyraCloud one. Also, I find it curious that NetScout uses Cloudflare to serve its own website.

It will likely take some time before I can properly research them.

ajvsol commented 5 years ago

Google AMP, CloudFlare AMP and all other AMP caches.

Also consider gateways like Cloudflare's IPFS gateway and Tor2Web's Tor gateway.

:cat2: : Tor2web added in v1.2.0. IPFS added in v1.3.0

claustromaniac commented 5 years ago

This extension already detects requests to the CF IPFS gateway because those always go through their CDN (example). The same goes for CF AMP. I could provide some option to list those requests separately, but I wonder if it's worth it. I'll keep it in mind, though.

Tor2Web can be detected easily and reliably :+1:

As for Google AMP... that one is a bitch. Those folks change their APIs every other week (figuratively speaking). I'll look into it though.

EDIT: Even if I decide not to give the CF IPFS gateway any special treatment, it may be worthwhile to detect connections to the IPFS outside of the CF realm. I think I can do that.

ghost commented 5 years ago

Not sure whether to classify as CDN

https://www.netskope.com/platform

Cloud and web traffic is steered to Netskope for inspection using our patented all-mode traffic steering technology. TLS-encrypted cloud traffic is safely decrypted using the Netskope cloud-scale architecture as part of the traffic steering process.


ghost commented 5 years ago

  1. https://www.alibabacloud.com/product/cdn
  2. https://intl.huaweicloud.com/product/cdn.html
  3. https://www.quantil.com
  4. https://www.verizondigitalmedia.com/platform/edgecast-cdn
  5. https://www.ovh.ie/cdn
  6. https://www.jsdelivr.com
  7. https://www.gocache.com.br/en
  8. https://gcorelabs.com/cdn
  9. https://www.cdn77.com
  10. https://cdn.net
  11. https://www.cachefly.com
  12. https://bunnycdn.com
  13. https://www.belugacdn.com
  14. https://www.azion.com.br
  15. https://special.advancedhosting.com/en
  16. https://cloud.baidu.com/product/cdn.html
  17. https://cdn.baomitu.com
  18. https://www.staticfile.org/?ln=en
  19. https://www.bootcdn.cn

ghost commented 5 years ago

16. https://cloud.baidu.com/product/cdn.html

potential identifiers in the header:

:cat2: : I've seen the headers with ohc prefix in responses by China Telecom that didn't seem necessarily related to Baidu. They may not be reliable.

ghost commented 5 years ago

Azure may feature in the header

Indicates whether the request was proxied through an additional CDN server. For example, a POP server-to-origin shield server or a POP server-to-ADN gateway server. This header is added to the request only when midgress traffic takes place. In this case, the header is set to 1 to indicate that the request was proxied through an additional CDN server.

The debug cache request header provides additional information about the cache policy that is applied to the requested asset. These headers are specific to Azure CDN Premium from Verizon products.

The format through which the Via request header identifies a POP server is specified by the following syntax:

Via:

The terms used in the syntax are defined as follows:

Protocol: Indicates the version of the protocol (for example, HTTP/1.1) used to proxy the request.

Platform: Indicates the platform on which the content was requested. The following codes are valid for this field:

| Code | Platform |
|---|---|
| ECAcc | HTTP Large |
| ECS | HTTP Small |
| ECD | Application delivery network (ADN) |

POP: Indicates the POP that handled the request.

ID: For internal use only.

Example Via request header

Via: HTTP/1.1 ECD (dca/1A2B)
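
To make the quoted format concrete, here is a minimal Python sketch that splits such a value into its parts. The layout it expects (protocol, platform code, then `POP/ID` in parentheses) is inferred only from the single example header above, so treat it as an assumption; `parse_via` and `ViaInfo` are hypothetical names, not part of any Azure or Verizon API.

```python
import re
from typing import NamedTuple, Optional

# Platform codes exactly as listed in the quoted documentation.
PLATFORMS = {
    "ECAcc": "HTTP Large",
    "ECS": "HTTP Small",
    "ECD": "Application delivery network (ADN)",
}


class ViaInfo(NamedTuple):
    protocol: str
    platform: str
    pop: str
    node_id: str


# Assumed layout, inferred from the example value only:
#   <protocol> <platform code> (<pop>/<id>)
_VIA_RE = re.compile(
    r"^(?P<protocol>\S+)\s+(?P<code>\S+)\s+\((?P<pop>[^/()]+)/(?P<id>[^()]+)\)$"
)


def parse_via(value: str) -> Optional[ViaInfo]:
    """Return the parsed fields, or None if the value does not fit the assumed layout."""
    match = _VIA_RE.match(value.strip())
    if match is None:
        return None
    code = match.group("code")
    if code not in PLATFORMS:
        # Unknown platform code: not one of the three documented values.
        return None
    return ViaInfo(
        match.group("protocol"),
        PLATFORMS[code],
        match.group("pop"),
        match.group("id"),
    )
```

Calling `parse_via("HTTP/1.1 ECD (dca/1A2B)")` yields the protocol `HTTP/1.1`, the ADN platform, the POP `dca`, and the ID `1A2B`; a value from an unrelated proxy returns `None`.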

claustromaniac commented 5 years ago

Thanks. That's very helpful.

EDIT: actually, it's not.

Those are request headers, used by the CDN to talk to the origin server. I can't use those to detect Azure.

X-MS-* are the only exceptions among the ones you mentioned. Those are indeed response headers, but they are not reliable in the least. I have found several other Azure-specific headers on my own, too, but I'm afraid none of them are reliable. Most of the time, when I visit a site served by this CDN, I don't see any of them present.
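
As a rough illustration of why a prefix check like this is only a weak signal, here is a minimal Python sketch. It is not how the extension does it; `headers_with_prefix` is a hypothetical helper, and the header names in the usage note are purely illustrative.

```python
from typing import Iterable, List


def headers_with_prefix(header_names: Iterable[str], prefix: str = "x-ms-") -> List[str]:
    """Return the response header names that start with the prefix (case-insensitive).

    An empty result means "no evidence", not "not served by that platform":
    the headers are often simply absent.
    """
    wanted = prefix.lower()
    return [name for name in header_names if name.lower().startswith(wanted)]
```

For example, `headers_with_prefix(["Content-Type", "X-MS-Request-Id"])` returns `["X-MS-Request-Id"]`, while a response without any such header returns an empty list even if the site sits behind the same CDN.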

ghost commented 5 years ago

IBM Cloud CDN (powered by Akamai) https://www.ibm.com/cloud/cdn

it might be revealed with header x-ibm-trace

ghost commented 5 years ago

@claustromaniac looking at https://www.cdnplanet.com and https://www.cdnoverview.com and https://www.webpagetest.org/, they reveal a myriad of CDNs.

Researching each and every one would be a gargantuan task and rather unlikely to be achievable.

So I am wondering whether it would make sense to team up with them (perhaps via an API), e.g. https://github.com/turbobytes/cdnfinder or https://github.com/WPO-Foundation/webpagetest

claustromaniac commented 5 years ago

TL;DR: The main goal of this extension is to raise awareness. If you want more precise information you can always use those tools on your own. Be aware that they are not failsafe either, though.


I was aware of those, and what you suggest is reasonable. To be honest, I already asked myself if I should take that route at some point. However, my decision has always been to stick to my current path for a very simple reason: I don't want to rely on third parties. That's the whole point of this extension.

I can't afford to make queries to third parties on each single request. That would be expensive. Furthermore, the scope of this extension is different than the scope of those web tools. If anything, I'd say those guys are in favor of CDNs. They even use CDNs themselves, which means that I would be making queries to CDNs just to detect other CDNs. See the problem? I wouldn't give up my privacy just to detect CDNs. That doesn't make sense to me. My privacy is the very concern that led me to create this extension in the first place.

That being said, I admit that having hundreds of CDNs to research is not the only trade-off of my current path. Not relying on third parties also means that I have fewer resources to work with. The extension has very limited information to analyze, which means there are things it cannot do, and it misses and will always miss a number of CDNs. Moreover, I don't even want to start using an internal database of IP ranges, because I can barely maintain the extension as it is.

That's the very reason I added heuristics, and I intend to keep improving that feature as much as possible. It won't tell you who is the middle man, but it is a pretty reliable indicator otherwise.

ghost commented 5 years ago

Valid points for sure. What I meant by the API was not pulling from a 3rd party in real time but, say, periodically pulling their available header information and incorporating it into this WX, to lighten the research load.

I would not mind lending a hand in expanding the header data used for detection, so I am wondering whether the heuristics could be surfaced in the browser's developer panel: highlight in the network tab the row with the heuristic detection and, when that row is clicked, highlight in the header tab the header(s) triggering the heuristics.

Or perhaps a dedicated True View tab in the developer tools, sort of like this one in GC

https://chrome.google.com/webstore/detail/is-it-cached/naikbjeckbmjhngcejdmcjhoedhckglk

I could then take a look and report particular headers for specific CDNs and thus help in expanding the list of CDNs.

claustromaniac commented 5 years ago

I've come up with similar ideas myself but, for now, that wouldn't be practical. Even if you provided me with header data gathered by yourself, I'd still want to analyze it, and my stockpile of data to analyze is large enough as it is. Besides, it may not seem to you like my methods are efficient, but the truth is I've had scarce time to work on this, and so far I have invested a lot of that time coding (not researching). You'll just have to be patient.

I will reconsider the idea sometime in the future, after the extension has matured enough.

I appreciate your continued interest and willingness to help, though. Thanks for that.

Thorin-Oakenpants commented 5 years ago

I am a moron, so please bear with me: why is "github" treated as a CDN, especially given I am on "github"? This confuses me, and I seriously need help (in more ways than you could know)! TIA hubba-hubba

claustromaniac commented 5 years ago

You're not a moron :( You're :jeans:!

The extension treats GitHub as a CDN because it can be used as a CDN. It's mostly because of GitHub Pages. IIRC, there are at least 100,000 domains hosted on GitHub.

Some examples for you.

(I can go on and on if I want)...

As for why it shows while you're on GitHub... that's because I didn't bother implementing exceptions, since I don't consider that necessary.

> I seriously need help (in more ways than you could know)!

I hope it's not too bad :crying_cat_face:

claustromaniac commented 5 years ago

Let's say it's sort of a special case, but I considered it one worth adding because ... https://whotracks.me/trackers.html (look where github stands in that ranking). It's not like GitHub offers reverse proxy services or the like, if that's what you were wondering. (It does use reverse proxies tho).

Plus... Microsoft.

Thorin-Oakenpants commented 5 years ago

OK, thanks. I totally get the github pages, but to me that is not a CDN (well not if they use the github.io, but I see they can be anything, so yeah). Soz for the spam and stoopid Q's ... I'm a special kind of NEEDS JESUS, but you handled me very well. Tah

claustromaniac commented 5 years ago

Nah, don't say that. Your question was actually good.

As I said before, it's a special case. GitHub is not technically a CDN and AFAIK they don't officially offer such services. There have been ways to take advantage of their infrastructure for similar goals, but that's all. You made me start to think that I should probably move it out of that fieldset in the options page, because lumping it with the other (actual) CDNs doesn't do it justice.

claustromaniac commented 5 years ago

Similarly, I could make the extension detect corporations that threaten our privacy and/or security (even if they don't qualify as CDNs), and list them separately in the options...

ghost commented 5 years ago

I would prefer for Github to remain listed as a CDN as long as it acts as one (for 3rd party domains) and thus falls into the category of a CDN (notwithstanding being owned by MS).

To make that distinction (detect corporations that threaten our privacy and/or security) would be helpful for the user but how much extra work/effort to put on your plate?

Loading fonts, libraries, css (and media files) from a CDN is probably less threatening than login credentials or otherwise sensitive personal data being decrypted at the edge server.

That goes back to what was discussed earlier, where domain protection services (e.g. firewall or geo location obfuscation with SNI certificates) get mixed with CDN services, thus blurring the lines.

Thorin-Oakenpants commented 5 years ago

If GitHub is a special case... (if you want this in FORTRAN or COBOL, I can do it for you)

IF TLD = github.*
THEN do not list as CDN
ELSE list as CDN

But then if you start to expand to "blurred lines" material, seems like a lot of work. I kinda like the idea of a separate category
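
For what it's worth, the exception rule above could look roughly like this as a Python sketch. Which hostnames count as "github.*" is an assumption here (`GITHUB_DOMAINS` is a hypothetical list), and the function names are made up for illustration; this is not the extension's code.

```python
from urllib.parse import urlsplit

# Hypothetical list of domains treated as "being on GitHub" for this sketch.
GITHUB_DOMAINS = ("github.com", "github.io")


def is_github_page(page_url: str) -> bool:
    """True if the page's hostname is one of GITHUB_DOMAINS or a subdomain of one."""
    host = (urlsplit(page_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in GITHUB_DOMAINS)


def should_list_github(page_url: str) -> bool:
    """IF the page is on github.* THEN do not list as CDN ELSE list as CDN."""
    return not is_github_page(page_url)
```

With this rule, a tab open on `https://github.com/...` would not list GitHub, while a third-party site loading content from GitHub still would. Matching on the full registered domain rather than a plain substring keeps look-alike hosts such as `notgithub.com` from being treated as GitHub.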

claustromaniac commented 5 years ago

Technically, GitHub is a platform for hosting Git repos, and GitHub Pages are meant to be just static pages hosted on GitHub's servers. That's what I mean when I say GitHub is not a CDN. The thing is, anyone can quite easily host static content on GitHub and then load it up from somewhere else. That would be the simplest way I can think of to use GitHub as a CDN.

When it comes to GitHub Pages, that service is different than your typical CDN in that GitHub is merely hosting the sites. We can be (pretty darn) sure GitHub is the one at the other end of the communication (the very end, after all intermediary proxies, etc), which makes this somewhat less potentially risky than caching proxies offered by CDNs.

Still, there are many legit reasons for wanting to detect content served by GitHub, that's why I added it. I'm just wondering if I should move it out of that group of options to avoid giving people the false impression that GitHub offers CDN services.

> To make that distinction (detect corporations that threaten our privacy and/or security) would be helpful for the user but how much extra work/effort to put on your plate?

I meant that I could once more broaden the scope of this extension a bit, and have it not only focus on detecting CDNs but also hosting sites (like GitHub) and such. If I did that, I should try to separate the items somehow so it becomes easier for users to understand what the extension is detecting. For starters, I should create a separate category in the options page for sites like GitHub, but I could also add some more information about each service somewhere... I'll have to think this through before I decide what to do.

> Loading fonts, libraries, css (and media files) from a CDN is probably less threatening than login credentials or otherwise sensitive personal data being decrypted at the edge server.

Sure, it should be less risky in general, but that's when you think mostly about security. From a privacy standpoint, third-party content served by CDNs is just as potentially dangerous, if not more.

@:jeans:

If I list GitHub in the options as a hosting service or so, seeing it detected here shouldn't be confusing anymore (right?). Would you still prefer to not see it in the popup here? Just to be clear, I can do it like you said, I just don't personally care for it. Besides, not adding such exceptions could be useful: if you're on GitHub and the extension doesn't detect it, it could suggest something weird is going on (like phishing or something else).

Whatcha think?

Thorin-Oakenpants commented 5 years ago

Whatever you do, just be consistent. If GitHub is put in a new category, then treat it the same as others you would put in there re: badge counter etc

> Would you still prefer to not see it in the popup here

It's not worth the extra work, especially as the number of items in the new category grows. The distinction has already been made by being in a new category! cogito ergo drinkies :beer: night night

PS: would we get a shiny new color (green, glorious green https://en.wikipedia.org/wiki/Money_(Blackadder))

ghost commented 5 years ago

> From a privacy standpoint, third-party content served by CDNs is just as potentially dangerous, if not more.

How is the risk to user privacy, as in user tracking/profiling, elevated compared to a domain utilizing user profiling/tracking without a CDN involved? Either party can utilise only the same set of technology available for user tracking/profiling today and privacy conscious users would deploy countermeasures anyway. Such countermeasures protect all the same, whether a CDN is in play or not.


> When it comes to GitHub Pages, that service is different than your typical CDN in that GitHub is merely hosting the sites. We can be (pretty darn) sure GitHub is the one at the other end of the communication (the very end, after all intermediary proxies, etc), which makes this somewhat less potentially risky than caching proxies offered by CDNs.

That seems inconsistent, then: GitHub is serving content to 3rd-party domains and is owned by MS, and thus could potentially pose a risk to user privacy (in your own words).

If Github gets excluded from detection, that incl. heuristics, the WX would start losing its credibility.


> The thing is, anyone can quite easily host static content on GitHub and then load it up from somewhere else. That would be the simplest way I can think of to use GitHub as a CDN.

In which case it transforms into a content delivery network for domains unaffiliated with Github.


In Wikipedia CDN is an umbrella term spanning different types of content delivery services: video streaming, software downloads, web and mobile content acceleration, licensed/managed CDN, transparent caching, and services to measure CDN performance, load balancing, multi-CDN switching and analytics and cloud intelligence. CDN vendors may cross over into other industries like security, with DDoS protection and web application firewalls (WAF), and WAN optimization.

ghost commented 5 years ago

loading libraries from CDN may actually pose a security risk unless protected by SRI
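
For reference, an SRI value is just a hash algorithm name, a dash, and the base64-encoded digest of the file. A minimal Python sketch of computing and checking one follows; `sri_hash` and `matches_integrity` are hypothetical helper names, and the sketch assumes a single hash per `integrity` value (the attribute may carry several).

```python
import base64
import hashlib

# Hash algorithms permitted by the Subresource Integrity specification.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(content: bytes, integrity: str) -> bool:
    """Check content against a single integrity value (simplification: one hash only)."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in SRI_ALGORITHMS:
        return False
    return sri_hash(content, algorithm) == integrity
```

The string returned by `sri_hash` is what goes into the `integrity` attribute of a `script` or `link` tag; if the CDN serves a file whose bytes differ, the check fails and the browser refuses to use it.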

claustromaniac commented 5 years ago

> How is the risk to user privacy, as in user tracking/profiling, elevated compared to a domain utilizing user profiling/tracking without a CDN involved?

My bad, I misread what you said before. For some weird reason I thought you were comparing third-party vs first-party, while you were actually only comparing the content type. I haven't slept well lately, sorry. :sweat_smile:

> If Github gets excluded from detection, that incl. heuristics, the WX would start losing its credibility.

I never said anything about excluding GitHub. I only said I considered it more appropriate to move GitHub into a separate category in the options page/menu, because GitHub does not (AFAIK) offer CDN services to web developers. If you visit a site and GitHub is detected, you know GitHub is at the other end of the communication. That's the key difference. Services like Cloudflare are used by good and bad people alike, but GitHub is always GitHub.

> In Wikipedia CDN is an umbrella term spanning different types of content delivery services

It is indeed a pretty broad term, but GitHub is clearly different than the thirty-something CDNs already detected by TS, and it doesn't even sell itself as a CDN. It's only a hosting service, at least for now. That's why I consider it appropriate to make that distinction somewhere, or at the very least it should be clearly stated somewhere that some hosting services are being thrown in that same bag.

ghost commented 5 years ago

> I haven't slept well lately

:worried:

https://www.centurylink.com/business/networking/cdn.html

claustromaniac commented 5 years ago

> PS: would we get a shiny new color (green, glorious green https://en.wikipedia.org/wiki/Money_(Blackadder))

Maybe in the next release.