apache / netbeans

Apache NetBeans
https://netbeans.apache.org/
Apache License 2.0

Building with proxies causes DoS attack on OSU #4890

Open jgneff opened 2 years ago

jgneff commented 2 years ago

Apache NetBeans version

Apache NetBeans 16 release candidate

What happened

Building NetBeans in an environment that defines both proxy variables causes a brief denial-of-service (DoS) attack on one of the Web servers hosted by the Oregon State University (OSU) Open Source Lab. Below is a typical example of the variables being defined and exported to the environment:

export http_proxy=http://10.10.10.1:8222/
export https_proxy=http://10.10.10.1:8222/

The same build also causes an attack on the Sonatype Maven Central Repository, although the files in Maven Central are hosted by the Fastly Content Delivery Network (CDN), allowing it to continue serving its content regardless.

The attack is a timeout-exploiting connection flood, which works by establishing pending connections with the target server. It is similar to the Slowloris attack, but less effective because it doesn't avoid timeouts through the use of partial requests. Instead, the NetBeans build opens hundreds of connections to the target Web servers and never sends any request headers at all. The connections are closed only when they time out on the server side or when the process that created them terminates on the build side.

Specifically, the build opens 469 unused connections to repo1.maven.org (an alias for sonatype.map.fastly.net), sends no request headers, and leaves them open. The connections eventually time out on the server side, but that can take up to 20 seconds. This attack is unsuccessful in my experience, likely due to the Fastly CDN.

The build opens only 95 unused connections to netbeans.osuosl.org (an alias for ftp.osuosl.org). Because the Open Source Lab hosts its files directly, though, the attack is usually successful in exhausting all request handlers. The Web server then returns an HTTP response status code of 503, which terminates the build. Even when the Web server is able to handle the load and the build is successful, it can take up to 20 seconds for the superfluous connections to time out on the server side.

An unaware developer can assume that the build failure is a transient error and repeat the build until it's successful, as I did before uncovering the source of the problem. That has the unfortunate effect of turning a brief, one-time attack into a dozen or more repeated attacks throughout the day.

Furthermore, the build attempts to make 564 direct connections to the remote Web servers even when a proxy server is defined and working, which results in a waste of resources on the build machine itself. The build also creates more than 1,692 unnecessary operating system threads, even when no proxy servers are defined, and leaves them waiting in the system until the build completes.
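The counts quoted above fit together arithmetically. The short Python check below is only a sketch that uses the figures from this report: the unused connections to the two servers add up to the number of direct connection attempts, and the reported thread count is three times that number. The three-to-one ratio is an observation about these figures, not something measured separately here.

```python
# Figures taken from this report.
unused_maven = 469        # unused connections to repo1.maven.org
unused_osuosl = 95        # unused connections to netbeans.osuosl.org
direct_attempts = 564     # direct connection attempts (also the number of files)
threads_created = 1692    # lower bound on the threads reported above

total_unused = unused_maven + unused_osuosl
threads_per_attempt = threads_created / direct_attempts

print(total_unused)         # 564
print(threads_per_attempt)  # 3.0
```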

How to reproduce

There are three ways to reproduce the problem:

  1. run a remote build on Launchpad,
  2. run my netbeans-proxies stand-alone program, or
  3. run the download-all-extbins target of the NetBeans build.

The second and third methods require setting up a local firewall and proxy server.

Launchpad

One way to reproduce the problem is to run a remote build on Launchpad, which has a strict firewall and permits outbound connections only through its proxy server. Launchpad runs the build in an LXD container as follows:

$ lxc exec lp-bionic-amd64 \
    --env LANG=C.UTF-8 \
    --env SHELL=/bin/sh \
    --env http_proxy=http://10.10.10.1:8222/ \
    --env https_proxy=http://10.10.10.1:8222/ \
    --env GIT_PROXY_COMMAND=/usr/local/bin/lpbuildd-git-proxy \
    ...

That's how I discovered the problem, but this method provides no diagnostic information other than the 503 response code when the build fails. To find out what's really going on, you need to reproduce it locally.

netbeans-proxies

I wrote a simple program, called netbeans-proxies, that safely illustrates the problem without creating a burden on the target server. The program downloads just 14 kilobytes in five files, whereas a clean build of NetBeans downloads at least 754 megabytes in 564 files.

The program makes it easy to run, test, and debug the NetBeans build task and even step through its code one statement at a time. See the GitHub repository jgneff/netbeans-proxies for details on setting up its environment and running the tests.

download-all-extbins

To reproduce the problem using the actual NetBeans build, run the build on the same system that you set up for the netbeans-proxies program above. Disable the firewall long enough to clone the NetBeans repository:

$ sudo ufw disable
$ git clone https://github.com/apache/netbeans.git
$ sudo ufw enable

Then double check that the firewall is active, save a backup of the original repository, set the proxy environment variables, and run just the downloading task as follows:

$ sudo ufw status
$ rsync -av netbeans/ netbeans-original/
$ . ~/bin/proxy.env
$ cd netbeans
$ git switch release160
$ ant -quiet -Dmetabuild.branch=release160 download-all-extbins

Before running the build a second time, you'll need to remove the cached files and start with a fresh copy of the repository, thereby removing the files that were downloaded into its subdirectories. For example:

$ rm -r ~/.hgexternalcache
$ rsync -av --delete ../netbeans-original/ ./
$ git switch release160
$ ant -quiet -Dmetabuild.branch=release160 download-all-extbins

If the build fails, you'll see an error message like the following:

java.io.IOException: Skipping download from https://netbeans.osuosl.org/binaries/
    89BC047153217F5254506F4C622A771A78883CBC-ValidationAPI-
    b26b94cc001a41ab9138496b11e2ae256a159ffd.jar due to response code 503

BUILD FAILED

Did this work correctly in an earlier version?

No / Don't know

Operating System

Ubuntu 20.04.5 LTS (Focal Fossa)

JDK

OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu120.04)

Apache NetBeans packaging

Own source build

Anything else

Although the DoS attacks occur with every build when both proxy variables are defined, the build itself fails for me in only about three out of every four runs.

Web servers

The NetBeans build downloads its external binaries from two Web servers: netbeans.osuosl.org and repo1.maven.org.

Whether the build works or fails may depend partly on which address you receive for the Open Source Lab Web server:

$ host netbeans.osuosl.org
netbeans.osuosl.org is an alias for ftp.osuosl.org.
ftp.osuosl.org has address 140.211.166.134
ftp.osuosl.org has address 64.50.233.100
ftp.osuosl.org has address 64.50.236.52
ftp.osuosl.org has IPv6 address 2605:bc80:3010::134
ftp.osuosl.org has IPv6 address 2600:3402:200:227::2
ftp.osuosl.org has IPv6 address 2600:3404:200:237::2

One set of addresses is owned by the University of Oregon in Eugene, Oregon, while the other two sets of addresses are owned by TDS TELECOM in Madison, Wisconsin.

The Maven Central Repository is hosted behind the Fastly CDN, which seems capable of handling the connection flood:

$ host repo1.maven.org
repo1.maven.org is an alias for sonatype.map.fastly.net.
sonatype.map.fastly.net has address 199.232.192.209
sonatype.map.fastly.net has address 199.232.196.209

Workaround

There is a partial workaround for the problem: simply unset one of the proxy environment variables, like so:

$ unset https_proxy

Even with this workaround, though, the build still tries to make hundreds of direct connections to the remote Web servers, but those are presumably blocked by the firewall.
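When the build is launched from a script rather than an interactive shell, the same workaround can be applied to the child process environment only. The following Python sketch assumes the lowercase variable names shown in this report; the helper name env_without_https_proxy is hypothetical, and the commented-out ant invocation is just one way it might be used.

```python
def env_without_https_proxy(env):
    """Return a copy of env with https_proxy removed, leaving http_proxy in place."""
    cleaned = dict(env)
    cleaned.pop("https_proxy", None)  # no error if the variable is not set
    return cleaned

# Example with the proxy values shown earlier in this report.
example = {
    "http_proxy": "http://10.10.10.1:8222/",
    "https_proxy": "http://10.10.10.1:8222/",
}
print(env_without_https_proxy(example))

# To apply it to a real build, pass the cleaned environment to the child process:
#   import os, subprocess
#   subprocess.run(["ant", "-quiet", "download-all-extbins"],
#                  env=env_without_https_proxy(os.environ), check=True)
```

The shell that starts the script keeps both variables, so other tools that need the HTTPS proxy are unaffected.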

Access logs

Below are the Squid access log files that I recorded from three full builds of NetBeans:

  1. access-bypass.log - bypassed the bug with the workaround
  2. access-failed.log - failed due to response code 503
  3. access-worked.log - worked despite the connection flood

The complete log files are included below. The hundreds of superfluous connections can be identified by those to netbeans.osuosl.org that transferred only 176 bytes and those to repo1.maven.org that transferred only 180 bytes. There are other unused connections in the log files, but those are the easiest to identify.
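The byte counts mentioned above make the unused connections easy to count with a script. The Python sketch below assumes the default Squid access log layout seen in these files, with the byte count in the fifth field, the method in the sixth, and the target in the seventh. The function name count_unused is hypothetical, and two of the three sample lines are invented entries shaped like the unused connections, not lines copied from the attached logs.

```python
# Byte counts that mark an unused tunnel, as stated in this report.
UNUSED_BYTES = {
    "netbeans.osuosl.org:443": 176,
    "repo1.maven.org:443": 180,
}

def count_unused(log_lines):
    """Count CONNECT tunnels whose byte count matches an unused connection."""
    counts = {host: 0 for host in UNUSED_BYTES}
    for line in log_lines:
        fields = line.split()
        # Fields: time elapsed client code/status bytes method target ...
        if len(fields) < 7 or fields[5] != "CONNECT":
            continue
        size, target = int(fields[4]), fields[6]
        if UNUSED_BYTES.get(target) == size:
            counts[target] += 1
    return counts

# The first line is taken from access-bypass.log in this report;
# the other two are made-up entries for illustration only.
sample = [
    "1666216767.654   6024 10.203.206.244 TCP_TUNNEL/200 570407 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -",
    "1666216800.000  20012 10.203.206.244 TCP_TUNNEL/200 176 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -",
    "1666216800.000  20015 10.203.206.244 TCP_TUNNEL/200 180 CONNECT repo1.maven.org:443 - HIER_DIRECT/199.232.192.209 -",
]
print(count_unused(sample))
```

To run it against a real log, pass the open file object instead of the sample list, for example count_unused(open("access-failed.log")).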

The exchange on the unused connections starts with an outgoing request to the proxy server:

CONNECT netbeans.osuosl.org:443 HTTP/1.1
User-Agent: Java/11.0.16
Host: netbeans.osuosl.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Proxy-Connection: keep-alive

followed by the response from the proxy server:

HTTP/1.1 200 Connection established

followed by what appears to be an abbreviated TLS handshake, after which the connection is idle until it's closed by the remote Web server due to a timeout.

The first log file can be included inline, but the other two are too big for an issue comment and must be included as attachments.

access-bypass.log

23 connections, all of them good:

1666216766.882   1579 10.203.206.244 TCP_TUNNEL/200 36662 CONNECT gitbox.apache.org:443 - HIER_DIRECT/65.108.73.173 -
1666216767.468   2604 10.203.206.244 TCP_TUNNEL/200 5775 CONNECT gitbox.apache.org:443 - HIER_DIRECT/65.108.73.173 -
1666216767.654   6024 10.203.206.244 TCP_TUNNEL/200 570407 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -
1666216801.004   6986 10.203.206.244 TCP_TUNNEL/200 227530 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -
1666216812.906  60610 10.203.206.244 TCP_TUNNEL/200 1601172 CONNECT repo1.maven.org:443 - HIER_DIRECT/199.232.192.209 -
1666216823.385     93 10.203.206.244 TCP_TUNNEL/200 3038 CONNECT services.gradle.org:443 - HIER_DIRECT/104.18.191.9 -
1666216862.403   5357 10.203.206.244 TCP_TUNNEL/200 107906 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.236.52 -
1666216917.039  21523 10.203.206.244 TCP_TUNNEL/200 15861152 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.236.52 -
1666216919.684  63779 10.203.206.244 TCP_TUNNEL/200 2443597 CONNECT repo.gradle.org:443 - HIER_DIRECT/104.18.191.9 -
1666216919.686  96300 10.203.206.244 TCP_TUNNEL/200 116001697 CONNECT downloads.gradle-dn.com:443 - HIER_DIRECT/104.18.165.99 -
1666216939.136  16733 10.203.206.244 TCP_TUNNEL/200 14409257 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -
1666216955.328  15259 10.203.206.244 TCP_TUNNEL/200 3733155 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -
1666216964.251   5648 10.203.206.244 TCP_TUNNEL/200 348769 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -
1666216982.681   7515 10.203.206.244 TCP_TUNNEL/200 1727695 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/64.50.233.100 -
1666216995.907  11511 10.203.206.244 TCP_TUNNEL/200 4737625 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/2600:3404:200:237::2 -
1666217019.945  18265 10.203.206.244 TCP_TUNNEL/200 10818236 CONNECT netbeans.osuosl.org:443 - HIER_DIRECT/2600:3404:200:237::2 -
1666217063.608 310645 10.203.206.244 TCP_TUNNEL/200 572986695 CONNECT repo1.maven.org:443 - HIER_DIRECT/199.232.192.209 -
1666217126.510  62896 10.203.206.244 TCP_TUNNEL/200 123171 CONNECT repo1.maven.org:443 - HIER_DIRECT/199.232.196.209 -
1666217447.971    208 10.203.206.244 TCP_MISS/200 9384 GET http://www.w3.org/2001/xml.xsd - HIER_DIRECT/128.30.52.100 application/xml
1666217448.180    101 10.203.206.244 TCP_MISS/200 9384 GET http://www.w3.org/2001/xml.xsd - HIER_DIRECT/128.30.52.100 application/xml
1666217449.280    204 10.203.206.244 TCP_MISS/200 9384 GET http://www.w3.org/2001/xml.xsd - HIER_DIRECT/128.30.52.100 application/xml
1666217449.709    103 10.203.206.244 TCP_MISS/200 9384 GET http://www.w3.org/2001/xml.xsd - HIER_DIRECT/128.30.52.100 application/xml
1666217450.058    103 10.203.206.244 TCP_MISS/200 9384 GET http://www.w3.org/2001/xml.xsd - HIER_DIRECT/128.30.52.100 application/xml

access-failed.log

566 connections, at least 543 of them unused:

access-failed.log

access-worked.log

573 connections, at least 541 of them unused:

access-worked.log

Are you willing to submit a pull request?

Yes

Code of Conduct

Yes

mbien commented 2 years ago

I noticed some OSU server connection timeouts this weekend, e.g. https://github.com/apache/netbeans/actions/runs/3349423376/jobs/5549411354

couldn't find the cause but it led me to a bug in how our workflow invalidates the cache #4886. I suppose you tested your setup this weekend? :)

jgneff commented 2 years ago

java.io.IOException: Could not connect to https://netbeans.osuosl.org/binaries/4B4DCA62F8C4A1954AE6D286955C36CC50B8CC3A-exechlp-1.2.zip within 15000 milliseconds

I suppose you tested your setup this weekend? :)

That could have been me, based on the timestamp (Fri 28 Oct 2022 05:41:42 PM PDT). That's the part I don't know: whether the OSU Web servers mitigate such attacks only through a RequestReadTimeout directive, or whether they also limit the maximum number of connections from a single IP address. See the section "How is a Slowloris attack mitigated?" on the Cloudflare page about Slowloris.

If they do both, then the DoS attack is really just self-inflicted with a limited impact on other users of the Web server. I suspect, though, that they're using only the request-read header timeout, which means other users could encounter problems, too.

mbien commented 2 years ago

the way it works here is that we download everything into a cache. I think most CI builds shouldn't ping the server at all (assuming everything works as expected) since the cache is shared. Local builds work the same. Devs would only download the libs during the first build, subsequent builds only download the delta if there is any. (edit: basically like maven)

jgneff commented 2 years ago

I think most CI builds shouldn't ping the server at all (assuming everything works as expected) since the cache is shared.

Right. I hit this bug because the Launchpad build farm runs each build in a transient container created from trusted images to ensure a clean and isolated build environment. It starts every build entirely from scratch.

ramereth commented 2 years ago

Hi, I'm from the OSUOSL as mentioned on this issue. I see that @jgneff created a ticket on our support system which referenced this issue. I wanted to give some background on how your mirrors are set up in case that impacts how you fix this.

  1. We use mod_limitipconn with MaxConnPerIP 20 set
  2. We also have the reqtimeout_module enabled with the following settings:
    RequestReadTimeout header=20-40,minrate=500
    RequestReadTimeout body=10,minrate=500

Hopefully this helps you out! Let me know if you need anything else from us.

ramereth commented 2 years ago

I noticed some OSU server connection timeouts this weekend, e.g. https://github.com/apache/netbeans/actions/runs/3349423376/jobs/5549411354

couldn't find the cause but it led me to a bug in how our workflow invalidates the cache #4886. I suppose you tested your setup this weekend? :)

FWIW this seems to line up with a DDoS we were having at the time against our DNS servers (which also happened this morning unfortunately).

jgneff commented 2 years ago

@ramereth Thank you for commenting, Lance. That answers some of my lingering questions.

We use mod_limitipconn with MaxConnPerIP 20 set

That answers my previous comment, and indicates this really is just a self-inflicted denial-of-service attack affecting only the person running the build (me!). It also explains why the server is responding with status code 503. The README file of mod_limitipconn states:

  1. Connections in excess of the limit result in a stock 503 Service Temporarily Unavailable response. The job of returning a more useful error message to the client is left as an exercise for the reader.

The NetBeans build makes over 95 connections through the Squid proxy server to netbeans.osuosl.org, so I'm surprised that it sometimes works. Perhaps Squid is multiplexing those onto a smaller set of forwarding connections, or holding off on connecting until it receives a request header. I'll look into it.

RequestReadTimeout header=20-40,minrate=500

That confirms the 20-second timeout I'm seeing on the unused connections which send no request headers.
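As a rough illustration of why an idle connection is closed after 20 seconds, the Python sketch below models the documented meaning of header=20-40,minrate=500: the header timeout starts at 20 seconds, grows by one second for every 500 bytes received, and is capped at 40 seconds. This is a simplified model and not the module's actual code; the function name header_timeout is hypothetical.

```python
def header_timeout(bytes_received, initial=20, maximum=40, min_rate=500):
    """Seconds allowed for the request headers under header=20-40,minrate=500."""
    # The timeout grows by one second per min_rate bytes, capped at maximum.
    return min(maximum, initial + bytes_received // min_rate)

print(header_timeout(0))      # 20: an idle connection gets only the initial timeout
print(header_timeout(1500))   # 23
print(header_timeout(50000))  # 40: capped at the maximum
```

A connection that never sends any request headers stays at the initial value, which matches the 20-second timeout observed on the unused connections.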

ramereth commented 2 years ago

The NetBeans build makes over 95 connections through the Squid proxy server to netbeans.osuosl.org, so I'm surprised that it sometimes works. Perhaps Squid is multiplexing those onto a smaller set of forwarding connections, or holding off on connecting until it receives a request header. I'll look into it.

@jgneff this might be due to the fact you might be hitting one of the other two servers that are in DNS rotation. If you haven't hit those as much, it likely would continue working.

If you'd like us to make any changes on our end to make this better, please let me know. I'm certainly willing to make a change if it makes sense and doesn't impact our service.

jgneff commented 2 years ago

The NetBeans build makes over 95 connections through the Squid proxy server to netbeans.osuosl.org, so I'm surprised that it sometimes works. ... I'll look into it.

Here's what I found. The NetBeans build avoids the connection limit of mod_limitipconn by not sending any request headers at all. That module hooks in too late in the request processing phase to enforce its limit on such unused, idle connections. Looking at the source code:

My experiments confirm this. I can make 100 connections to netbeans.osuosl.org through the proxy server, perform the TLS handshake, and as long as no request headers are sent, they'll just sit there for 20 seconds until they are closed by mod_reqtimeout. If, on the other hand, I make 40 connections and send the request headers immediately, only 8 of them are successful. The other 32 return "503 Service Unavailable" due to the per-IP connection limit of mod_limitipconn.

There's an experimental mod_noloris from Apache that looks interesting and appears to enforce the per-IP connection limit earlier. See also mod_antiloris and a good write-up called "Slowloris And Mitigations For Apache".

ramereth commented 1 year ago

@jgneff how would you like to proceed on this?

jgneff commented 1 year ago

@ramereth Thanks for asking. I have not found any problems that would require a change in the Web server on your end of the connection. On the contrary, when testing with a fix or a workaround, I have yet to encounter any errors with netbeans.osuosl.org at all. So thank you and your team for such a reliable archive!

I have been working on a more general fix for the past couple of weeks. I plan to submit it as a new pull request and close the current one. So far in my testing, the fix is working well and makes a predictable set of just 10 connections to netbeans.osuosl.org and two connections to repo1.maven.org while downloading all of the external binaries.

matthiasblaesing commented 1 year ago

I removed this issue from the NB17 milestone. The linked PR #4206 explicitly requests not to be merged yet.