google-area120 / orion-radsec

Apache License 2.0

Issue with Orion Radsecproxy Handling SSL/TLS Decode Errors with Timeout Leading to Server Crashing #5

Closed simeononsecurity closed 3 months ago

simeononsecurity commented 9 months ago

Title: Issue with Orion Radsecproxy Handling EAP Methods

Description: I have encountered an issue while setting up and using the Orion Radsecproxy with Docker. The proxy does not appear to be configured to handle EAP methods correctly. When a client attempts to authenticate, the server fails and occasionally crashes after timing out on a TLS decode and hitting an early end-of-file error.

Steps to Reproduce:

  1. Set up Orion Radsecproxy using the provided Docker setup.
  2. Attempt authentication using EAP methods.

Expected Behavior: The Orion Radsecproxy should handle authentication for a broader range of EAP methods without crashing.

Actual Behavior: The server crashes when attempting authentication with EAP methods.

Additional Information:

Environment:

Note: I have verified that the issue persists with the latest version of the Orion Radsecproxy.

Looking forward to a resolution for this matter.

simeononsecurity commented 9 months ago

https://github.com/google-area120/orion-radsec/pull/3/commits/4531feb0d4a5526e3e71196e33aa115f79dace2c
https://github.com/google-area120/orion-radsec/pull/3

simeononsecurity commented 9 months ago

It seems my changes did not fix it. They may have just made it happen less often.

Fri Jan 19 06:47:43 2024 : ERROR: (0) ERROR: (TLS) Alert write:fatal:decode error
Fri Jan 19 06:47:43 2024 : Error: tls: (TLS) Failed in proxy receive: error:0A000126:SSL routines::unexpected eof while reading

and

Sat Jan 20 07:19:15 2024 : ERROR: (0) ERROR: (TLS) Alert write:fatal:decode error
Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) Failed in proxy receive
Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) error:0A000197:SSL routines::shutdown while in init
Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) error:0A000197:SSL routines::shutdown while in init
Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) error:0A000126:SSL routines::unexpected eof while reading

I also tried using LibreSSL instead of OpenSSL, as suggested by forum posts I've read. Possibly two separate issues?
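One way to judge whether these are two separate issues is to tally the OpenSSL error codes across a longer stretch of log. The following is a minimal sketch, not part of the proxy: the function name `tally_openssl_errors` is hypothetical, and the regular expression assumes the `error:<8 hex digits>:<library>::<reason>` layout seen in the lines above.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpts above:
#   error:0A000126:SSL routines::unexpected eof while reading
OPENSSL_ERROR = re.compile(r"error:([0-9A-Fa-f]{8}):([^:]*)::(.+?)\s*$")


def tally_openssl_errors(lines):
    """Count (code, reason) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = OPENSSL_ERROR.search(line)
        if match:
            code, _library, reason = match.groups()
            counts[(code.upper(), reason)] += 1
    return counts


# The second excerpt above, used as sample input.
SAMPLE_LOG = [
    "Sat Jan 20 07:19:15 2024 : ERROR: (0) ERROR: (TLS) Alert write:fatal:decode error",
    "Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) Failed in proxy receive",
    "Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) error:0A000197:SSL routines::shutdown while in init",
    "Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) error:0A000197:SSL routines::shutdown while in init",
    "Sat Jan 20 07:19:15 2024 : Error: tls: (TLS) error:0A000126:SSL routines::unexpected eof while reading",
]

if __name__ == "__main__":
    for (code, reason), count in tally_openssl_errors(SAMPLE_LOG).most_common():
        print(f"{count:4d}  {code}  {reason}")
```

If the two codes always show up together in the same second, that would point toward one underlying failure rather than two.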

simeononsecurity commented 9 months ago

Found this as possibly related https://github.com/radsecproxy/radsecproxy/issues/108

simeononsecurity commented 9 months ago

From reading related forum posts I found a few suggested fixes. Most sounded like a waste of time, but I tried them anyway.

    RUN apk update && apk add --no-cache \
        talloc talloc-dev linux-headers git gcc make \
        libc-dev pcre-dev libidn-dev krb5-dev samba-dev \
        curl-dev json-c-dev openldap-dev unbound-dev \
        ruby-dev perl-dev python3-dev \
        hiredis-dev libmemcached-dev gdbm-dev libcouchbase-dev \
        postgresql-dev mariadb-dev unixodbc-dev sqlite-dev \
        wget tar build-base openssl openssl-dev

    RUN mkdir -p /usr/local/src/repositories
    WORKDIR /usr/local/src/repositories

    # Download and extract FreeRADIUS source
    RUN wget https://github.com/FreeRADIUS/freeradius-server/releases/download/release_3_2_3/freeradius-server-3.2.3.tar.gz \
        && tar -xzvf freeradius-server-3.2.3.tar.gz \
        && rm freeradius-server-3.2.3.tar.gz

    # Build and install FreeRADIUS
    RUN cd freeradius-server-3.2.3 \
        && ./configure --prefix=/opt --with-talloc-lib-dir=/usr/lib/ \
        && make \
        && make install

    RUN rm /opt/lib/*.a && rm /opt/etc/raddb/clients.conf

    # Stage 2: Create a smaller image with Alpine
    FROM alpine:latest

    RUN apk update && apk upgrade && \
        apk add --upgrade apk-tools && apk upgrade --available && \
        apk add -q --no-cache bash openssl talloc libressl pcre libwbclient tzdata \
        && rm -rf /tmp/ /var/cache/apk/

    COPY --from=builder /opt /opt

    RUN ln -s /opt/etc/raddb /etc/raddb

    COPY --chown=radius:radius radiusd.conf /opt/etc/raddb/radiusd.conf
    COPY --chown=radius:radius cacerts/ /opt/etc/raddb/cacerts

    # Create "radius" group with GID 101
    RUN addgroup -g 101 radius

    # Create "radius" user with UID 101
    RUN adduser -D -u 101 -G radius radius

    EXPOSE 1812:1812/udp 1812:1813/udp

    CMD /bin/bash -c "PATH=/opt/sbin:/opt/bin:$PATH && export PATH && while true; do radiusd -f -lstdout; sleep 1; done"



All of these options were a waste, unfortunately. They didn't solve the server crashing issue.

ahenson-google commented 9 months ago

Thanks for reporting this. We have partners using the proxy successfully with SIM-based auth, so this may be more environment-specific. If possible, please log and share the output of radiusd -X on your proxy during a crash.

simeononsecurity commented 9 months ago

> Thanks for reporting this. We have partners using the proxy successfully with SIM-based auth, so this may be more environment-specific. If possible, please log and share the output of radiusd -X on your proxy during a crash.

I'd love to. Can I share it with you all privately? There are unique identifiers in there, and I don't think this is the best place to publish them. Otherwise I'll attempt to mask those details.

I have to say, it is likely something to do with the environment. It's possible one of my devices is running an unpatched hostapd, but unfortunately I have no way to confirm that at the moment. Also, the title may no longer be accurate: I've now seen the error on almost every type of auth. It happens randomly and I'm unable to reproduce it on demand; I just have to wait for it to happen.

simeononsecurity commented 9 months ago

I haven't heard a response, @ahenson-google, so here is the anonymized and abridged version of the logs. I should note that in this instance I've configured the proxy to time out only after 30 seconds, which is higher than what is set in the repo, as it seemingly helps with this problem. Additionally, I've configured my container to restart automatically after crashes, which the current repo is not configured to do. Otherwise it would crash and stay down until I restarted it manually.

(4) Proxying request to home server XXXXXXXXXXXXXXXX port 2083 (TLS) timeout 30.000000
(4) Sent Accounting-Request Id 122 from XXXXXXXXXXXXXXXX to XXXXXXXXXXXXXXXX length 476
(4)   Acct-Status-Type = Interim-Update
(4)   Acct-Authentic = RADIUS
(4)   User-Name = "XXXXXXXXXXXXXXXX"
(4)   NAS-Identifier = "XXXXXXXXXXXXXXXX"
(4)   Called-Station-Id = "XXXXXXXXXXXXXXXX"
(4)   NAS-Port-Type = Wireless-802.11
(4)   Service-Type = Framed-User
(4)   NAS-Port = 1
(4)   Calling-Station-Id = "XXXXXXXXXXXXXXXX"
(4)   Connect-Info = "CONNECT 54Mbps 802.11a"
(4)   Acct-Session-Id = "XXXXXXXXXXXXXXXX"
(4)   WLAN-Pairwise-Cipher = 1027076
(4)   WLAN-Group-Cipher = 1027076
(4)   WLAN-AKM-Suite = 1027077
(4)   WLAN-Group-Mgmt-Cipher = 1027078
(4)   Class = XXXXXXXXXXXXXXXX
(4)   Chargeable-User-Identity = XXXXXXXXXXXXXXXX
(4)   Framed-IP-Address = XXXXXXXXXXXXXXXX
(4)   Event-Timestamp = "Jan 25 2024 00:43:XX UTC"
(4)   Acct-Delay-Time = 0
(4)   Acct-Session-Time = 12900
(4)   Acct-Input-Packets = 7234
(4)   Acct-Output-Packets = 4055
(4)   Acct-Input-Octets = 2213238
(4)   Acct-Input-Gigawords = 0
(4)   Acct-Output-Octets = 1534810
(4)   Acct-Output-Gigawords = 0
(4)   Proxy-State = 0x313531
Waking up in 0.3 seconds.
(4) Clearing existing &reply: attributes
(4) Received Accounting-Response Id 122 from XXXXXXXXXXXXXXXX to XXXXXXXXXXXXXXXX length 25
(4)   Proxy-State = 0x313531
(4) server default {
(4) }
(4) Sent Accounting-Response Id 151 from XXXXXXXXXXXXXXXX to XXXXXXXXXXXXXXXX length 20
(4) Finished request
(4) Cleaning up request packet ID 151 with timestamp +1238 due to done
Ready to process requests
(0) (TLS) send TLS 1.2 Alert, fatal decode_error
(0) ERROR: (TLS) Alert write:fatal:decode error
tls: (TLS) Failed in proxy receive: error:0A000126:SSL routines::unexpected eof while reading
Closing TLS socket to home server
(TLS) Client has closed connection
 ... shutting down socket proxy (XXXXXXXXXXXXXXXX, 48023) -> home_server (XXXXXXXXXXXXXXXX, 2083) (1 of 32)
Ready to process requests
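For anyone else who needs to share radiusd -X output publicly, the masking can be scripted. The following is a rough sketch rather than the method used for the log above: `mask_line`, the `SENSITIVE` set, and the placeholder string are assumptions for illustration, the attribute names are simply the ones that appear masked in this log, and IPv6 addresses, hostnames, and MAC addresses outside those attributes are not handled.

```python
import re

# Attributes whose values identify a user, device, or site
# (the ones that appear masked in the log above).
SENSITIVE = {
    "User-Name", "NAS-Identifier", "Called-Station-Id", "Calling-Station-Id",
    "Acct-Session-Id", "Class", "Chargeable-User-Identity", "Framed-IP-Address",
}

# Matches debug attribute lines such as: (4)   User-Name = "someone"
ATTR_LINE = re.compile(r"^(\(\d+\)\s+)([A-Za-z0-9-]+)( = )(.*)$")
# Dotted-quad IPv4 addresses anywhere else in a line.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MASK = "XXXXXXXXXXXXXXXX"


def mask_line(line):
    """Return the line with sensitive attribute values and IPv4 addresses masked."""
    match = ATTR_LINE.match(line)
    if match and match.group(2) in SENSITIVE:
        prefix, name, equals, value = match.groups()
        # Keep the surrounding quotes so the line still looks like radiusd output.
        quoted = len(value) >= 2 and value.startswith('"') and value.endswith('"')
        masked = f'"{MASK}"' if quoted else MASK
        return f"{prefix}{name}{equals}{masked}"
    return IPV4.sub(MASK, line)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python mask_log.py < radiusd.log > masked.log
    for raw in sys.stdin:
        print(mask_line(raw.rstrip("\n")))
```

The addresses in the test input for this sketch come from documentation ranges, not from any real deployment. It is still worth reading the masked output by hand before posting it.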

simeononsecurity commented 3 months ago

After a while of experimenting and reading through the FreeRADIUS codebase, I think I discovered how to fix this.

A few things needed to happen to make it go away entirely.

  1. Set the TLS configuration to clear the ecdh_curve setting (empty string), with a minimum version of TLS 1.2 and a maximum of 1.3. These settings were suggested in https://github.com/hgot07/openroaming-config/blob/main/OR-config-FreeRADIUS-20240409.pdf by hgot07, and I found that they resolved a few issues I was experiencing.

        tls {
                ecdh_curve = ""
                tls_min_version = "1.2"
                tls_max_version = "1.3"
        }
  2. Update to FreeRADIUS 3.2.x, which solved many TLS-related issues. This requires either compiling 3.2.x for Alpine with a Docker multi-stage build or switching to an Ubuntu-based image. For my testing, I opted to switch to Ubuntu 24.04.

  3. Enable the nonblock option (requires 3.2.x) in the home_server configuration sections.

    home_server g1 {
        type = auth+acct
        ipaddr = xxx.xxx.xxx.xxx
        port = 2083
        proto = tcp
        secret = radsec
        nonblock = yes
    }

All of these changes may come with other unexpected issues, but in my environment they resolved the crashing and significantly improved stability.
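To confirm that a deployed configuration actually contains the settings from the steps above, a quick check can be scripted. This is only a sketch under simplifying assumptions: it is not a real FreeRADIUS configuration parser, it ignores which section a setting lives in, it treats everything after a `#` as a comment, and the names `read_settings`, `missing_settings`, and `EXPECTED` are hypothetical.

```python
import re

# Settings and values taken from the steps above.
EXPECTED = {
    "ecdh_curve": "",
    "tls_min_version": "1.2",
    "tls_max_version": "1.3",
    "nonblock": "yes",
}

# Matches simple "key = value" lines; section headers like "tls {" do not match.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][\w-]*)\s*=\s*(.*?)\s*$")


def read_settings(text):
    """Collect every 'key = value' pair, ignoring section nesting."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # naive comment stripping
        match = ASSIGNMENT.match(line)
        if match:
            key, value = match.groups()
            settings.setdefault(key, []).append(value.strip('"'))
    return settings


def missing_settings(text):
    """Return the expected settings that are absent or set to another value."""
    found = read_settings(text)
    return sorted(k for k, v in EXPECTED.items() if v not in found.get(k, []))


SAMPLE_CONF = '''
tls {
        ecdh_curve = ""
        tls_min_version = "1.2"
        tls_max_version = "1.3"
}
home_server g1 {
    type = auth+acct
    port = 2083
    proto = tcp
    nonblock = yes
}
'''

if __name__ == "__main__":
    print(missing_settings(SAMPLE_CONF) or "all expected settings found")
```

Because the check flattens all sections, it cannot tell whether nonblock is set on every home_server, only that it is set somewhere.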