dfskoll / rp-pppoe

Public repository for RP-PPPoE PPPoE client and server software
https://dianne.skoll.ca/projects/rp-pppoe/
54 stars 18 forks source link

session() doesn't handle discovery packets anymore #23

Closed skarnet closed 1 year ago

skarnet commented 1 year ago

Hi, Trying to upgrade to the latest rp-pppoe (from the 3.12 release!). Using userspace pppoe with the latest pppd. pppoe is invoked by pppd as pppoe -I eth2 -m 1412, nothing fancy, the interface exists, etc.

pppoe busyloops around the select() call: strace shows the following line endlessly repeated:

select(5, [0 3 4], NULL, NULL, NULL)    = 1 (in [4])

Looking at the code, it appears that the culprit is the session() function. fd 4 is conn→discoverySocket, which triggers... but there is no handling for FD_ISSET(conn→discoverySocket, &readable)! So the function loops with an event on the discovery socket that is never read or handled.

By comparison, rp-pppoe 3.12 has the following code snippet to handle that case:

#ifndef USE_BPF
        /* BSD uses a single socket, see *syncReadFromEth() */
        /* for calls to sessionDiscoveryPacket() */
        if (conn->discoverySocket >= 0) {
            if (FD_ISSET(conn->discoverySocket, &readable)) {
                sessionDiscoveryPacket(conn);
            }
        }
#endif

(and I'm not sure what's supposed to happen if USE_BPF is defined, but I don't use BPF so it works for me.)

So, is there anything I'm supposed to do with that discovery socket with the current rp-pppoe git? Or is the code just missing the event handling?

Thanks!

jkroonza commented 1 year ago

Hi @skarnet

May I inquire as to why you're using the userspace implementation and not the kernel-mode implementation?

Incidentally, personally I would use iptables for adjusting the MTU rather than pppoe (in the FORWARD chain: -p tcp --syn -j TCPMSS --clamp-mss-to-pmtu).

Once you get to session() discovery should be completed, so I'm not convinced of what's going on here? The discovery() function handles initial discovery, and complete the discover prior to entering into session().

What may be missing is handling for PADT frames, but USUALLY the ppp connection itself should also terminate PRIOR to PADT so whilst handling SHOULD be implemented if it's not I don't think this is what's causing your problems (in fact, you'd probably fail to notice this).

Perhaps you could give us a bit more details on the symptoms? Also, the relevant logs from syslog, and perhaps the debug log if passing -D to pppoe itself (via pppd).

skarnet commented 1 year ago

I'm using the userspace implementation because I'm running a machine with only static binaries, so dynamic plugin loading is out.

When bringing up the service: debug information gives me this, which looks very normal (I can give the full packets if needed, but I don't think that will help):

rp-pppoe-3.15
10:54:33.037 SENT PPPoE Discovery (8863) PADI sess-id 0 length 4
SourceAddr 00:00:24:d2:21:2a DestAddr ff:ff:ff:ff:ff:ff
01 01 00 00                                       ....

10:54:33.056 RCVD PPPoE Discovery (8863) PADO sess-id 0 length 107
SourceAddr 70:70:8b:09:9c:80 DestAddr 00:00:24:d2:21:2a
(...)

10:54:33.056 SENT PPPoE Discovery (8863) PADR sess-id 0 length 24
SourceAddr 00:00:24:d2:21:2a DestAddr 70:70:8b:09:9c:80
01 01 00 00 01 04 00 10 80 6c f4 52 85 ed 3b 36   .........l.R..;6
67 3a fa f9 66 2c 0f ad                           g:..f,..

10:54:33.075 RCVD PPPoE Discovery (8863) PADS sess-id 32464 length 88
SourceAddr 70:70:8b:09:9c:80 DestAddr 00:00:24:d2:21:2a
(...)

10:54:34.937 SENT PPPoE Session (8864) SESS sess-id 32464 length 24
SourceAddr 00:00:24:d2:21:2a DestAddr 70:70:8b:09:9c:80
(...)

10:54:34.952 RCVD PPPoE Session (8864) SESS sess-id 32464 length 21
SourceAddr 70:70:8b:09:9c:80 DestAddr 00:00:24:d2:21:2a
(...)

and then a number of SESS packet exchanges, ending in

11:00:24.768 SENT PPPoE Session (8864) SESS sess-id 32464 length 10
SourceAddr 00:00:24:d2:21:2a DestAddr 70:70:8b:09:9c:80
c0 21 0a c1 00 08 75 3d 32 93                     .!....u=2.

The relevant syslog excerpt is the following. Again, it looks very normal.

2023-01-31 09:54:05.929766352 daemon.notice: Jan 31 09:54:32 pppd[15185]: pppd 2.4.9 started by root, uid 0
2023-01-31 09:54:05.931442263 daemon.debug: Jan 31 09:54:32 pppd[15185]: using channel 5
2023-01-31 09:54:05.933236626 daemon.info: Jan 31 09:54:32 pppd[15185]: Using interface ppp0
2023-01-31 09:54:05.934579394 daemon.notice: Jan 31 09:54:32 pppd[15185]: Connect: ppp0 <--> /dev/pts/0
2023-01-31 09:54:06.075421815 daemon.debug: Jan 31 09:54:33 pppoe[15192]: PADS: Service-Name: ''
2023-01-31 09:54:06.075433060 daemon.info: Jan 31 09:54:33 pppoe[15192]: PPP session is 32464 (0x7ed0)
2023-01-31 09:54:07.936967687 daemon.debug: Jan 31 09:54:34 pppd[15185]: sent [LCP ConfReq id=0x1 <mru 1492> <asyncmap 0x0> <magic 0x78c94c9c> <accomp>]
2023-01-31 09:54:07.952637277 daemon.debug: Jan 31 09:54:34 pppd[15185]: rcvd [LCP ConfReq id=0x1 <mru 1460> <auth chap MD5> <magic 0x818b17b>]
2023-01-31 09:54:07.953794477 daemon.debug: Jan 31 09:54:34 pppd[15185]: sent [LCP ConfAck id=0x1 <mru 1460> <auth chap MD5> <magic 0x818b17b>]
2023-01-31 09:54:07.954891823 daemon.debug: Jan 31 09:54:34 pppd[15185]: rcvd [LCP ConfAck id=0x1 <mru 1492> <asyncmap 0x0> <magic 0x78c94c9c> <accomp>]
2023-01-31 09:54:07.970499046 daemon.debug: Jan 31 09:54:34 pppd[15185]: rcvd [CHAP Challenge id=0x1 <a5b1e7401dfe85ca662f405bbc256187>, name = "*redacted*"]
2023-01-31 09:54:07.971345592 daemon.debug: Jan 31 09:54:34 pppd[15185]: sent [CHAP Response id=0x1 <003ebb303922d5f99a6945c6898dfb18>, name = "*redacted*"]
2023-01-31 09:54:08.021952237 daemon.debug: Jan 31 09:54:35 pppd[15185]: rcvd [LCP ConfReq id=0x1 <mru 1460> <auth pap> <magic 0x818b17b> <mrru 1614> <endpoint [IP:*redacted*]>]
2023-01-31 09:54:08.023228097 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [LCP ConfReq id=0x2 <mru 1492> <asyncmap 0x0> <magic 0x753d3293> <accomp>]
2023-01-31 09:54:08.024322999 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [LCP ConfRej id=0x1 <mrru 1614>]
2023-01-31 09:54:08.038962221 daemon.debug: Jan 31 09:54:35 pppd[15185]: rcvd [LCP ConfAck id=0x2 <mru 1492> <asyncmap 0x0> <magic 0x753d3293> <accomp>]
2023-01-31 09:54:08.040166843 daemon.debug: Jan 31 09:54:35 pppd[15185]: rcvd [LCP ConfReq id=0x2 <mru 1460> <auth pap> <magic 0x818b17b> <endpoint [IP:*redacted*]>]
2023-01-31 09:54:08.041168926 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [LCP ConfNak id=0x2 <auth chap MD5>]
2023-01-31 09:54:08.056266656 daemon.debug: Jan 31 09:54:35 pppd[15185]: rcvd [LCP ConfReq id=0x3 <mru 1460> <auth chap MD5> <magic 0x818b17b> <endpoint [IP:*redacted*]>]
2023-01-31 09:54:08.057375736 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [LCP ConfAck id=0x3 <mru 1460> <auth chap MD5> <magic 0x818b17b> <endpoint [IP:*redacted*]>]
2023-01-31 09:54:08.073021999 daemon.debug: Jan 31 09:54:35 pppd[15185]: rcvd [CHAP Challenge id=0x1 <3d2f01109a6836ab186469b850c8581f>, name = "*redacted*"]
2023-01-31 09:54:08.074226831 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [CHAP Response id=0x1 <ed44afb78e357a67d766648da49c1bdb>, name = "*redacted*"]
2023-01-31 09:54:08.090778866 daemon.debug: Jan 31 09:54:35 pppd[15185]: rcvd [CHAP Success id=0x1 ""]
2023-01-31 09:54:08.091794358 daemon.info: Jan 31 09:54:35 pppd[15185]: CHAP authentication succeeded
2023-01-31 09:54:08.093035927 daemon.notice: Jan 31 09:54:35 pppd[15185]: CHAP authentication succeeded
2023-01-31 09:54:08.094418156 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [CCP ConfReq id=0x1 <deflate 15> <deflate(old#) 15>]
2023-01-31 09:54:08.095560200 daemon.debug: Jan 31 09:54:35 pppd[15185]: sent [IPCP ConfReq id=0x1 <compress VJ 0f 01> <addr 0.0.0.0>]

followed by IPCP configuration messages, and normal PPP scripts. All in all, the whole PPP setup works, functionally there's no problem; if it weren't for CPU usage, there would be no unusual symptoms.

But, as I reported, the pppoe binary is busylooping around select(), eating up 100% CPU time, because the discovery socket keeps being readable and is never handled. If the discovery is supposed to be finished before the session() function is called, then the code is failing to close conn→discoverySocket and set it to -1 again once the discovery is over.

dfskoll commented 1 year ago

Hi,

Oops. That's a terrible bug. I was cleaning up the code and accidentally removed important code. I will fix later today.

Regards,

Dianne.

On January 31, 2023 3:28:56 AM EST, Laurent Bercot @.***> wrote:

Hi,

Trying to upgrade to the latest rp-pppoe (from the 3.12 release!). Using userspace pppoe with the latest pppd. pppoe is invoked by pppd as pppoe -I eth2 -m 1412, nothing fancy, the interface exists, etc.

pppoe busyloops around the select() call: strace shows the following line endlessly repeated:


select(5, [0 3 4], NULL, NULL, NULL)    = 1 (in [4])

Looking at the code, it appears that the culprit is the session() function. fd 4 is conn→discoverySocket, which triggers... but there is no handling for FD_ISSET(conn→discoverySocket, &readable)! So the function loops with an event on the discovery socket that is never read or handled.

By comparison, rp-pppoe 3.12 has the following code snippet to handle that case:


#ifndef USE_BPF

       /* BSD uses a single socket, see *syncReadFromEth() */

       /* for calls to sessionDiscoveryPacket() */

       if (conn->discoverySocket >= 0) {

           if (FD_ISSET(conn->discoverySocket, &readable)) {

               sessionDiscoveryPacket(conn);

           }

       }

#endif

(and I'm not sure what's supposed to happen if USE_BPF is defined, but I don't use BPF so it works for me.)

So, is there anything I'm supposed to do with that discovery socket with the current rp-pppoe git? Or is the code just missing the event handling?

Thanks!

-- > Reply to this email directly or view it on GitHub:

https://github.com/dfskoll/rp-pppoe/issues/23

You are receiving this because you are subscribed to this thread.

Message ID: @.***>

dfskoll commented 1 year ago

Hi,

I believe this is fixed in the latest commit (a2bebb2a359d28d0f28d25ec2b348e7d43b519a6)

Could you test, please?

Regards,

Dianne.

skarnet commented 1 year ago

It appears to work for me - in that it's still functional, but the busylooping has disappeared and pppoe is now sleeping most of the time as it should.

Thanks!

I have another request, that feels too generic to create another issue, so I'm just mentioning it here - but I can open separate issues if you like it better. In order to help rp-pppoe build with musl (a libc that's, among other things, suited for static linking), it would be great if all Linux header inclusions could happen after the libc header inclusions. I.e. if linux/if.h is needed in a file, then it needs to be included last, even after local includes, because local includes may pull in libc headers; and the kernel only has workarounds for glibc, so other libcs are dependent on proper header inclusion order. (Yes, headers should be self-contained and commutative, but unfortunately Linux headers are not.)

And also, but that's a nitpick, it would be awesome to have a configure option that adds -static when linking all the binaries, but not rp-pppoe.so for obvious reasons.

Thanks a lot, rp-pppoe has served me well for quite a number of years and is still going strong. :-)

dfskoll commented 1 year ago

For the headers, do you mean that anything starting with linux/ should be included last? That's an easy change.

And adding a configure option for static linking also doesn't look too hard; I'll look into it.

Regards,

Dianne.

skarnet commented 1 year ago

Yes, that's what I mean. And it's a bit harder than it looks, because it means you can't include linux/ stuff in pppoe.h, for instance, unless you're sure that every file includes pppoe.h dead last (or with only Linux headers after it). But I'm sure you can manage it 😄

Thanks again!

dfskoll commented 1 year ago

OK, I think I have it fixed... can you try the latest master?

The #include file order should be fixed, and ./configure takes an --enable-static option to build static executables.

Regards,

Dianne.

skarnet commented 1 year ago

Almost perfect! Just two small things:

Thanks again for your awesome reactivity.

dfskoll commented 1 year ago

Hi, Laurent,

Should be fixed in latest master. (replace linux/if_ether.h with net/ethernet.h and add a rule for control_socket.o)

Regards,

Dianne.

skarnet commented 1 year ago

Typo: the rule should be for control_socket.o (underscore), not control-socket.o (hyphen). 😉

dfskoll commented 1 year ago

Hi, Laurent,

Typo: the rule should be for control_socket.o (underscore), not control-socket.o (hyphen). 😉

Ugh. There was also a typo in one of the dependencies. Should be fixed now.

Regards,

Dianne.

skarnet commented 1 year ago

Works like a charm now, thanks!

dfskoll commented 1 year ago

OK, great! I'll close this issue.