catchpoint / WebPageTest.agent

Cross-platform WebPageTest agent

Explore migration to Ubuntu 22.04 #585

Closed pmeenan closed 1 year ago

pmeenan commented 1 year ago

Not technically an agent issue, since the agent itself likely supports 22.04, but most production installs currently use Ubuntu 18.04 as the base OS. 18.04 reaches end-of-life in April 2023, so plans should be made to migrate before then.

Last time I looked, 20.04 ran tests slower and impacted the results. It looked like it was related to the virtual screen buffer and graphics but I didn't spend that much time investigating.

Presumably we'll want to skip straight to 22.04 so we're good for another 4 years before having to switch OSes again.

jefflembeck commented 1 year ago

@pmeenan I haven't done any checks on it. Have you seen if 22.04 has the same issue with tests as 20.04?

pmeenan commented 1 year ago

I haven't looked yet (heck, I haven't even tried to run the install script on 22.04 yet).

pmeenan commented 1 year ago

Well, on the good news side, everything appears to "work". On the bad news side, traffic-shaping seems to be behaving a little wonky, or something on the system is doing a lot of background network activity: https://wpt.meenan.us/result/221205_ZiZ4_7/1/details/#waterfall_view_step1

Here is a reference run also at 3G Fast: https://www.webpagetest.org/result/221205_BiDc9T_B6V/1/details/#waterfall_view_step1

The spiked bandwidth line means the interface was seeing more traffic than WPT is measuring through Chrome (unless the clock is also skewed), so hopefully it's just a matter of finding which services are doing background stuff and removing them.

pmeenan commented 1 year ago

Doesn't look like it is background activity. Maybe something going on with the buffering of packets with tc? Still digging in but the raw tcpdump is showing about 5MB in data from the webpagetest origin but the raw TCP stream itself only accounts for ~800k so either there were a LOT of retransmits or something else is going on.
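One way to check the retransmit theory is to compare the bytes seen on the wire with the unique sequence-space bytes for a single flow. The following is a minimal sketch, not anything from the agent itself: it assumes the `(sequence_number, payload_length)` pairs for one direction of one TCP connection have already been extracted from the capture by some other tool, and the function name `tcp_payload_summary` is hypothetical. It ignores sequence-number wraparound.

```python
from typing import Dict, Iterable, Tuple


def tcp_payload_summary(segments: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Summarize one direction of one TCP flow.

    segments: (sequence_number, payload_length) pairs, in any order.
    Assumption: sequence numbers do not wrap within the capture.
    """
    wire_bytes = 0
    intervals = []
    for seq, length in segments:
        if length <= 0:
            continue  # pure ACKs and other zero-payload segments
        wire_bytes += length
        intervals.append((seq, seq + length))

    # Merge overlapping [start, end) ranges to count each byte once.
    intervals.sort()
    unique_bytes = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                unique_bytes += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        unique_bytes += cur_end - cur_start

    return {
        "wire_bytes": wire_bytes,
        "unique_bytes": unique_bytes,
        "retransmitted_bytes": wire_bytes - unique_bytes,
    }
```

If `wire_bytes` is several times `unique_bytes` for the flow, retransmits (or duplicated packets) account for the gap; if the two are close, the extra traffic is coming from somewhere else.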

pmeenan commented 1 year ago

99% sure the issue is with the traffic-shaping. Faster speeds don't bloat the aggregate TCP size. Going to install a GUI version so I can experiment manually with netem to see what is going on and whether we need to hold it differently (or use something else for the rate limiting and just use netem for latency).
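For reference, the "netem for latency, something else for rate limiting" split could look roughly like the sketch below. This is an assumption about one possible setup, not what the agent actually does: it chains a `tbf` qdisc under `netem`, following the pattern in the netem documentation. The helper name `shaping_commands` and the default `burst`/`limit` values are illustrative only, and the function just builds the `tc` argument lists without running them.

```python
from typing import List


def shaping_commands(
    dev: str,
    delay_ms: int,
    rate_kbit: int,
    burst_bytes: int = 1600,   # illustrative value, tune for the target rate
    limit_bytes: int = 3000,   # illustrative value, tune for the target rate
) -> List[List[str]]:
    """Build tc commands: netem adds delay, tbf (child qdisc) caps the rate."""
    return [
        # Root qdisc: netem handles latency only.
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:0",
         "netem", "delay", f"{delay_ms}ms"],
        # Child qdisc: token bucket filter handles rate limiting.
        ["tc", "qdisc", "add", "dev", dev, "parent", "1:1", "handle", "10:",
         "tbf", "rate", f"{rate_kbit}kbit",
         "burst", str(burst_bytes), "limit", str(limit_bytes)],
    ]


if __name__ == "__main__":
    # Print the commands for manual inspection; run them as root by hand.
    for cmd in shaping_commands("eth0", delay_ms=75, rate_kbit=1600):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the experiment manual, which fits poking at netem by hand on a test box.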

pmeenan commented 1 year ago

More of a note to myself, but check the firefox install on a headless 22.04. Desktop uses snap which doesn't work well for automation through geckodriver but it can be installed from apt with a few config changes. Since headless doesn't support snap, need to see how firefox installs and if the work-arounds are necessary.

pmeenan commented 1 year ago

Argh. That's 2 days I'll never get back (but at least good news). I was testing in vmware and it looks like using an ethernet interface that "shares" with the OS doesn't work right. Switching it to "bridged" fixed it: https://wpt.meenan.us/result/221206_ZiXC_8/
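For anyone else testing in a VM, a quick sanity check before chasing shaping bugs is to confirm how the guest's network adapter is attached. The sketch below assumes a VMware `.vmx` config that records the mode under `ethernetN.connectionType` (for example `"nat"` or `"bridged"`); the helper name `vmx_connection_types` is hypothetical, and an adapter with no such key is simply not reported.

```python
import re
from typing import Dict

# Matches lines such as: ethernet0.connectionType = "nat"
_CONNECTION_TYPE = re.compile(
    r'^\s*(ethernet\d+)\.connectionType\s*=\s*"([^"]*)"',
    re.IGNORECASE | re.MULTILINE,
)


def vmx_connection_types(vmx_text: str) -> Dict[str, str]:
    """Return {adapter_name: connection_type} for adapters that declare one."""
    return {
        match.group(1).lower(): match.group(2).lower()
        for match in _CONNECTION_TYPE.finditer(vmx_text)
    }


def non_bridged_adapters(vmx_text: str) -> Dict[str, str]:
    """Adapters whose declared mode is something other than bridged."""
    return {
        name: mode
        for name, mode in vmx_connection_types(vmx_text).items()
        if mode != "bridged"
    }
```

Usage would be along the lines of `non_bridged_adapters(open(path).read())`, where `path` points at the VM's `.vmx` file; an empty result means every adapter that declares a mode is bridged.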

Just need to poke at the Firefox install part of the install script but everything else looks ok.

tkadlec commented 1 year ago

Nice! Yeah...that's looking pretty solid

pmeenan commented 1 year ago

Firefox install is cleaned up and working now so it should be good to go. I'm moving the HTTP Archive over to 22.04 for the next crawl.

tkadlec commented 1 year ago

Thanks so much for diving into this, @pmeenan!