amazonlinux / amazon-linux-2023

Amazon Linux 2023
https://aws.amazon.com/linux/amazon-linux-2023/

[Bug] - Website performance drops when moving to Amazon Linux 2023 #819

Open beeradmoore opened 1 day ago

beeradmoore commented 1 day ago

Describe the bug

We were looking at moving our AL2 servers over to AL2023 when one of our developers pointed out that our new servers (php8.3) were taking 2-3x longer to serve requests.

I was looking into our server changes such as reduced memory config for php-fpm, and tweaked opcache settings, etc, but none of them appeared to be the problem. It looks like the underlying OpenSSL version that comes with AL2023 handles SSL handshakes very slowly.

Our problem is that we use lots of different AWS services (DynamoDB for user sessions, Parameter Store for stored properties, RDS connections for database, etc). Beyond these initial requests we also make ajax requests from javascript after the first page has loaded, which compounds the slowdown considerably.

We have found related issues across various GitHub repos:

Below I show how I tested this on php8.2 on Amazon Linux 2 vs php8.2 on Amazon Linux 2023. We intended to use php8.3 on our new servers, but we can't use AL2023 with these performance issues so we are rolling back our work.

I also tested on Amazon Linux 2 (arm64) and Amazon Linux 2023 (arm64) but the performance was about the same as x86_64 so it was not included. I also tested on Ubuntu 24.04 and Debian 12 (both x86_64), listed at the end of this issue.

While both servers are running php8.2, they are using different OpenSSL versions as reported by phpinfo:

Amazon Linux 2 - OpenSSL 1.0.2k-fips 26 Jan 2017
Amazon Linux 2023 - OpenSSL 3.0.8 7 Feb 2023
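For anyone reproducing this, the OpenSSL version can also be checked from the command line instead of opening phpinfo in a browser. The sketch below is only an illustration: `openssl_major` is a hypothetical helper name, and the commented usage assumes the `openssl` CLI and PHP's openssl extension are installed. Note that `OPENSSL_VERSION_TEXT` reflects the version PHP's openssl extension was built against.

```shell
#!/bin/sh
# Hypothetical helper: print the major version number from an OpenSSL
# version string such as "OpenSSL 3.0.8 7 Feb 2023".
openssl_major() {
    # $1 is the full version string; its second field is the version
    # itself, e.g. "3.0.8" or "1.0.2k-fips".
    printf '%s\n' "$1" | awk '{ split($2, v, "."); print v[1] }'
}

# Usage, run on the server being checked:
#   openssl version                                # system OpenSSL CLI
#   php -r 'echo OPENSSL_VERSION_TEXT, PHP_EOL;'   # version PHP's extension reports
#   openssl_major "$(php -r 'echo OPENSSL_VERSION_TEXT;')"
```

A major version of 1 versus 3 is the difference being compared in this issue.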

To Reproduce

Below are the commands I used to set up the 2 servers. After these I show test.php, which is what I use to access DynamoDB. Keep in mind this is a very small test; a larger system suffers more from these slowdowns.

Launch an Amazon Linux 2 server and then run:

sudo su
yum update -y
yum upgrade -y
amazon-linux-extras install -y php8.2
yum install -y httpd php-xml php-opcache
systemctl start httpd
systemctl enable httpd
systemctl restart php-fpm
usermod -a -G apache ec2-user
chown -R ec2-user:apache /var/www
chmod 2775 /var/www && find /var/www -type d -exec chmod 2775 {} \;
find /var/www -type f -exec chmod 0664 {} \;

#Install composer
curl -sS https://getcomposer.org/installer | php -- --version=2.8.1 --install-dir=/usr/bin/ --filename=composer

# become ec2-user
exit;

# Install AWS SDK
cd /var/www/html/
echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php
composer require aws/aws-sdk-php

For the Amazon Linux 2023 server:

sudo su
dnf upgrade -y
dnf install -y php8.2
dnf install -y httpd php-zip php-opcache
systemctl start httpd
systemctl enable httpd
systemctl restart php-fpm
usermod -a -G apache ec2-user
chown -R ec2-user:apache /var/www
chmod 2775 /var/www && find /var/www -type d -exec chmod 2775 {} \;
find /var/www -type f -exec chmod 0664 {} \;

#Install composer
curl -sS https://getcomposer.org/installer | php -- --version=2.8.1 --install-dir=/usr/bin/ --filename=composer

# become ec2-user
exit;

# Install AWS SDK
cd /var/www/html/
echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php
composer require aws/aws-sdk-php

This is the test.php script I am running on these 2 servers. This test was also run on arm64 variants of these 2 servers, as well as Ubuntu 24.04 (x86_64) and Debian 12 (x86_64) (listed below).

<?php

$last_time = microtime(true) * 1000.0;
$start_time = $last_time;

function chart_time(string $label)
{
    global $last_time;
    $current_time = microtime(true) * 1000.0;
    $time_diff = round( ($current_time - $last_time) , 2);
    echo "$label - {$time_diff}ms <br>\n";
    $last_time = $current_time;

}

require_once 'vendor/autoload.php';
chart_time('autoload');

$dynamodb_client = new \Aws\DynamoDb\DynamoDbClient([
    'region'   => 'ap-southeast-2',
    'version'  => '2012-08-10',
]);

chart_time('dynamodb __construct');

$result = $dynamodb_client->query([
    'TableName' => 'internal_app_builds',
    'KeyConditionExpression' => 'build_id = :build_id',
    'ExpressionAttributeValues' => [
        ':build_id' => ['N' => '1933'],
    ],
]);

chart_time('query');

$result = $dynamodb_client->query([
    'TableName' => 'internal_app_builds',
    'KeyConditionExpression' => 'build_id = :build_id',
    'ExpressionAttributeValues' => [
        ':build_id' => ['N' => '1944'],
    ],
]);

chart_time('query a second time');

$total_time = round( (microtime(true) * 1000.0) - $start_time , 2);
echo "Total time: {$total_time}ms <br>\n";

The test script uses its chart_time function to log how many ms have passed since the last time chart_time was called. This allows us to measure each step separately without having to use xdebug to dig deeper.

Test results

These were not run as proper benchmarks with computed averages; the figures below are eyeballed typical values.

Amazon Linux 2

autoload - 0.19ms
dynamodb __construct - 2.6ms
query - 32.08ms
query a second time - 3.39ms
Total time: 38.27ms

Amazon Linux 2023

autoload - 0.2ms
dynamodb __construct - 2.51ms
query - 78.73ms
query a second time - 2.69ms
Total time: 84.14ms

On Amazon Linux 2023 the first query is slow. I have seen it jump as high as 150ms at times. The second query fetches a different ID to try to avoid any internal caches. It is amazingly fast after first load.
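Because the figures above are eyeballed, a more repeatable comparison could collect the `query` timing over many requests and summarise it. The following is a minimal sketch, not what was used for the numbers in this issue: `summarize_ms` is a hypothetical helper, and the commented usage assumes test.php is reachable at http://localhost/test.php.

```shell
#!/bin/sh
# Hypothetical helper: read one timing (in ms) per line on stdin and print
# the count, min, median, mean and max. LC_ALL=C keeps "." as the decimal
# point for both sort and awk. Blank lines are ignored.
summarize_ms() {
    LC_ALL=C sort -n | LC_ALL=C awk '
        NF { n++; v[n] = $1; sum += $1 }
        END {
            if (n == 0) { print "n=0"; exit }
            if (n % 2) { median = v[(n + 1) / 2] }
            else       { median = (v[n / 2] + v[n / 2 + 1]) / 2 }
            printf "n=%d min=%.2f median=%.2f mean=%.2f max=%.2f\n", n, v[1], median, sum / n, v[n]
        }'
}

# Usage (the URL is an assumption): request test.php 50 times, keep only
# the first "query - NNms" line of each response, then summarise.
#   for i in $(seq 1 50); do curl -s http://localhost/test.php; done \
#       | sed -n 's/^query - \([0-9.]*\)ms.*/\1/p' | summarize_ms
```

The median is reported alongside the mean because occasional spikes, like the 150ms one mentioned above, can pull the mean up noticeably.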

Surprisingly, on Ubuntu and Debian the second query was as slow as the first query.

Expected behavior

With all of the talk of AL2023 being optimised, plus an updated kernel and moving php from 8.0 to 8.3, we expected performance improvements. Initial benchmarks appeared to show this, but it appears I failed to benchmark correctly. Instead of testing production code on the old server against the new server, I ran generic php benchmarks that showed internal methods were faster. None of these tests made an https:// request, which would have exposed the issues we are having now. They also didn't take ajax requests into account.

Screenshots N/A

Desktop (please complete the following information): N/A

Smartphone (please complete the following information): N/A

Additional context

All instances were t3.micro or t4g.micro instances.

This is the rough setup I did for an Ubuntu 24.04 server. php8.2 was not available. OpenSSL library version was OpenSSL 3.0.13 30 Jan 2024.

sudo su
apt update
apt upgrade -y
apt install -y php8.3
apt install -y apache2 php-xml php-opcache php-zip 7zip unzip
chown -R ubuntu:ubuntu /var/www
chmod 2775 /var/www && find /var/www -type d -exec chmod 2775 {} \;
find /var/www -type f -exec chmod 0664 {} \;

#Install composer
curl -sS https://getcomposer.org/installer | php -- --version=2.8.1 --install-dir=/usr/bin/ --filename=composer

# become ubuntu user
exit;

# Install AWS SDK
cd /var/www/html/
echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php
composer require aws/aws-sdk-php

This is the Debian 12 setup. It is using OpenSSL 3.0.14 4 Jun 2024.

sudo su
apt update
apt upgrade -y
apt install -y php8.2
apt install -y apache2 php-xml php-opcache php-zip 7zip unzip

chown -R admin:admin /var/www
chmod 2775 /var/www && find /var/www -type d -exec chmod 2775 {} \;
find /var/www -type f -exec chmod 0664 {} \;

#Install composer
curl -sS https://getcomposer.org/installer | php -- --version=2.8.1 --install-dir=/usr/bin/ --filename=composer

# become admin user
exit;

# Install AWS SDK
cd /var/www/html/
echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php
composer require aws/aws-sdk-php

The results of the above 2 tests were a shock to us:

Ubuntu:

autoload - 0.08ms
dynamodb __construct - 1.5ms
query - 58.39ms
query a second time - 64.07ms
Total time: 124.05ms

Debian:

autoload - 0.14ms
dynamodb __construct - 3.69ms
query - 70.66ms
query a second time - 80.96ms
Total time: 155.46ms

I do not understand why the second query is around the same speed as the first, whereas in both Amazon Linux 2 and Amazon Linux 2023 it is considerably faster.
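One way to check whether the TLS handshake itself accounts for the slow requests, independent of PHP and the AWS SDK, would be to use curl's timing variables: the gap between `time_connect` (TCP connected) and `time_appconnect` (TLS handshake finished) approximates the handshake cost. This is a rough sketch under assumptions: both function names are hypothetical, the system curl may not be linked against the same OpenSSL as PHP, and the endpoint in the usage comment is the public regional DynamoDB endpoint for the region used in test.php.

```shell
#!/bin/sh
# Hypothetical helper: TLS handshake duration in ms, computed from curl's
# time_connect ($1) and time_appconnect ($2), both given in seconds.
tls_handshake_ms() {
    LC_ALL=C awk -v c="$1" -v a="$2" 'BEGIN { printf "%.2f\n", (a - c) * 1000 }'
}

# Hypothetical wrapper (needs network access): open a fresh HTTPS
# connection to the given URL and print the handshake time in ms.
measure_tls_handshake() {
    timings=$(curl -s -o /dev/null -w '%{time_connect} %{time_appconnect}' "$1") || return 1
    # Split "connect appconnect" into $1 and $2.
    set -- $timings
    tls_handshake_ms "$1" "$2"
}

# Usage:
#   measure_tls_handshake https://dynamodb.ap-southeast-2.amazonaws.com/
```

Running this on each distribution several times would show whether the handshake alone differs between OpenSSL 1.0.2 and 3.0.x hosts, which is a separate question from whether the second query reuses an already open connection.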

ozbenh commented 21 hours ago

It is possibly due to regressions in OpenSSL since 3.0 which we are also chasing. We have been investigating the possibility of backporting some fixes that went into newer 3.x but this comes with some risk and could cause problems with our ongoing FIPS certification. That said, we are aware and trying to find a solution.

beeradmoore commented 21 hours ago

Yeah, I agree, I think that's what it is. I have tried to compile OpenSSL 3.3.2 on our AL2023 server to run the above tests again. I can get it built, but can't seem to install it correctly.

That would be more of a test for us than a long term solution. I'd rather not be in charge of OpenSSL security across our fleet of instances, and would prefer to leave that to people at AWS who know better.