awslabs / s3-connector-for-pytorch

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3.
BSD 3-Clause "New" or "Revised" License

Question about Mountpoint Client Performance #236

Closed: ryxli closed this issue 1 month ago

ryxli commented 1 month ago

s3torchconnector version

latest

s3torchconnectorclient version

latest

AWS Region

us-east-1, ap-south-1

Describe the running environment

EC2 instance

What happened?

I'm seeing a significant performance difference between a plain boto3 download_file (which uses the S3 CRT transfer configuration) and the mountpoint client, even after trying various throughput and part-size settings; the boto3 client just uses its default settings.

To reproduce, try downloading a 2 GB file from S3 with the mountpoint client (S3Reader) versus the regular boto3 client.

import time
from functools import partial

import boto3
from s3torchconnector import S3Reader

# s3 torch connector snippet
# (get_object_info and self._get_object_stream come from the surrounding
# wrapper class and are not shown here)
s3_reader = S3Reader(
    bucket,
    key,
    get_object_info=get_object_info,
    get_stream=partial(self._get_object_stream, bucket, key),
)
start = time.time()
s3_reader.read()
print(f"mountpoint finish in {time.time() - start}")

# boto3 snippet
client = boto3.client('s3')
start = time.time()
client.download_file(bucket, key, test_path)
print(f"boto3 finish in {time.time() - start}")

Results (single process, 2 GB object; times in seconds, rounded):

boto3 (default settings): 4.76

Mountpoint client (S3Reader), by THROUGHPUT_GBPS (rows) and PART_SIZE (columns):

                      8MB    16MB   32MB   64MB   128MB
THROUGHPUT_GBPS=400   11.61  11.20  14.51  14.86  16.20
THROUGHPUT_GBPS=200   11.33  11.49  14.48  14.83  16.25
THROUGHPUT_GBPS=100   11.26  11.11  14.60  14.99  16.20
THROUGHPUT_GBPS=50    10.57  10.64  14.24  14.78  16.24
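
For context, a minimal sketch of how these throughput and part-size settings can be passed to the connector, assuming recent releases expose S3ClientConfig with throughput_target_gbps and part_size fields; the env var names and the bucket/prefix here are illustrative, not part of the library:

# Sketch only: maps throughput/part-size env vars onto the connector's
# client configuration. S3ClientConfig and the s3client_config kwarg are
# assumed to exist in this release; names may differ in other versions.
import os

from s3torchconnector import S3ClientConfig, S3MapDataset

config = S3ClientConfig(
    throughput_target_gbps=float(os.environ.get("THROUGHPUT_GBPS", "10")),
    part_size=int(os.environ.get("PART_SIZE_MB", "8")) * 1024 * 1024,
)
dataset = S3MapDataset.from_prefix(
    "s3://my-bucket/prefix/",   # hypothetical bucket/prefix
    region="us-east-1",
    s3client_config=config,
)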

However, this performance gap seems to disappear in a multiprocess setting, again without any tuning of the boto3 TransferConfig; a sketch of that comparison follows.
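
For reference, a minimal sketch of the multiprocess comparison mentioned above (hypothetical bucket and keys; each worker downloads one object with default boto3 settings):

# Sketch only: times N boto3 downloads running in parallel worker processes.
# Bucket and key names are placeholders.
import time
from multiprocessing import Pool

import boto3

def download_one(key):
    client = boto3.client("s3")  # one client per worker process
    client.download_file("my-bucket", key, f"/tmp/{key}")

if __name__ == "__main__":
    keys = [f"shard_{i}" for i in range(8)]  # hypothetical objects
    start = time.time()
    with Pool(processes=8) as pool:
        pool.map(download_one, keys)
    print(f"multiprocess boto3 finished in {time.time() - start:.2f}s")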

Relevant log output

No response

matthieu-d4r commented 1 month ago

Hello @ryxli, thank you for opening this issue.

I can't seem to reproduce the performance degradation you're observing: would you mind sharing more details, namely your instance type and your Python + boto3 + s3torchconnector versions?

Here's what I tried:

  1. Create an EC2 instance (AMI: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.3.0 (Amazon Linux 2) 20240825, instance type: g4dn.2xlarge) in ap-south-1
  2. Create an S3 bucket in ap-south-1 and upload a 2 GB file to it
  3. SSH into the EC2 instance
  4. Create a Python venv and run pip install s3torchconnector s3torchconnectorclient boto3 numpy
  5. Execute the following script:
import time

import boto3
from s3torchconnector._s3client import S3Client

def issue236():
    bucket = "my_bucket"
    key = "large_2gb"

    # s3 torch connector snippet
    s3_client = S3Client("ap-south-1")
    tic = time.perf_counter()
    s3_client.get_object(bucket, key).read()
    toc = time.perf_counter()
    print(f"mountpoint finishes in {toc - tic:0.4f} seconds")

    # boto3 snippet
    client = boto3.client('s3')
    tic = time.perf_counter()
    client.download_file(bucket, key, 'my_large_2gb')
    toc = time.perf_counter()
    print(f"boto3 finishes in {toc - tic:0.4f} seconds")

if __name__ == "__main__":
    issue236()

Overall, the PyTorch connector runs consistently faster than boto3 (example run):

mountpoint finishes in 4.7542 seconds
boto3 finishes in 5.7658 seconds

Finally, here are the versions used for this test:

s3torchconnector         1.2.5
s3torchconnectorclient   1.2.5
torch                    2.4.1
boto3                    1.35.24
ryxli commented 1 month ago

@matthieu-d4r

I am still able to reproduce this issue with your code snippet, this time with a 6 GB object.

S3 bucket region: us-east-1; EC2 region: ap-south-1

import time

import boto3
from s3torchconnector._s3client import S3Client

def issue236():
    bucket = "..."  # bucket in us-east-1 (name omitted)
    key = "..."     # 6 GB object (key omitted)

    # s3 torch connector snippet
    s3_client = S3Client("us-east-1")
    tic = time.perf_counter()
    s3_client.get_object(bucket, key).read()
    toc = time.perf_counter()
    print(f"mountpoint finishes in {toc - tic:0.4f} seconds")

    # boto3 snippet
    client = boto3.client('s3')
    tic = time.perf_counter()
    client.download_file(bucket, key, 'my_large_6gb')
    toc = time.perf_counter()
    print(f"boto3 finishes in {toc - tic:0.4f} seconds")

issue236()

Output:

mountpoint finishes in 27.9438 seconds
boto3 finishes in 19.0174 seconds

Versions:

import s3torchconnector
import s3torchconnectorclient
import torch
import boto3

print(s3torchconnector.__version__)
print(s3torchconnectorclient.__version__)
print(torch.__version__)
print(boto3.__version__)

Output:

1.2.5
1.2.5
2.3.0a0+6ddf5cf85e.nv24.04
1.35.20

boto3 is installed with pip install 'boto3[crt]'
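
As a quick sanity check that the CRT extra is actually installed (this only verifies the package is present, not that boto3 used it for a given transfer), awscrt should be importable:

# Sanity check only: confirms the awscrt package pulled in by boto3[crt] is importable.
from importlib.metadata import version

import awscrt  # raises ImportError if the CRT extra is missing

print("awscrt", version("awscrt"))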

matthieu-d4r commented 1 month ago

Hi @ryxli,

I ran the snippet again too, against a bucket in a different region (same setup as you: EC2 instance in ap-south-1, S3 bucket in us-east-1), and still see no performance degradation; I also installed boto3 with pip install 'boto3[crt]'.

One question though: I noticed an unusual version string for your PyTorch build (2.3.0a0+6ddf5cf85e.nv24.04), which I found referenced in https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-04.html: are you running this snippet from within a container, or directly on a "raw" EC2 instance (i.e., SSHed in and nothing else)?

ryxli commented 1 month ago

I am running this snippet from within a container on the EC2 instance, and also from a Jupyter notebook.

matthieu-d4r commented 1 month ago

Hi @ryxli,

As discussed offline with you, we'll proceed to close this issue for now, as we were unable to reproduce the problem.