aws / aws-graviton-getting-started

Helping developers to use AWS Graviton2, Graviton3, and Graviton4 processors which power the 6th, 7th, and 8th generation of Amazon EC2 instances (C6g[d], M6g[d], R6g[d], T4g, X2gd, C6gn, I4g, Im4gn, Is4gen, G5g, C7g[d][n], M7g[d], R7g[d], R8g).
https://aws.amazon.com/ec2/graviton/

Lambda Graviton is slower than X86 for FFT #192

Closed entest-hai closed 2 years ago

entest-hai commented 2 years ago

Setting

  1. numpy==1.22.1
  2. function `numpy.fft.fft()`
  3. the handler tests the running time of the snippet below
  4. Lambda with 10240 MB RAM, 90 second timeout, Python 3.8, and the same ECR-based image on both architectures:

```python
sig = [np.random.randint(0, 1000, (4098, 600)) for k in range(4)]
for x in sig:
    np.fft.fft(x, axis=0)
```

  5. result
    • on average Lambda x86 takes 700 ms, and Lambda Graviton takes 832 ms
    • when using multiple threads, Lambda x86 takes 176 ms, and Lambda Graviton takes 210 ms

""" import json import numpy as np from concurrent.futures import ThreadPoolExecutor from datetime import datetime

def single_thread_fft(sig): """ normal fft """ start_time = datetime.now() for x in sig: np.fft.fft(x, axis=0) end_time = datetime.now() delta_time = end_time.timestamp() - start_time.timestamp() print("single thread running time {0} ms".format(delta_time * 1000)) return delta_time

def multi_thread_fft(sig): """ thread fft """ start_time = datetime.now() with ThreadPoolExecutor(max_workers=4) as executor: for x in sig: executor.submit(np.fft.fft, x, axis=0) end_time = datetime.now() delta_time = end_time.timestamp() - start_time.timestamp() print("multi thread running time {0} ms".format(delta_time * 1000)) return delta_time

def lambda_handler(event, context): """ Lambda handler """

signal for one channel

sig = [np.random.randint(0, 1000, (4098, 600)) for k in range(4)]
# single thread
single_thread_time = single_thread_fft(sig)
# multi thread
multi_thread_time = multi_thread_fft(sig)
# response
return {
    'statusCode': 200,
    'headers': {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type",
        "Access-Control-Allow-Methods": "OPTIONS,GET"
    },
     'body': json.dumps({'single thread': "{0}, multi thread: {1}".format(single_thread_time * 1000, multi_thread_time*1000)},
                        indent=4,
                        sort_keys=True,
                        default=str)
}

"""

AWSNB commented 2 years ago

@entest-hai Acknowledged, and thanks for the detailed reproduction. Our team is looking into it.

tbbharaj commented 2 years ago

@entest-hai Can you share how you got numpy installed in your AWS Lambda function? Did you unzip the numpy .whl, zip it again, and upload the zip to Lambda?

entest-hai commented 2 years ago

@tbbharaj I used an ECR image to deploy the Lambda functions. For ARM, I built the ECR image on an ARM EC2 instance. Here is the Dockerfile:

```dockerfile
FROM public.ecr.aws/lambda/python:3.8

# create code dir inside container
RUN mkdir ${LAMBDA_TASK_ROOT}/source

# copy code to container
COPY . ${LAMBDA_TASK_ROOT}/source

# copy handler function to container
COPY ./handler.py ${LAMBDA_TASK_ROOT}

# install dependencies for running time environment
RUN pip3 install -r ./source/requirements.txt --target "${LAMBDA_TASK_ROOT}"

# set the CMD to your handler
CMD [ "handler.lambda_handler" ]
```

requirements.txt has only one line

numpy==1.22.1
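Performance differences like this can also depend on how the numpy wheel itself was compiled for each architecture. A quick, generic way to inspect the build of the wheel that actually landed in the image (this is a diagnostic sketch, not something from the original report):

```python
import numpy as np

# Print the compiler, SIMD, and BLAS/LAPACK details the installed wheel
# was built with; np.fft uses the bundled pocketfft, so these build flags
# are the main architecture-specific knob for FFT performance.
np.show_config()
print(np.__version__)
```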

Here is my repository for this experiment: Graviton FFT

sebpop commented 2 years ago

The above single-thread program runs 2.2x faster on Graviton3 than on Graviton2:

```
c6g: single thread running time 946.2721347808838 ms
c7g: single thread running time 431.43606185913086 ms
c5:  single thread running time 713.0439281463623 ms
```

Graviton3 has a 2x wider FP unit than Graviton2.
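For anyone who wants to reproduce the single- versus multi-threaded comparison outside Lambda, here is a reduced local sketch of the same benchmark (the array sizes are shrunk from the issue's 4098x600 so it finishes quickly; they are illustrative, not the original workload):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

def bench(sig, workers=None):
    """Run np.fft.fft over every channel; workers=None means single-threaded."""
    start = perf_counter()
    if workers is None:
        results = [np.fft.fft(x, axis=0) for x in sig]
    else:
        # NumPy releases the GIL inside large FFTs, so threads can overlap.
        with ThreadPoolExecutor(max_workers=workers) as ex:
            results = list(ex.map(lambda x: np.fft.fft(x, axis=0), sig))
    return perf_counter() - start, results

rng = np.random.default_rng(0)
sig = [rng.integers(0, 1000, (1024, 64)) for _ in range(4)]  # smaller than 4098x600
t1, r1 = bench(sig)
t4, r4 = bench(sig, workers=4)
print("single: {:.1f} ms, 4 threads: {:.1f} ms".format(t1 * 1000, t4 * 1000))
```

Both paths compute the same transforms, so the timing delta isolates the threading effect on a given instance type.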

entest-hai commented 2 years ago

@sebpop Recently, since CodeBuild now supports ARM Graviton, I tested again using a CodeBuild Graviton instance, and Graviton ran about 34% faster than x86. So I would like to close this.