aws / chalice

Python Serverless Microframework for AWS
Apache License 2.0

multipart/form-data support in AWS Chalice #796

Open pawans80 opened 6 years ago

pawans80 commented 6 years ago

Hello,

How would I parse raw_body to get the filename and file content from a "multipart/form-data" request, so that I can PUT the file to S3?

With Flask, the request object has a files property from which I can extract the file content:

@app.route('/upload', methods=['POST'])
def upload():
    try:
        print(request.__dict__)
        session = Session(aws_access_key_id='xxx',
                          aws_secret_access_key='xxxx',
                          region_name='xxx')
        s3 = session.resource('s3')
        s3.Bucket('my-bucket').put_object(Key='my-folder/mydata_file.pdf', Body=request.files['myfile'])
    except Exception as e:
        print(str(e))

Does AWS Chalice support multipart/form-data and expose the file object the way Flask does (request.files)?

Or is there another way to parse raw_body?

Regards, Pawan

stealthycoin commented 6 years ago

Currently there is no built-in support for parsing multipart data in Chalice, though Python itself has a cgi module for it.

app.py

import cgi
from io import BytesIO

from chalice import Chalice

app = Chalice(app_name='test-upload')
app.debug = True

def _get_parts():
    rfile = BytesIO(app.current_request.raw_body)
    content_type = app.current_request.headers['content-type']
    _, parameters = cgi.parse_header(content_type)
    parameters['boundary'] = parameters['boundary'].encode('utf-8')
    parsed = cgi.parse_multipart(rfile, parameters)
    return parsed

@app.route('/upload', methods=['POST'],
           content_types=['multipart/form-data'])
def upload():
    files = _get_parts()
    print(files)
    return {k: v[0].decode('utf-8') for (k, v) in files.items()}

<form enctype="multipart/form-data" 
      method="POST" 
      action="https://<rest-id>.execute-api.us-west-2.amazonaws.com/api/upload">
    <input name="foo" type="text">
    <br>
    <input name="bar" type="file">
    <br>
    <button type="submit">
    Submit
    </button>
</form>

When I upload a file with 'bats' as its only content and I type "baz" in the text field I get this back:

{"foo": "baz", "bar": "bats"}

After going through this myself, I can see that the Python 3 support in the cgi module leaves something to be desired (unless I missed something). Initially I thought we didn't need built-in support, since it's so easy to use the standard library module for this task. But while implementing it I became convinced that it should be a lot simpler, so I'll mark this as a feature request.

pawans80 commented 6 years ago

@stealthycoin - Thanks for your response! It would be great if this feature could be added to AWS Chalice.

Regards, Pawan

aimanparvaiz commented 5 years ago

Any update on this? @stealthycoin

kislyuk commented 5 years ago

@stealthycoin I agree, it would be great to include a helper method to decode this type of input.

kislyuk commented 5 years ago

I did find a utility class for this, packaged in a well-known third-party library:

from requests_toolbelt import MultipartDecoder
decoder = MultipartDecoder(app.current_request.raw_body, app.current_request.headers['content-type'])
return {"parts": [p.content for p in decoder.parts]}
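
To get at the field name and filename the original question asks about, each part's Content-Disposition header still has to be parsed. Below is a minimal sketch using only the standard library's email.message.Message. parse_content_disposition is a hypothetical helper name (not part of Chalice or requests_toolbelt), and it assumes the header value has already been decoded to str; later in this thread the decoder's part headers are shown keyed by bytes, e.g. part.headers[b'Content-Disposition'].

```python
from email.message import Message


def parse_content_disposition(value):
    """Return (field_name, filename) from a Content-Disposition header value.

    filename is None for plain form fields that carry no file.
    """
    msg = Message()
    msg['Content-Disposition'] = value
    # get_param() strips the surrounding quotes from the parameter value.
    name = msg.get_param('name', header='content-disposition')
    filename = msg.get_filename()
    return name, filename


print(parse_content_disposition('form-data; name="bar"; filename="bats.txt"'))
print(parse_content_disposition('form-data; name="foo"'))
```

Letting the standard library do the parameter parsing avoids splitting on ';' and '"' by hand, which can go wrong when a filename itself contains one of those characters.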

agstudy commented 5 years ago

@kislyuk @stealthycoin But how do you write the result to a file? I tried something like:

content = _get_parts()["myfile"]
with NamedTemporaryFile("wb", suffix=".pdf", delete=False) as out:
    out.write(content[0])

but it does not work.
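
The comment above doesn't say how it fails, so the following is only a sketch of the file-writing step in isolation, assuming content[0] is already a bytes object. save_part is a hypothetical helper name. If the file is written but comes out corrupted, the body may have been altered before it reached the handler (see the API Gateway binary-type discussion further down in this thread).

```python
import os
from tempfile import NamedTemporaryFile


def save_part(content, suffix=".pdf"):
    """Write the raw bytes of one form part to a temp file and return its path."""
    if not isinstance(content, bytes):
        # Fail loudly if the part was decoded to str somewhere upstream.
        raise TypeError("expected bytes, got %s" % type(content).__name__)
    with NamedTemporaryFile("wb", suffix=suffix, delete=False) as out:
        out.write(content)
        return out.name


path = save_part(b"%PDF-1.4 example bytes")
print(path, os.path.getsize(path))
os.remove(path)
```

Because delete=False is used, the caller is responsible for removing the file afterwards.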

Zerthick commented 5 years ago

@stealthycoin When I run your _get_parts() function I get the following error:

Traceback (most recent call last):
  File "app.py", line 842, in _get_view_function_response
    response = view_function(**function_args)
  File "app.py", line 98, in session_upload_input
    parts = _get_parts()
  File "app.py", line 89, in _get_parts
    parsed = cgi.parse_multipart(rfile, parameters)
  File "/usr/lib/python3.7/cgi.py", line 220, in parse_multipart
    headers['Content-Length'] = pdict['CONTENT-LENGTH']
KeyError: 'CONTENT-LENGTH'

skwashd commented 5 years ago

@Zerthick you're hitting Python stdlib issue #34226 that was introduced in 3.7. The fix in python/cpython#8530 was submitted a year ago.

umairaslamm commented 4 years ago

> @Zerthick you're hitting Python stdlib issue #34226 that was introduced in 3.7. The fix in python/cpython#8530 was submitted a year ago.

It looks like the AWS Lambda runtime still ships the old cgi implementation for 3.7

timestamp,message
1595863600745,"START RequestId: 1deb06de-3965-4468-b28d-9a599c69785e Version: $LATEST
1595863600765,"post-upload - ERROR - Caught exception for <function upload at 0x7f646653c050>
1595863600765,"Traceback (most recent call last):
1595863600765,"File ""/var/task/chalice/app.py"", line 1111, in _get_view_function_response
1595863600765,"response = view_function(**function_args)
1595863600765,"File ""/var/task/app.py"", line 21, in upload
1595863600765,"files = _get_parts()
1595863600765,"File ""/var/task/app.py"", line 14, in _get_parts
1595863600765,"parsed = cgi.parse_multipart(rfile, parameters)
1595863600765,"File ""/var/lang/lib/python3.7/cgi.py"", line 220, in parse_multipart
1595863600765,"headers['Content-Length'] = pdict['CONTENT-LENGTH']
1595863600765,"KeyError: 'CONTENT-LENGTH'
1595863600766,"END RequestId: 1deb06de-3965-4468-b28d-9a599c69785e

or 3.8

timestamp,message
1595863334512,"START RequestId: c84173a1-efd6-4667-bb66-473f47cafb89 Version: $LATEST
1595863334558,"post-upload - ERROR - Caught exception for <function upload at 0x7f7e2bdb7b80>
1595863334558,"Traceback (most recent call last):
1595863334558,"File ""/var/task/chalice/app.py"", line 1111, in _get_view_function_response
1595863334558,"response = view_function(**function_args)
1595863334558,"File ""/var/task/app.py"", line 21, in upload
1595863334558,"files = _get_parts()
1595863334558,"File ""/var/task/app.py"", line 14, in _get_parts
1595863334558,"parsed = cgi.parse_multipart(rfile, parameters)
1595863334558,"File ""/var/lang/lib/python3.8/cgi.py"", line 203, in parse_multipart
1595863334558,"headers['Content-Length'] = pdict['CONTENT-LENGTH']
1595863334558,"KeyError: 'CONTENT-LENGTH'
1595863334559,"END RequestId: c84173a1-efd6-4667-bb66-473f47cafb89

umairaslamm commented 4 years ago

The same code works fine if the Lambda runtime is set to Python 3.6.
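
For runtimes where cgi.parse_multipart raises the KeyError above, one way to sidestep the cgi module entirely is the standard library's email parser. This matters on newer runtimes too, since cgi was deprecated by PEP 594 and removed in Python 3.13. The following is a minimal sketch, not something proposed in this thread: parse_multipart is a hypothetical helper, it holds the whole body in memory, and it assumes raw_body is bytes and content_type is the request's Content-Type header including its boundary.

```python
from email import message_from_bytes


def parse_multipart(raw_body, content_type):
    """Parse a multipart/form-data body into {field_name: (filename, bytes)}."""
    # The email parser needs the Content-Type header (with its boundary)
    # in front of the body to know where the parts begin and end.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(prefix + raw_body)
    if not message.is_multipart():
        raise ValueError("body is not multipart")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # filename is None for plain text fields.
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields


body = (
    b"--XyZ123\r\n"
    b'Content-Disposition: form-data; name="foo"\r\n\r\n'
    b"baz\r\n"
    b"--XyZ123\r\n"
    b'Content-Disposition: form-data; name="bar"; filename="bats.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\n"
    b"bats\r\n"
    b"--XyZ123--\r\n"
)
print(parse_multipart(body, "multipart/form-data; boundary=XyZ123"))
```

In a Chalice view this would presumably be called as parse_multipart(app.current_request.raw_body, app.current_request.headers['content-type']), mirroring the _get_parts() helper earlier in the thread.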

brandold commented 4 years ago

Hey everyone,

I have an example implementation of supporting multipart/form-data for chunk based file uploads in Chalice. I figured I'd post it since my use case is very similar to what's described here.

I'm using the MultipartDecoder lib listed earlier to feed the form parts into a function I wrote to extract specific form values that I'm looking for (dropzone.js metadata + chunk bytes).

def parse_multipart_object(headers, content):
    for header in headers.split(';'):
        # Only get the specific dropzone form values we need
        if header == 'form-data':
            continue
        elif 'filename' in header:
            filename_object = {"filename": header.split('"')[1::2][0], "content": content}
            return filename_object
        elif 'name="file"' in header:
            continue
        else:
            header_name = header.split('"')[1::2][0]
            metadata_object = {header_name: content}
            return metadata_object

@app.route('/upload/{filesystem_id}', methods=["POST"], content_types=['multipart/form-data'], cors=True)
def upload(filesystem_id):
    # query_params is None when the request has no query string
    query_params = app.current_request.query_params or {}
    path = query_params.get('path')
    if not path:
        app.log.error('Missing required query param: path')
        raise BadRequestError('Missing required query param: path')

    parsed_form_object = {}
    for part in MultipartDecoder(app.current_request.raw_body, app.current_request.headers['content-type']).parts:
        raw_name = str(part.headers[b'Content-Disposition'], 'utf-8')
        if "filename" in raw_name:
            b64_content = str(base64.b64encode(part.content), 'utf-8')
            parsed_object = parse_multipart_object(raw_name, b64_content)
        else:
            parsed_object = parse_multipart_object(raw_name, part.content.decode())

        if parsed_object is not None:
            parsed_form_object.update(parsed_object)

This returns me a nicely formatted dict with all the dropzone values I need plus a base64 string of the bytes of the given chunk of file. I then pass that dict to a different lambda and write the file to disk there:

def upload(event):
    path = event['path']
    filename = event['form_data']['filename']
    file_content_decoded = base64.b64decode(event['form_data']['content'])
    current_chunk = int(event['form_data']['dzchunkindex'])
    save_path = os.path.join(path, filename)

    if os.path.exists(save_path) and current_chunk == 0:
        return {"message": "File already exists", "statusCode": 400}

    try:
        with open(save_path, 'ab') as f:
            f.seek(int(event['form_data']['dzchunkbyteoffset']))
            f.write(file_content_decoded)
    except OSError as error:
        print('Could not write to file: {error}'.format(error=error))
        return {"message": "couldn't write the file to disk", "statusCode": 500}
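
One thing worth double-checking in the snippet above: per the Python documentation for open(), a file opened in append mode ('a') may, on some Unix systems, send every write to the end of the file regardless of the current seek position, in which case the dzchunkbyteoffset seek has no effect. A minimal sketch of an offset-respecting write follows; write_chunk is a hypothetical helper and it ignores concurrent writers.

```python
import os


def write_chunk(save_path, offset, data):
    """Write one chunk at its byte offset, creating the file if needed."""
    # 'r+b' keeps existing bytes and honours seek(); 'wb' is only used for
    # the first chunk to arrive, when the file does not exist yet.
    mode = "r+b" if os.path.exists(save_path) else "wb"
    with open(save_path, mode) as f:
        f.seek(offset)
        f.write(data)
```

With this shape, chunks can arrive in any order: write_chunk(path, 5, b"world") followed by write_chunk(path, 0, b"hello") leaves the file containing the two chunks in offset order.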

One of the issues I ran into was that API Gateway was not treating multipart/form-data as binary, which caused the HTTP body to be re-encoded as UTF-8 and corrupted the file. I had to explicitly tell API Gateway to treat multipart/form-data as binary.

It'd be really helpful to have native support for this in Chalice, similar to flask.

Hope this is helpful!

ricky-sb commented 3 years ago

> I had to explicitly tell API Gateway to treat multipart/form-data as binary.

@brandold, how did you do this?

brandold commented 3 years ago

@ricky-sb Initially I manually went into the API Gateway settings and added that content type to the binary types. However, I recently discovered chalice will do this for you automatically if you append the binary content types in your app like so:

app.api.binary_types.append('multipart/form-data')

https://chalice.readthedocs.io/en/stable/api.html#APIGateway.binary_types
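
As a rough illustration of why the binary type matters (a simplified model, not API Gateway's actual code path): pushing arbitrary bytes through a text decode/encode cycle is lossy, because byte sequences that are not valid UTF-8 get replaced. lossy_round_trip is a hypothetical name used only for this demonstration.

```python
def lossy_round_trip(data):
    """Decode bytes as UTF-8 text and encode them again.

    Invalid sequences are replaced with U+FFFD, so binary input is not preserved.
    """
    return data.decode("utf-8", errors="replace").encode("utf-8")


text_like = b"plain ascii survives"
binary = bytes([0x25, 0x50, 0x44, 0x46, 0xFF, 0xD8, 0x00, 0x80])

print(lossy_round_trip(text_like) == text_like)
print(lossy_round_trip(binary) == binary)
```

Plain text fields come through such a cycle unchanged, which is why a form with only text inputs can appear to work while file uploads arrive damaged.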

jrbeilke commented 3 years ago

Looks like there may be a potential duplicate of this issue with #1021 and also a potential PR with #1216

The PR looks like it could use some work, as there is unresolved feedback and failing tests. Hopefully we can get an update or someone else can provide an improved PR in the meantime. If not, I'll see what I can do over the next few days.

Also semi-related is #1574 for testing multipart requests with the new test client