Open pawans80 opened 6 years ago
Currently there is no built-in support for parsing multipart data in Chalice, though Python itself has a cgi module for it.
app.py
import cgi
from io import BytesIO

from chalice import Chalice

app = Chalice(app_name='test-upload')
app.debug = True


def _get_parts():
    rfile = BytesIO(app.current_request.raw_body)
    content_type = app.current_request.headers['content-type']
    _, parameters = cgi.parse_header(content_type)
    parameters['boundary'] = parameters['boundary'].encode('utf-8')
    parsed = cgi.parse_multipart(rfile, parameters)
    return parsed


@app.route('/upload', methods=['POST'],
           content_types=['multipart/form-data'])
def upload():
    files = _get_parts()
    print(files)
    return {k: v[0].decode('utf-8') for (k, v) in files.items()}
<form enctype="multipart/form-data"
      method="POST"
      action="https://<rest-id>.execute-api.us-west-2.amazonaws.com/api/upload">
  <input name="foo" type="text">
  <br>
  <input name="bar" type="file">
  <br>
  <button type="submit">
    Submit
  </button>
</form>
When I upload a file with 'bats' as its only content and I type "baz" in the text field I get this back:
{"foo": "baz", "bar": "bats"}
After going through this myself, I can see that the Python 3 support in the cgi module leaves something to be desired (unless I missed something). Initially I thought we didn't need this feature, since it's so easy to use the built-in module for the task. But while implementing it I became convinced that it should be a lot simpler, so I'll mark this as a feature request.
@stealthycoin - Thanks for your response! It would be great if we could add this feature to AWS Chalice.
Regards, Pawan
Any update on this? @stealthycoin
@stealthycoin I agree, it would be great to include a helper method to decode this type of input.
I did find a utility to do this, packaged in a well-known third-party library:
from requests_toolbelt import MultipartDecoder
decoder = MultipartDecoder(app.current_request.raw_body, app.current_request.headers['content-type'])
return {"parts": [p.content for p in decoder.parts]}
@kislyuk @stealthycoin But how do you write the result to a file? I tried something like:
content = _get_parts()["myfile"]
with NamedTemporaryFile("wb", suffix=".pdf", delete=False) as out:
    out.write(content[0])
but it does not work.
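For reference, here is a minimal sketch of writing one uploaded part to a temporary file. It assumes the part is already a bytes object (which is what the values returned by cgi.parse_multipart contain); save_upload is a hypothetical helper name, not something Chalice provides. If the file is written but comes out corrupted, the request body may have been re-encoded before it reached the function, which is a separate problem from the write itself.

```python
from tempfile import NamedTemporaryFile


def save_upload(content: bytes, suffix: str = ".pdf") -> str:
    """Write the raw bytes of one uploaded part to a temp file.

    Hypothetical helper for illustration. Returns the path of the file,
    which is kept on disk because delete=False.
    """
    with NamedTemporaryFile("wb", suffix=suffix, delete=False) as out:
        out.write(content)
        # The file is flushed and closed when the with-block exits.
        return out.name
```

With the snippet above this would be called as save_upload(content[0]). NamedTemporaryFile uses the system temp directory, so on Lambda the file should end up under /tmp.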
@stealthycoin When I run your _get_parts() function I get the following error:
Traceback (most recent call last):
  File "app.py", line 842, in _get_view_function_response
    response = view_function(**function_args)
  File "app.py", line 98, in session_upload_input
    parts = _get_parts()
  File "app.py", line 89, in _get_parts
    parsed = cgi.parse_multipart(rfile, parameters)
  File "/usr/lib/python3.7/cgi.py", line 220, in parse_multipart
    headers['Content-Length'] = pdict['CONTENT-LENGTH']
KeyError: 'CONTENT-LENGTH'
@Zerthick you're hitting Python stdlib issue #34226 that was introduced in 3.7. The fix in python/cpython#8530 was submitted a year ago.
It looks like AWS still ships an old cgi implementation for 3.7
timestamp,message
1595863600745,"START RequestId: 1deb06de-3965-4468-b28d-9a599c69785e Version: $LATEST
1595863600765,"post-upload - ERROR - Caught exception for <function upload at 0x7f646653c050>
1595863600765,"Traceback (most recent call last):
1595863600765,"File ""/var/task/chalice/app.py"", line 1111, in _get_view_function_response
1595863600765,"response = view_function(**function_args)
1595863600765,"File ""/var/task/app.py"", line 21, in upload
1595863600765,"files = _get_parts()
1595863600765,"File ""/var/task/app.py"", line 14, in _get_parts
1595863600765,"parsed = cgi.parse_multipart(rfile, parameters)
1595863600765,"File ""/var/lang/lib/python3.7/cgi.py"", line 220, in parse_multipart
1595863600765,"headers['Content-Length'] = pdict['CONTENT-LENGTH']
1595863600765,"KeyError: 'CONTENT-LENGTH'
1595863600766,"END RequestId: 1deb06de-3965-4468-b28d-9a599c69785e
or 3.8
timestamp,message
1595863334512,"START RequestId: c84173a1-efd6-4667-bb66-473f47cafb89 Version: $LATEST
1595863334558,"post-upload - ERROR - Caught exception for <function upload at 0x7f7e2bdb7b80>
1595863334558,"Traceback (most recent call last):
1595863334558,"File ""/var/task/chalice/app.py"", line 1111, in _get_view_function_response
1595863334558,"response = view_function(**function_args)
1595863334558,"File ""/var/task/app.py"", line 21, in upload
1595863334558,"files = _get_parts()
1595863334558,"File ""/var/task/app.py"", line 14, in _get_parts
1595863334558,"parsed = cgi.parse_multipart(rfile, parameters)
1595863334558,"File ""/var/lang/lib/python3.8/cgi.py"", line 203, in parse_multipart
1595863334558,"headers['Content-Length'] = pdict['CONTENT-LENGTH']
1595863334558,"KeyError: 'CONTENT-LENGTH'
1595863334559,"END RequestId: c84173a1-efd6-4667-bb66-473f47cafb89
The same code works fine if the Lambda runtime is set to Python 3.6.
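One way to sidestep the cgi problem entirely is to parse the body with the standard library's email package instead. The following is a minimal sketch, not an official Chalice API: parse_multipart is a hypothetical helper, and it assumes the raw body and the Content-Type header value are passed in as they arrive on the request. It has the side benefit of not depending on cgi, which was deprecated in Python 3.11 and removed in 3.13. Binary uploads should be verified byte for byte before relying on it.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart(raw_body: bytes, content_type: str) -> dict:
    """Parse a multipart/form-data body.

    Hypothetical helper for illustration. Returns a dict that maps each
    form field name to a (filename, content_bytes) tuple; filename is
    None for plain form fields.
    """
    # The email parser finds the boundary in the Content-Type header,
    # so that header is prepended to the body before parsing.
    prefix = b"Content-Type: " + content_type.encode("utf-8") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(prefix + raw_body)

    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields
```

In the upload view this would be called as parse_multipart(app.current_request.raw_body, app.current_request.headers['content-type']).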
Hey everyone,
I have an example implementation that supports multipart/form-data for chunk-based file uploads in Chalice. I figured I'd post it since my use case is very similar to what's described here.
I'm using the MultipartDecoder lib listed earlier to feed the form parts into a function I wrote to extract specific form values that I'm looking for (dropzone.js metadata + chunk bytes).
def parse_multipart_object(headers, content):
    for header in headers.split(';'):
        # Only get the specific dropzone form values we need
        if header == 'form-data':
            continue
        elif 'filename' in header:
            filename_object = {"filename": header.split('"')[1::2][0], "content": content}
            return filename_object
        elif 'name="file"' in header:
            continue
        else:
            header_name = header.split('"')[1::2][0]
            metadata_object = {header_name: content}
            return metadata_object
@app.route('/upload/{filesystem_id}', methods=["POST"], content_types=['multipart/form-data'], cors=True)
def upload(filesystem_id):
    if app.current_request.query_params['path']:
        path = app.current_request.query_params['path']
    else:
        app.log.error('Missing required query param: path')
        raise BadRequestError('Missing required query param: path')
    parsed_form_object = {}
    for part in MultipartDecoder(app.current_request.raw_body, app.current_request.headers['content-type']).parts:
        raw_name = str(part.headers[b'Content-Disposition'], 'utf-8')
        if "filename" in raw_name:
            b64_content = str(base64.b64encode(part.content), 'utf-8')
            parsed_object = parse_multipart_object(raw_name, b64_content)
        else:
            parsed_object = parse_multipart_object(raw_name, part.content.decode())
        if parsed_object is None:
            pass
        else:
            parsed_form_object.update(parsed_object)
This gives me a nicely formatted dict with all the dropzone values I need, plus a base64 string of the bytes of the given file chunk. I then pass that dict to a different Lambda and write the file to disk there:
def upload(event):
    path = event['path']
    filename = event['form_data']['filename']
    file_content_decoded = base64.b64decode(event['form_data']['content'])
    current_chunk = int(event['form_data']['dzchunkindex'])
    save_path = os.path.join(path, filename)
    if os.path.exists(save_path) and current_chunk == 0:
        return {"message": "File already exists", "statusCode": 400}
    try:
        with open(save_path, 'ab') as f:
            f.seek(int(event['form_data']['dzchunkbyteoffset']))
            f.write(file_content_decoded)
    except OSError as error:
        print('Could not write to file: {error}'.format(error=error))
        return {"message": "couldn't write the file to disk", "statusCode": 500}
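A note on the write step: on many systems a file opened in append mode ('ab') always writes at the end of the file, whatever position seek() was given, so the byte offset only takes effect if chunks happen to arrive in order. The sketch below is a variation that honours the offset. It is not the code used above; write_chunk is a hypothetical helper, and it assumes the offset is the absolute position of the chunk in the final file.

```python
import os


def write_chunk(save_path: str, offset: int, data: bytes) -> None:
    """Write one chunk at an absolute byte offset.

    Hypothetical helper for illustration. 'r+b' updates an existing file
    without truncating it; 'wb' is used only to create the file.
    """
    mode = "r+b" if os.path.exists(save_path) else "wb"
    with open(save_path, mode) as f:
        f.seek(offset)
        f.write(data)
```

Because each chunk lands at its own offset, chunks that arrive out of order still end up in the right place.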
One of the issues I ran into was that API Gateway was not treating multipart/form-data as binary, which caused the HTTP body to be re-encoded as UTF-8 and corrupted the file. I had to explicitly tell API Gateway to treat multipart/form-data as binary.
It'd be really helpful to have native support for this in Chalice, similar to Flask.
Hope this is helpful!
I had to explicitly tell API Gateway to treat multipart/form-data as binary.
@brandold, how did you do this?
@ricky-sb Initially I manually went into the API Gateway settings and added that content type to the binary types. However, I recently discovered chalice will do this for you automatically if you append the binary content types in your app like so:
app.api.binary_types.append('multipart/form-data')
https://chalice.readthedocs.io/en/stable/api.html#APIGateway.binary_types
Looks like there may be a potential duplicate of this issue with #1021 and also a potential PR with #1216
The PR looks like it could use some work, as there is some unresolved feedback and failing tests. Hopefully we can get an update, or someone else can provide an improved PR in the meantime. If not, I'll see what I can do over the next few days.
Also semi-related is #1574, for testing multipart requests with the new test client.
Hello,
How would I parse the raw_body of a "multipart/form-data" request to get the filename and file content, so that I can PUT the file to S3?
When I tried the same with Flask, it has a request.files object property to extract the file content from the body:
Does AWS Chalice support multipart/form-data and provide a file object like Flask's request.files?
Or is there another way to parse the raw_body?
Regards, Pawan