ansible / galaxy-dev

Ansible Automation Hub
11 stars 13 forks source link

Artifact Upload: missing filename erroneously reported as missing file data #245

Closed ironfroggy closed 4 years ago

ironfroggy commented 4 years ago

Bug Report

SUMMARY

HTTP multipart form-data file uploads define the filename as strictly optional but our API requires the filename be included with the uploaded file data. However, if an HTTP compliance file upload is submitted omitting the filename as part of the data, the resulting error is incorrect and does not make the correction clear.

STEPS TO REPRODUCE

Construct form-data compliant payload with the a tarball attachment, omitting the filename metadata, and upload it.

EXPECTED RESULTS

An error about the filename indicating the problem, as it cannot be matched with its expected namespace OR The namespace, name, and version to be used from the metadata and the filename ignored.

ACTUAL RESULTS

A 400 HTTP response with the error message "The submitted data was not a file."

alikins commented 4 years ago

Trying to reproduce this with httpie sending a plain text file

(.venv) [newswoop:F31:pulpcore (master % u=)]$ HTTP_PROXY=localhost:8088 http -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ < README.md 
HTTP/1.1 400 Bad Request
Allow: POST, OPTIONS
Content-Length: 136
Content-Type: application/json
Date: Wed, 12 Feb 2020 15:02:45 GMT
Server: WSGIServer/0.2 CPython/3.6.8
X-Frame-Options: SAMEORIGIN

{
    "errors": [
        {
            "code": "required",
            "detail": "No file was submitted.",
            "source": {
                "parameter": "file"
            },
            "status": "400",
            "title": "Invalid input."
        }
    ]
}

sending a tarfile:

(.venv) [newswoop:F31:pulpcore (master % u=)]$ HTTP_PROXY=localhost:8088 http --all --verbose -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ < stuff.tar.gz
POST /api/automation-hub/v3/artifacts/collections/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 572
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Host: localhost:5001
User-Agent: HTTPie/1.0.2

+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

HTTP/1.1 400 Bad Request
Allow: POST, OPTIONS
Content-Length: 136
Content-Type: application/json
Date: Wed, 12 Feb 2020 15:06:31 GMT
Server: WSGIServer/0.2 CPython/3.6.8
X-Frame-Options: SAMEORIGIN

{
    "errors": [
        {
            "code": "required",
            "detail": "No file was submitted.",
            "source": {
                "parameter": "file"
            },
            "status": "400",
            "title": "Invalid input."
        }
    ]
}
alikins commented 4 years ago

@ironfroggy Is this more or less what you are seeing and how you are reproducing?

With auth, and using a tarfile, but using "file" multi-part-form variable:

http --all --verbose -f POST "http://localhost:5001/api/automation-hub/v3/artifacts/collections/" file@stuff.tar.gz

request

POST /api/automation-hub/v3/artifacts/collections/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 767
Content-Type: multipart/form-data; boundary=2a656d7a787c44879c670b532d0a1c7d
Host: localhost:5001
User-Agent: HTTPie/1.0.2

With body like:

--2a656d7a787c44879c670b532d0a1c7d
Content-Disposition: form-data; name="file"; filename="stuff.tar.gz"
Content-Type: application/x-tar; charset=gzip
<binary stuff here>

Response

HTTP/1.1 500 Internal Server Error
Content-Length: 19508
Content-Type: text/plain; charset=utf-8
Date: Wed, 12 Feb 2020 15:28:18 GMT
Server: WSGIServer/0.2 CPython/3.6.8
X-Frame-Options: SAMEORIGIN

KeyError at /api/automation-hub/v3/artifacts/collections/
'namespace'

<lots of django debug stuff here>

Traceback from log

galaxy-api_1             | 2020-02-12 15:28:18,184 ERROR django.request bd03a28f-ce06-49bc-897b-a9b68b2e4460: Internal Server Error: /api/automation-hub/v3/artifacts/collections/
galaxy-api_1             | Traceback (most recent call last):
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
galaxy-api_1             |     response = get_response(request)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
galaxy-api_1             |     response = self.process_exception_by_middleware(e, request)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
galaxy-api_1             |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
galaxy-api_1             |     return view_func(*args, **kwargs)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/django/views/generic/base.py", line 71, in view
galaxy-api_1             |     return self.dispatch(request, *args, **kwargs)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/rest_framework/views.py", line 505, in dispatch
galaxy-api_1             |     response = self.handle_exception(exc)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/rest_framework/views.py", line 465, in handle_exception
galaxy-api_1             |     self.raise_uncaught_exception(exc)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
galaxy-api_1             |     raise exc
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/rest_framework/views.py", line 502, in dispatch
galaxy-api_1             |     response = handler(request, *args, **kwargs)
galaxy-api_1             |   File "/code/galaxy-api/galaxy_api/api/v3/viewsets.py", line 157, in post
galaxy-api_1             |     serializer.is_valid(raise_exception=True)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/rest_framework/serializers.py", line 234, in is_valid
galaxy-api_1             |     self._validated_data = self.run_validation(self.initial_data)
galaxy-api_1             |   File "/venv/lib64/python3.6/site-packages/rest_framework/serializers.py", line 433, in run_validation
galaxy-api_1             |     value = self.to_internal_value(data)
galaxy-api_1             |   File "/code/galaxy-api/galaxy_api/api/v3/serializers.py", line 36, in to_internal_value
galaxy-api_1             |     filename_tuple = parse_collection_filename(filename)
galaxy-api_1             |   File "/code/galaxy-api/galaxy_api/api/utils.py", line 38, in parse_collection_filename
galaxy-api_1             |     raise ValueError(msg.format(filename=filename))
galaxy-api_1             | KeyError: 'namespace'
galaxy-api_1             | 2020-02-12 15:28:18,185 INFO galaxy_api.request bd03a28f-ce06-49bc-897b-a9b68b2e4460: POST /api/automation-hub/v3/artifacts/collections/ 500
galaxy-api_1             | 2020-02-12 15:28:18,186 ERROR django.server : "POST /api/automation-hub/v3/artifacts/collections/ HTTP/1.1" 500 19508
alikins commented 4 years ago

@ironfroggy

However, if an HTTP compliance file upload is submitted omitting the filename as part of the data, the resulting error is incorrect and does not make the correction clear.

For this endpoint (https://petstore.swagger.io/?url=https://raw.githubusercontent.com/ansible/galaxy-api/master/openapi/openapi.yaml#/Artifacts/uploadCollectionArtifact) the 'file' field is currently required and expected to be a binary string. But as you mention, it is not specified that it needs to use the filename="stuff.tar.gz" convention for the Content-Disposition for the file form field item.

The requirement for the file field to include 'filename' is baked into the django rest framework FileField type (https://github.com/encode/django-rest-framework/blob/master/rest_framework/fields.py#L1561-L1574). Changing it likely requires a new custom FileField subclass, plus the logic to extract the filename from the tar file. Not sure that is worth the effort at the moment.

OR The namespace, name, and version to be used from the metadata and the filename ignored.

I would say that "The namespace, name, and version to be used from the metadata and the filename ignored" would be a useful feature to add.

ironfroggy commented 4 years ago

@alikins If there are questions or problems with a bug raised, can we make sure there's a discussion on both sides before closing? On a related note, sorry for overlooking a response sooner.

I don't disagree that this issue is minor enough and the correction costly enough that it does not warrant our resources, given the DRF internal nature of the error handling involved. However, simply documenting the required parameter metadata (the filename in the Content-Disposition) would ensure that the issue is documented somewhere be it in an error message or in the API documentation.

Can we update the API spec just to clarify that one point? I would consider that a valid solution.

alikins commented 4 years ago

@ironfroggy Wondering if the bug as described was with a test based on modified ansible-galaxy requests? ie, I'm wondering if the multipart form encoding bug (https://github.com/ansible/galaxy-dev/issues/246) could be related.

When testing out how different tools intrepret the multipart form POST from ansible-galaxy publish, I noticed some tools end up with a valid 'files' attribute. Or with the content of the file squashed into the value of the 'sha256' field.

That said, I think Django/drf/galaxy always get the file info right for POSTs from ansible-galaxy, so probably unrelated but wanted to verify

chouseknecht commented 4 years ago

Assuming that since this is labeled wontfix that we either can't or don't need to fix.