ansible / galaxy-dev

Ansible Automation Hub
11 stars 13 forks source link

Artifact Upload: Hash parameter appears to be ignored #246

Closed ironfroggy closed 4 years ago

ironfroggy commented 4 years ago

Bug Report

SUMMARY

The hash parameter, a SHA256 hash of the uploaded artifact file, appears to be completely ignored by the endpoint. Omitting the hash or submitting the file with an incorrect hash results in no error or warning.

STEPS TO REPRODUCE

Submit an artifact to the endpoint but replace the generated hash with an incorrect one or submit with the hash parameter omitted entirely from the form-data payload.

EXPECTED RESULTS

If the hash does not match, the file should be rejected. It should cause a 400 response and an error message in the APIs standard format with a message about the hash requirements.

ACTUAL RESULTS

The tarball is successfully submitted without a matching hash.

alikins commented 4 years ago

This is the sha256 that is a field on the multipart-form post to ​/artifacts​/collections​/ https://petstore.swagger.io/?url=https://raw.githubusercontent.com/ansible/galaxy-api/master/openapi/openapi.yaml#/Artifacts/uploadCollectionArtifact ?

The galaxy_api endpoint (galaxy_api.api.v3.collections.CollectionArtifactViewSet.upload) doesn't enforce the existence of a sha256 (since the ui doesn't have it). If it is provided, it isn't checked. galaxy_api could verify and the pulp_ansible side should likely verify as well, since it doesn't seem to be at the moment.

alikins commented 4 years ago

@ironfroggy

Testing against local galaxy-dev, behavior seems to be what I expect.

With no sha256: the sha256 sum is ignored and isn't checked

With bogus sha256sum: the import fails with a 400 and "The provided sha256 value does not match the sha256 of the uploaded file."

With correct sha256sum:

incorrect sha256 example

http --all --verbose -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ file@/home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.123.tar.gz sha256=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
POST /api/automation-hub/v3/artifacts/collections/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 13832
Content-Type: multipart/form-data; boundary=59b105d4e1be4dd8b0cf2b98f2a1bf46
Host: localhost:5001
User-Agent: HTTPie/1.0.2

+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

HTTP/1.1 400 Bad Request
Allow: POST, OPTIONS
Content-Length: 156
Content-Type: application/json
Date: Wed, 12 Feb 2020 19:10:49 GMT
Server: WSGIServer/0.2 CPython/3.6.8
X-Frame-Options: SAMEORIGIN

{
    "errors": [
        {
            "code": "invalid",
            "detail": "The provided sha256 value does not match the sha256 of the uploaded file.",
            "status": "400",
            "title": "Invalid input."
        }
    ]
}

With no sha256sum:


(.venv) [newswoop:F31:pulpcore (master % u=)]$ HTTP_PROXY=localhost:8088 http --all --verbose -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ file@/home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.123.tar.gz
POST /api/automation-hub/v3/artifacts/collections/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 13684
Content-Type: multipart/form-data; boundary=aadbd06e737e4fc7a53998a3767339dc
Host: localhost:5001
User-Agent: HTTPie/1.0.2

+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

HTTP/1.1 202 Accepted
Allow: POST, OPTIONS
Content-Length: 91
Content-Type: application/json
Date: Wed, 12 Feb 2020 19:12:17 GMT
Server: WSGIServer/0.2 CPython/3.6.8
X-Frame-Options: SAMEORIGIN

{
    "task": "/api/automation-hub/v3/imports/collections/c070f45c-bad0-45f1-9c51-14bb5553a590/"
}

Correct sha256 checksum:

(.venv) [newswoop:F31:pulpcore (master % u=)]$ sha256sum /home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.124.tar.gz
bae607c68fef0bb25e4c81b6bae4002d0f61daaf9568eb442d504a9248c24348  /home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.124.tar.gz

(.venv) [newswoop:F31:pulpcore (master % u=)]$ HTTP_PROXY=localhost:8088 http --all --verbose -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ file@/home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.124.tar.gz sha256=bae607c68fef0bb25e4c81b6bae4002d0f61daaf9568eb442d504a9248c24348
POST /api/automation-hub/v3/artifacts/collections/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 13835
Content-Type: multipart/form-data; boundary=4df9344dcfc04bba8fb922e0393b7018
Host: localhost:5001
User-Agent: HTTPie/1.0.2

+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

HTTP/1.1 202 Accepted
Allow: POST, OPTIONS
Content-Length: 91
Content-Type: application/json
Date: Wed, 12 Feb 2020 19:15:54 GMT
Server: WSGIServer/0.2 CPython/3.6.8
X-Frame-Options: SAMEORIGIN

{
    "task": "/api/automation-hub/v3/imports/collections/f0910741-e359-4182-b580-1935fd9b648c/"
}
ironfroggy commented 4 years ago

I have re-run my own tests on the latest deploy to CI, and these conditions still fail... I notice on the pulp side that the hash is, in fact, optional. this is looking at the Artifact.init_and_validate method.

Here's the form data body being uploaded, with the binary body snipped for clarity:

----------------------------9e69c35fcac84bf98479c6f8f11ada41
Content-Disposition: form-data; name="sha256"
000000000000000
----------------------------9e69c35fcac84bf98479c6f8f11ada41
Content-Disposition: file; name="file"; filename="autohubtest2-collection_dep_a_gxybuxwu-1.0.0.tar.gz"
Content-Type: application/octet-stream

...binary data...                                                                                                                                             
----------------------------9e69c35fcac84bf98479c6f8f11ada41--

and the 200 response body i get back

{'id': 'aac5f697-b7d5-4220-b1bb-ab91f6ea3a48', 'state': 'completed', 'created_at': '2020-02-14T19:06:08.757951Z', 'updated_at': '2020-02-14T19:06:09.603519Z', 'started_at': '2020-02-14T19:06:09.008671Z', 'finished_at': '2020-02-14T19:06:09.603448Z', 'error': None, 'messages': [{'time': 1581707169.380711, 'level': 'INFO', 'message': 'Content search - Analyzing collection structure'}]}

Is it possible your local setup is doing something different than CI here? Maybe a parameter is being stripped between our layer and the pulp layer underneath?

In any case, can you try on your end against CI?

alikins commented 4 years ago

@ironfroggy Could it be related to the file form-data Content-Type? Checking mitmproxy, for a request with valid sha256:

HTTP_PROXY=localhost:8088 http --all --verbose -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ file@/home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.124.tar.gz sha256=bae607c68fef0bb25e4c81b6bae4002d0f61daaf9568eb442d504a9248c24348

mitmproxy is showing the request body as:

--7a803bb8b1894c228726005ed2b2af01
Content-Disposition: form-data; name="sha256"
bae607c68fef0bb25e4c81b6bae4002d0f61daaf9568eb442d504a9248c24348
--7a803bb8b1894c228726005ed2b2af01
Content-Disposition: form-data; name="file"; filename="alikins-collection_inspect-0.0.124.tar.gz"
Content-Type: application/x-tar; charset=gzip
<the binary goo of a gzipped tarball>

ie, the Content-Type: application/x-tar; charset=gzip bits that 'httpie' is including as part of it's cli file@ support.

Could the test cases be missing that? I suspect that may not be required per the spec, so requests without it may have slipped past the pulp tests (especially considering most one off testing is likely to be httpie, curl, or browser which may do that automatically)

ironfroggy commented 4 years ago

@alikins those headers don't match what ansible-galaxy itself is doing when publishing to the API. I've reproduced my test by modifying the CLI itself to send an incorrect sha256 with an otherwise unchanged payload and, like previous tests, the incorrect hash is ignored and has no impact on the acceptance of the payload.

I have to recommend that you try reproducing this:

  1. Based on the payloads coming from the CLI already in use, we have to support the payloads coming out of the already released tooling.
  2. Against the CI server, as I still think there's a discrepancy in testing against your development environment.
alikins commented 4 years ago

@ironfroggy Looking at https://tools.ietf.org/html/rfc7578#section-4.4, it mentions that each multi-part MAY provide a Content-Type and that 'application/octet-stream' is a valid content type. Implying that request shown in https://github.com/ansible/galaxy-dev/issues/246#issuecomment-586427152 is valid.

Which is to say, if having 'application/octet-stream' is causing problems, then it is definitely a bug somewhere.

alikins commented 4 years ago

@ironfroggy Yup, I can reproduce with a tweaked ansible-galaxy thats send a bogus sha256. In that case the import continues and is succesful while it should indeed fail.

alikins commented 4 years ago

Bogus sha: f666666666c34d9737b5d85d80f53d7b9a39f024ae3ae8f28f2588902bac019f actual sha: 3ee8d4b1d59d2ba17d291209c07e5da57f428bacc2cc928c76279588456c7b13

Info about the request from ansible-galaxy

POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ HTTP/1.1
Accept-Encoding:    identity
Host:   localhost:5001
User-Agent: ansible-galaxy/2.9.5.post0 (Linux; python:3.7.6)
Content-Type:   multipart/form-data; boundary=--------------------------72ba452a8a8447a79f5fcaf73fb9d6eb
Content-Length: 13898
Authorization:  Bearer { VERY LARGE AUTH TOKEN HERE }
Connection: close
x-rh-identity:  { SLIGHTLY SMALLER AUTH INFO HERE}
----------------------------72ba452a8a8447a79f5fcaf73fb9d6eb
Content-Disposition: form-data; name="sha256"
f666666666c34d9737b5d85d80f53d7b9a39f024ae3ae8f28f2588902bac019f
----------------------------72ba452a8a8447a79f5fcaf73fb9d6eb
Content-Disposition: file; name="file"; filename="alikins-collection_inspect-0.0.131.tar.gz"
Content-Type: application/octet-stream
\x1f\x8b\x08\x08\x0e\x00X^\x02\xffalikins-collection_inspect-0.0.131.tar\x00\xed}
{ LOTS OF BINARY BITS HERE}
alikins commented 4 years ago

The main diffs I see between the ansible-galaxy requests and a 'httpie' formed request:

ansible-galaxy

Content-Disposition: file; name="file"; filename="alikins-collection_inspect-0.0.136.tar.gz"
Content-Type: application/octet-stream

httpie (http --all --verbose -f POST http://localhost:5001/api/automation-hub/v3/artifacts/collections/ file@/home/adrian/src/testcollections/collection-releases/alikins-collection_inspect-0.0.136.tar.gz sha256=4f09b08e8ac62b285ebd1e74ceed8a3075177757d23c5db51f1bb97496162a98 for ex):

Content-Disposition: form-data; name="file"; filename="alikins-collection_inspect-0.0.136.tar.gz"
Content-Type: application/x-tar; charset=gzip
alikins commented 4 years ago

I think this is boiling down to a missing newline after the

Content-Disposition: form-data; name="sha256"

I think django/drf/the multipart spec wants a CRLFCRLF but ansible-galaxy is just CRLF, so the request body doesn't get parsed correctly and the sha256 field is never populated.

Adding the extra newline there makes it work. But that is a ansible-galaxy change. Not sure how to approach working around that (custom 'ansible-galaxy-multi-part-form-parser' I guess...)