datahubio / datahub-v2-pm

Project management (issues only)

Content-types for files from Cloudflare are different than from S3 #101

Closed zelima closed 6 years ago

zelima commented 6 years ago

Even though the content types of the files pushed to S3 are correct, when the same files are requested through Cloudflare they are changed to text/plain.

Acceptance Criteria

Task

Analysis

We now guess content types for files before exporting to S3 (see https://github.com/frictionlessdata/datapackage-pipelines-aws/commit/dc43747361374a06a4b34176f051be03cbc4bef7), but if you try downloading a file, nothing has changed. E.g.:

curl -s -I https://pkgstore-testing.datahub.io/core/cofog/cofog_csv/data/de69248ce512cf861bbf88cfea25bf41/cofog_csv.csv
...
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 88112
Server: cloudflare
...

It seems Cloudflare is forcing the content type to text/plain. E.g., check the content type for the same file directly from S3:

curl -s -I https://s3.amazonaws.com/pkgstore-testing.datahub.io/core/cofog/cofog_csv/data/de69248ce512cf861bbf88cfea25bf41/cofog_csv.csv
HTTP/1.1 200 OK
...
Content-Type: text/csv
Content-Length: 88112
Server: AmazonS3

To make sure that not all CSV files on S3 have text/csv, check an old file (pushed as text/plain):

curl -s -I https://s3.amazonaws.com/pkgstore-testing.datahub.io/core/cash-surplus-deficit%3Acash-surp-def_csv/data/cash-surp-def_csv.csv
HTTP/1.1 200 OK
...
Content-Type: text/plain
Content-Length: 152134
Server: AmazonS3
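
The manual checks above can be wrapped in a small helper so the two origins are compared in one step. This is only a sketch, not part of the pipeline: the function names `content_type`, `head_content_type` and `compare_content_types` are made up for illustration, and it assumes `curl` and a POSIX `awk` are available.

```shell
#!/bin/sh

# Extract the Content-Type value from raw HTTP response headers on stdin.
# Header names are matched case-insensitively and CRs are stripped first.
content_type() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}

# Fetch only the response headers for a URL and print its Content-Type.
head_content_type() {
  curl -s -I "$1" | content_type
}

# Report whether two URLs are served with the same Content-Type.
compare_content_types() {
  a=$(head_content_type "$1")
  b=$(head_content_type "$2")
  if [ "$a" = "$b" ]; then
    echo "same: $a"
  else
    echo "differ: $a vs $b"
  fi
}
```

Calling `compare_content_types` with the Cloudflare URL and the S3 URL for cofog_csv.csv should report a difference if the headers are still as shown above.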
zelima commented 6 years ago

WONTFIX. For security reasons we do want to force the content type to be text/plain.