IQSS / dataverse

Open source research data repository software
http://dataverse.org

Inconsistency in Content-Disposition format #7962

Open Xarthisius opened 3 years ago

Xarthisius commented 3 years ago

Dataverse returns the Content-Disposition header for a file in two different formats, depending on where the file is physically stored. For example:

import requests

url = "https://dataverse.harvard.edu/api/access/datafile/3315157"
req = requests.get(url, allow_redirects=True)
print(f"File (id:3315157) - tab version not on S3: {req.headers['Content-Disposition']}")
req = requests.get(url + "?format=original", allow_redirects=True)
print(f"File (id:3315157) - original version on S3: {req.headers['Content-Disposition']}")

yields:

File (id:3315157) - tab version not on S3: attachment; filename="Karnataka_DD%26FS_Data-1.tab"
File (id:3315157) - original version on S3: attachment; filename*=UTF-8''Karnataka_DD%26FS_Data-1.xlsx

Note that in both cases the filename has escaped and encoded characters per RFC 5987. However, only the latter format of the header is correct ("filename*" indicates that the name is encoded; see e.g. the Content-Disposition definition). Related PRs: #4542 and #7503.
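
Until the two code paths agree, a client has to handle both variants itself. Below is a minimal sketch of how that might look. The helper name `filename_from_content_disposition` is hypothetical (it is not part of Dataverse or requests), and it assumes a single `filename*` or `filename` parameter per header; it is not a complete RFC 6266 parser.

```python
import re
from typing import Tuple
from urllib.parse import unquote


def filename_from_content_disposition(header: str) -> Tuple[str, bool]:
    """Return (filename, declared_encoded) for a Content-Disposition value.

    Hypothetical helper for illustration only.
    """
    # Extended form: filename*=charset'language'percent-encoded-value
    ext = re.search(
        r"filename\*\s*=\s*([^']*)'[^']*'([^;]+)", header, flags=re.IGNORECASE
    )
    if ext:
        charset = ext.group(1) or "utf-8"
        # The header declares the value as encoded, so decoding is safe.
        return unquote(ext.group(2).strip(), encoding=charset), True

    # Plain form: filename="value" (or an unquoted token)
    plain = re.search(r'filename\s*=\s*"?([^";]+)"?', header, flags=re.IGNORECASE)
    if plain:
        # Nothing declares an encoding, so the value is returned literally.
        return plain.group(1).strip(), False

    raise ValueError("no filename parameter found")
```

With the two headers shown above, the first call returns the name with a literal `%26` and `False`, while the second decodes it to `Karnataka_DD&FS_Data-1.xlsx` and returns `True`. A client that percent-decodes the plain form anyway is guessing at the server's intent, which is the inconsistency this issue is about.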

pdurbin commented 2 years ago

@Xarthisius thanks for all this analysis. Do you think this is related to the following issue?

poikilotherm commented 2 years ago

My comment here is related to this IMHO.