gdcc / pyDataverse

Python module for Dataverse Software (dataverse.org).
http://pydataverse.readthedocs.io/
MIT License

upload_datafile: handling of the content (mime) type #142

Open landreev opened 2 years ago

landreev commented 2 years ago

Describe the PR

There is currently no way to pass the content (MIME) type to upload_datafile() (see #118). In addition, when the multi-part POST form is created inside the method, no content type is specified for the upload. This apparently fools Dataverse into defaulting to "text/plain" without attempting its normal type detection. In other words, in its current form, every file uploaded via pyDataverse ends up with the content type "text/plain", even when it is of a type Dataverse normally recognizes (popular image types, etc.). This defaulting behavior can and should be addressed on the Dataverse side, but it is a good idea to fix it on the pyDataverse side as well. So this PR does 2 things:

  1. Provides a way to supply the mime type explicitly; and
  2. Makes it default to the standard application/octet-stream, a polite way of saying "type unknown", when creating a multi-part POST entry, as curl does. This prompts Dataverse to at least attempt to identify the file more accurately. It is achieved by switching to the long notation for passing the file to the requests.post method: from {"file": open(filename, "rb")} to {"file": (filename, open(filename, "rb"), content_type)}.
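To make the two points above concrete, here is a minimal sketch of how the long notation could be assembled. The helper name `build_file_part` and the constant `DEFAULT_CONTENT_TYPE` are hypothetical and are not taken from the PR's diff; only the tuple shape `(filename, fileobj, content_type)` and the `application/octet-stream` default come from the description.

```python
from typing import BinaryIO, Dict, Optional, Tuple

# Default named in the PR description: "type unknown".
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def build_file_part(
    filename: str,
    fileobj: BinaryIO,
    content_type: Optional[str] = None,
) -> Dict[str, Tuple[str, BinaryIO, str]]:
    """Hypothetical helper: build the `files` mapping in the long notation.

    Short notation: {"file": fileobj}
    Long notation:  {"file": (filename, fileobj, content_type)}
    """
    # Use the caller's type if given, otherwise fall back to the default
    # so that the multipart entry always carries a Content-Type.
    return {"file": (filename, fileobj, content_type or DEFAULT_CONTENT_TYPE)}


# Usage sketch (needs the third-party `requests` package; `url` is a placeholder):
#
#   import requests
#   with open("photo.png", "rb") as fh:
#       requests.post(url, files=build_file_part("photo.png", fh, "image/png"))
```

Leaving `content_type` unset yields `application/octet-stream`, which leaves type detection to the server rather than guessing on the client.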

On the Dataverse side this is tracked in https://github.com/IQSS/dataverse/issues/8344
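As background on why a missing content type can surface as "text/plain": MIME-style parsing treats a part without a Content-Type header as text/plain by default. The sketch below illustrates that default with Python's standard `email` parser. It is only an illustration of the general rule; whether Dataverse's server stack reaches "text/plain" by the same route is an assumption, and the part headers shown are hand-written examples, not captured traffic.

```python
from email import message_from_string

# A multipart/form-data part with no Content-Type header
# (illustrative of what the short notation produces).
part_without_type = (
    'Content-Disposition: form-data; name="file"; filename="photo.png"\n'
    "\n"
    "payload"
)

# The same part with an explicit "type unknown" content type.
part_with_type = (
    'Content-Disposition: form-data; name="file"; filename="photo.png"\n'
    "Content-Type: application/octet-stream\n"
    "\n"
    "payload"
)


def effective_content_type(raw_part: str) -> str:
    """Return the content type a MIME parser assigns to the part."""
    return message_from_string(raw_part).get_content_type()


print(effective_content_type(part_without_type))  # text/plain
print(effective_content_type(part_with_type))     # application/octet-stream
```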

codecov[bot] commented 2 years ago

Codecov Report

Merging #142 (628dbb4) into develop (cc06022) will not change coverage. The diff coverage is 50.00%.

@@           Coverage Diff            @@
##           develop     #142   +/-   ##
========================================
  Coverage    50.91%   50.91%           
========================================
  Files            5        5           
  Lines         1316     1316           
========================================
  Hits           670      670           
  Misses         646      646           
Impacted Files            Coverage Δ
src/pyDataverse/api.py    21.48% <50.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cc06022...628dbb4.

skasberger commented 1 year ago

Update: I left AUSSDA, so my funding for pyDataverse development has stopped.

I want to get some basic funding to implement the most urgent updates (PRs, bug fixes, maintenance work). If you can support this, please reach out to me (www.stefankasberger.at). The same applies if you have feature requests.

Another option would be for someone else to help with development and/or maintenance. If you are interested, get in touch with me (or comment here).

JR-1991 commented 4 months ago

@landreev thanks for the reminder in #142 and this PR! I think the addition in this PR makes sense and enhances the utility of the library. We were able to resolve a similar issue, encountered when replacing files, in #174, where we switched to another library that does not send text/plain by default.

Could you sync this PR with the current main branch so that we can review it?

pdurbin commented 3 months ago

@landreev hey, @JR-1991 and I talked about this PR a couple of weeks ago (recording) and wanted to let you know that the merge conflicts are due to pyDataverse switching from requests to httpx in #174.

We suspect #118 may already be fixed (I just left a comment about this), but we think the ability to send an arbitrary MIME type would be a nice feature to have, and we'd welcome a pull request if you have the time.
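For anyone picking this up after the switch to httpx: httpx accepts the same `(filename, fileobj, content_type)` tuple in its `files` argument that requests does, so the idea carries over. Below is a rough, library-agnostic sketch. The function name `upload_with_type` and its signature are hypothetical and not part of pyDataverse; the `post` callable is assumed to be something like `httpx.post` or `requests.post`.

```python
from typing import Any, Callable, Optional


def upload_with_type(
    post: Callable[..., Any],
    url: str,
    filename: str,
    content_type: Optional[str] = None,
) -> Any:
    """Hypothetical sketch: upload a file with an explicit MIME type.

    `post` is any callable taking (url, files=...), for example
    httpx.post or requests.post.
    """
    # Fall back to "type unknown" when no type is supplied.
    mime = content_type or "application/octet-stream"
    with open(filename, "rb") as fh:
        # Long notation: (filename, file object, content type).
        return post(url, files={"file": (filename, fh, mime)})


# Usage sketch (needs the third-party `httpx` package; `url` is a placeholder):
#
#   import httpx
#   upload_with_type(httpx.post, url, "photo.png", "image/png")
```

Passing the `post` callable in keeps the sketch independent of the HTTP library. A real implementation would also need to send the API token and any JSON metadata that upload_datafile() already handles.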