gdcc / pyDataverse

Python module for Dataverse Software (dataverse.org).
http://pydataverse.readthedocs.io/
MIT License

direct upload to s3 store using Dataverse directupload api #136

Open jmjamison opened 3 years ago

jmjamison commented 3 years ago

I have been working with the direct upload API (https://guides.dataverse.org/en/5.4/developers/s3-direct-upload-api.html). It's done in two passes: the first puts the file into temporary S3 storage, and the second adds it to the dataset. As soon as I have a workable script I'll send it over.

I'm a bit confused about the POST request. The documentation shows:

```python
def post_request(self, url, data=None, auth=False, params=None, files=None):
    """Make a POST request."""
```

But if I set `auth=True` (because I'm using an API key), I get this error:

```
TypeError: 'bool' object is not callable
```
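A minimal sketch of why `auth=True` can produce that exact error, assuming the flag ends up being passed straight through to `requests` (this has not been checked against the pyDataverse 0.3.1 source). `requests` wraps a 2-tuple in `HTTPBasicAuth` and otherwise calls the `auth` value directly as `auth(request)`, so a bare `True` fails while the request is being prepared. A Dataverse API token is normally sent in the `X-Dataverse-key` header instead. `SERVER_URL` and `API_TOKEN` are hypothetical placeholders, and the requests are only prepared, never sent.

```python
import requests

# Hypothetical placeholders; nothing below is sent over the network.
SERVER_URL = "https://demo.dataverse.org"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
URL = f"{SERVER_URL}/api/info/version"


def prepare_with_bool_auth() -> requests.PreparedRequest:
    """requests wraps a 2-tuple in HTTPBasicAuth and calls any other truthy
    `auth` value as auth(request), so a bare True raises TypeError here."""
    return requests.Request("POST", URL, auth=True).prepare()


def prepare_with_token_header() -> requests.PreparedRequest:
    """Send the Dataverse API token in the X-Dataverse-key header instead."""
    return requests.Request(
        "GET", URL, headers={"X-Dataverse-key": API_TOKEN}
    ).prepare()


if __name__ == "__main__":
    try:
        prepare_with_bool_auth()
    except TypeError as err:
        print("auth=True fails:", err)
    prepared = prepare_with_token_header()
    print(prepared.method, prepared.url)
```

Passing `auth=()` (as in the later attempts in this thread) avoids the error only because an empty tuple is falsy and is skipped entirely.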

I checked my server log and found this:

```
|2021-04-19T19:43:25.360+0000|SEVERE|Payara 5.2020.6|javax.enterprise.web.core|_ThreadID=66;_ThreadName=http-thread-pool::http-listener-1(3);_TimeMillis=1618861405360;_LevelValue=1000;_MessageID=AS-WEB-CORE-00037;|
An exception or error occurred in the container during the request processing
java.lang.Exception: Host is not set
	at org.glassfish.grizzly.http.server.util.Mapper.map(Mapper.java:865)
	at org.apache.catalina.connector.CoyoteAdapter.postParseRequest(CoyoteAdapter.java:496)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:309)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:238)
	at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
	at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
	at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
	at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:218)
	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
	at java.lang.Thread.run(Thread.java:748)
|#]
```

Jamie Jamison UCLA Dataverse jamison@library.ucla.edu

skasberger commented 3 years ago

@jmjamison Which pyDataverse and Dataverse versions are you working on? And can you also share the code executed for the POST request?

jmjamison commented 3 years ago

Dataverse: 5.3 build 286-fcb5ce7
pyDataverse: 0.3.1

jmjamison commented 3 years ago

```python
import subprocess as sp

import pyDataverse
import requests
from pyDataverse.api import NativeApi
from requests import ConnectionError, Response, delete, get, post, put

# dataverse_server and api_key are set earlier
api = NativeApi(dataverse_server, api_key)
resp = api.get_info_version()
resp.json()
```

{'status': 'OK', 'data': {'version': '5.3', 'build': '286-fcb5ce7'}}

```python
resp = requests.put(url_persistent_id, data=None, params=None, auth=(), files=None)
resp.json()
```

{'status': 'ERROR', 'code': 405, 'message': 'API endpoint does not support this method. Consult our API guide at http://guides.dataverse.org.', 'requestUrl': 'https://dataverse.ucla.edu/api/v1/datasets/:persistentId/uploadurls?persistentId=doi:10.25346/S6/T4LHZF&size=10000000', 'requestMethod': 'PUT'}

Also tried:

```python
url_persistent_id = '%s/api/datasets/:persistentId/uploadurls?persistentId=%s&size=%s' % (dataverse_server, persistentId, str(size))
r = requests.post(url_persistent_id, headers={"X-Dataverse-key": "$API_TOKEN"}, cookies={}, auth=())
```

{'status': 'ERROR', 'code': 405, 'message': 'API endpoint does not support this method. Consult our API guide at http://guides.dataverse.org.', 'requestUrl': 'https://dataverse.ucla.edu/api/v1/datasets/:persistentId/uploadurls?persistentId=doi:10.25346/S6/T4LHZF&size=10000000', 'requestMethod': 'POST'}
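A sketch of the three direct-upload calls, following the S3 direct upload guide linked in the first comment. In that guide the `uploadurls` endpoint is called with GET, which would explain the 405 responses to both the PUT and the POST attempts. Separately, `"$API_TOKEN"` in the second attempt is a shell variable name: Python does not expand it, so that literal string is sent as the key. `SERVER_URL`, `API_TOKEN` and `PERSISTENT_ID` are hypothetical placeholders, and the SHA-1 checksum and `text/plain` MIME type are assumptions modelled on the guide's example. The requests are only prepared, never sent.

```python
import hashlib
import json

import requests

# Hypothetical placeholders; nothing below is sent over the network.
SERVER_URL = "https://demo.dataverse.org"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
PERSISTENT_ID = "doi:10.5072/FK2/EXAMPLE"


def request_upload_url(size: int) -> requests.PreparedRequest:
    """Step 1: ask Dataverse for a presigned S3 URL (a GET, not PUT/POST)."""
    return requests.Request(
        "GET",
        f"{SERVER_URL}/api/datasets/:persistentId/uploadurls",
        params={"persistentId": PERSISTENT_ID, "size": size},
        headers={"X-Dataverse-key": API_TOKEN},  # the real token, not "$API_TOKEN"
    ).prepare()


def put_to_s3(presigned_url: str, payload: bytes) -> requests.PreparedRequest:
    """Step 2: PUT the file bytes to the presigned URL with the temp tag."""
    return requests.Request(
        "PUT",
        presigned_url,
        data=payload,
        headers={"x-amz-tagging": "dv-state=temp"},
    ).prepare()


def register_file(
    storage_identifier: str, file_name: str, payload: bytes
) -> requests.PreparedRequest:
    """Step 3: add the uploaded object to the dataset via a jsonData form field."""
    json_data = {
        "storageIdentifier": storage_identifier,
        "fileName": file_name,
        "mimeType": "text/plain",  # assumption: adjust to the real file type
        "checksum": {"@type": "SHA-1", "@value": hashlib.sha1(payload).hexdigest()},
    }
    return requests.Request(
        "POST",
        f"{SERVER_URL}/api/datasets/:persistentId/add",
        params={"persistentId": PERSISTENT_ID},
        headers={"X-Dataverse-key": API_TOKEN},
        # (None, text) makes requests send a plain multipart form field.
        files={"jsonData": (None, json.dumps(json_data))},
    ).prepare()
```

To run it for real, each prepared request could be sent with `requests.Session().send(...)`. For a single-part upload, the `data` object in step 1's JSON response should carry the presigned `url` and the `storageIdentifier` that feed steps 2 and 3. Larger files use the guide's multipart variant, which this sketch does not cover.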

jmjamison commented 3 years ago

Is there anything else I should add?

skasberger commented 3 years ago

@jmjamison Is this still an issue? I am on parental leave until May 2022, so my time for pyDataverse is very, very limited.

jmjamison commented 3 years ago

Apologies, I didn't realize you were on parental leave. The issue still exists, but I can use other methods for direct uploads. Enjoy the time with your youngster.

skasberger commented 2 years ago

Update: I left AUSSDA, so my funding for pyDataverse development has stopped.

I want to get some basic funding to implement the most urgent updates (PRs, bug fixes, maintenance work). If you can support this, please reach out to me (www.stefankasberger.at). The same applies if you have feature requests.

Another option would be for someone else to help with development and/or maintenance. For this, also get in touch with me (or comment here).

qqmyers commented 1 year ago

FWIW: There was some recent work on Python support for direct upload in https://github.com/IQSS/dataverse.harvard.edu/pull/194. It is not multipart yet and not associated with pyDataverse, but it is possibly useful and possibly something to mine for pyDataverse.

pdurbin commented 7 months ago

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

pdurbin commented 7 months ago

P.S. See https://github.com/gdcc/python-dvuploader