IQSS / dataverse

Open source research data repository software
http://dataverse.org

32-bit limits file uploads to 2GB? #4125

Closed: rsteans closed this issue 6 years ago

rsteans commented 7 years ago

Hi y'all,

It is definitely possible that we're the only ones having this issue, but due to a 32-bit limit, we've been unable to use the Dataverse interface to upload files larger than 2 GB. The effective limit for tar'd files and other archives that need extraction is even smaller.

https://en.wikipedia.org/wiki/2_GB_limit
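
For context, 2,147,483,647 bytes (2^31 - 1) is the largest value a signed 32-bit integer can hold, which is where the classic 2 GB ceiling comes from. A minimal Java sketch of the wraparound, not taken from the Dataverse codebase:

```java
public class TwoGbLimit {
    public static void main(String[] args) {
        // Largest value a signed 32-bit int can hold: 2^31 - 1 bytes (~2 GiB).
        int maxBytes = Integer.MAX_VALUE;      // 2147483647
        System.out.println("32-bit max: " + maxBytes);

        // One byte past 2 GiB wraps to a negative number, which a naive
        // size check then misinterprets.
        int overflowed = maxBytes + 1;         // wraps to -2147483648
        System.out.println("2 GiB + 1 as int:  " + overflowed);

        // Tracking the size in a 64-bit long avoids the wrap.
        long safe = (long) maxBytes + 1;       // 2147483648
        System.out.println("2 GiB + 1 as long: " + safe);
    }
}
```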

Mentioned this on Google Groups and Phil suggested I log it as a ticket.

rsteans commented 7 years ago

Let me be clear: I think we decided that this was the reason we couldn't do better than 2 GB. This may be incorrect.

pdurbin commented 7 years ago

Thanks for opening this issue, @rsteans, and yes, this is a follow-on from https://groups.google.com/d/msg/dataverse-community/yMi4KHy-T00/2gtmUYrxAAAJ

My understanding is that you're running Dataverse 4.7.1 on AWS with files stored on EBS (Elastic Block Store). I'm not sure why you're hitting this 2 GB limit. I guess we're hoping that when you move to S3 (#3921), the limit will be gone?

rsteans commented 7 years ago

Phil,

That's correct. We're hoping that, between upgrades and a move to S3 for storage, we'll either be able to upload more than 2 GB at a time or use an API to upload larger data.

I have always been baffled by this limitation, which I'd believed others were also experiencing, so I'm surprised to hear that we might be alone on this.


pdurbin commented 7 years ago

@rsteans could it be that you're thinking about the :MaxFileUploadSizeInBytes setting? As it says at http://guides.dataverse.org/en/4.7.1/installation/config.html#maxfileuploadsizeinbytes , "If the :MaxFileUploadSizeInBytes is NOT set, uploads, including SWORD may be of unlimited size." So you might have set this to prevent uploads of unlimited size. #2169 is related but I don't want to throw too much at you at once.
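
For anyone tuning this: the linked guide changes :MaxFileUploadSizeInBytes by PUTting a byte count to the admin settings API (a one-liner with curl). Below is a minimal Java 11+ sketch of the same call, assuming a default install listening on localhost:8080; the 5 GiB value is purely illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetMaxUploadSize {
    public static void main(String[] args) throws Exception {
        // Admin settings endpoint from the linked installation guide;
        // assumes a default install on localhost:8080.
        String endpoint =
            "http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes";

        // Example value: 5 GiB. Note the long arithmetic -- an int
        // computation would overflow at exactly the 2 GiB boundary
        // discussed above.
        long fiveGiB = 5L * 1024 * 1024 * 1024;

        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
            .PUT(HttpRequest.BodyPublishers.ofString(Long.toString(fiveGiB)))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```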

4tikhonov commented 7 years ago

Well, DANS has the same problem with file size limits: we can upload only 5 GB even with :MaxFileUploadSizeInBytes raised to 1 TB. We think the bottleneck is the proxy timeout (Apache), because when we tested uploading directly to Glassfish through a tunnel to our server, it worked even for 20-30 GB files.
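
For others hitting the same wall: a minimal sketch of the Apache directives that commonly cut off or cap large proxied uploads, assuming a stock mod_proxy reverse proxy in front of Glassfish. The directive names are standard Apache; the values are illustrative, not tested recommendations:

```apache
# Hypothetical httpd.conf excerpt -- values are examples only.

# Per-connection I/O timeout; slow multi-GB uploads can exceed the default.
Timeout 3600

# Timeout for the proxied backend connection (falls back to Timeout if unset).
ProxyTimeout 3600

# Maximum accepted request body in bytes; 0 means unlimited. A nonzero
# value here rejects large uploads before they ever reach the application.
LimitRequestBody 0
```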

rsteans commented 6 years ago

Hi Phil,

We'll take a look at that setting next week and see if we can sort through this on our end. I'd like to make sure we're not setting anyone on a wild goose chase.

pdurbin commented 6 years ago

Sounds good. Thanks, @rsteans. @4tikhonov, I'm glad to hear you aren't affected by the 2 GB limit itself. 30 GB is pretty big! Maybe you should try out the Data Capture Module ( http://guides.dataverse.org/en/4.7.1/installation/data-capture-module.html ) once it's farther along!

pdurbin commented 6 years ago

@rsteans any news? Also, I saw your tweet! https://twitter.com/TDLRyan/status/920835247826178048

djbrooke commented 6 years ago

I'm going to go ahead and close this one out.

TDL and DANS folks - @ccmumma and @4tikhonov - if y'all have some more info here and feel it needs deeper investigation, feel free to open it back up. Thanks!

pdurbin commented 6 years ago

I don't believe @CCMumma and @4tikhonov have permission to re-open issues, and @rsteans has moved on (congrats! Check out the Dataverse mention at https://groups.google.com/d/msg/samvera-community/HMR1xK9JfmM/oxUVASKqAwAJ ), but they (or others) should definitely feel free to open a new issue if needed!

rsteans commented 6 years ago

I have moved on, but these emails are now following me to Northwestern, so let me know how I can help.

CCMumma commented 6 years ago

I think we'll know more once we move to S3, which we plan to do with our next production upgrade once 5.0 is released. If we still see the issue, we'll re-open. Thanks, all. Courtney

pdurbin commented 6 years ago

Issue #4439 is pretty much the new version of this issue, where we're continuing to work with TDL folks like @CCMumma.