gdcc / dvwebloader

A web tool for uploading folders of files to a Dataverse dataset
Apache License 2.0

Error message when trying to upload from Linux/Ubuntu/Chromium through Wi-fi #9

Closed: philippconzett closed this issue 1 year ago

philippconzett commented 1 year ago

@qqmyers @Louis-wr

Earlier today, I tried to upload some 5 GB files from Linux/Ubuntu/Chromium through Wi-fi, and got the following error messages:

```
Successful upload of part 811 of 820
fileupload2.js:292 Successful upload of part 820 of 820
fileupload2.js:292 Successful upload of part 812 of 820
...
fileupload2.js:292 Successful upload of part 817 of 820
fileupload2.js:292 Successful upload of part 818 of 820
fileupload2.js:292 Successful upload of part 819 of 820
fileupload2.js:340 reporting file data_0.bag
dvwebloader.html:1 Access to XMLHttpRequest at 'https://test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=NTQ2OTAxMDIxNjY1NjM5NjQ1NTM1&storageidentifier=S3://2002-green-dataversenotest1:183cfdc010b-df234abd0e00' from origin 'https://gdcc.github.io' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
fileupload2.js:420 Failure: 0
fileupload2.js:421 Failure: test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=NTQ2OTAxMDIxNjY1NjM5NjQ1NTM1&storageidentifier=S3://2002-green-dataversenotest1:183cfdc010b-df234abd0e00:1 Failed to load resource: net::ERR_FAILED
fileupload2.js:375 md5 done
fileupload2.js:429 handling
fileupload2.js:433 handling2
fileupload2.js:551 0 : 4 : 5 : 0
```

philippconzett commented 1 year ago

Maybe this is related to the problem Jamie reported in the Google group: https://groups.google.com/u/1/g/dataverse-community/c/Ex9-oMfcT7o?

Louis-wr commented 1 year ago

This is due to too small a minimum part size for multipart file upload in S3 direct upload. Fixed by running: `asadmin create-jvm-options "-Ddataverse.files.S3.min-part-size=1368709120"`

Not related to the problem in the Google group. This can be closed.
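
For anyone applying the same fix, a minimal sketch of the full sequence, assuming a standard Payara installation; the verification and restart steps are assumptions rather than something stated in this thread:

```
# Raise the minimum part size for the store with id "S3"
# (1368709120 bytes, roughly 1.27 GiB).
asadmin create-jvm-options "-Ddataverse.files.S3.min-part-size=1368709120"

# Confirm the option was recorded, then restart so it takes effect.
asadmin list-jvm-options | grep min-part-size
asadmin restart-domain
```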

philippconzett commented 1 year ago

Thanks, @Louis-wr!

qqmyers commented 1 year ago

FWIW: That is the first PUT call back to the Dataverse server, and it looks to me like a CORS issue. (Not the same as Jamie's, which was CORS settings on the bucket.) There is an :AllowCors setting for Dataverse that is true by default. If that is set to false, or if some proxy in front of Dataverse doesn't allow CORS, it could cause this issue.
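
A quick way to rule that setting out is through the admin API; a minimal sketch, assuming the API is reachable on localhost (adjust host and port for your installation):

```
# Check the current value; if the setting has never been set,
# Dataverse falls back to the default, which is true.
curl http://localhost:8080/api/admin/settings/:AllowCors

# Re-enable CORS headers if the setting was turned off.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:AllowCors
```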

The only way part size solves this is if you do as you have done and set the part size larger than your file, so you avoid using the multipart API call. That loses the efficiency/parallelism of using multipart. I'm a bit surprised you don't have trouble with other calls, but perhaps your CORS is set up to allow POST but not PUT.
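
To put numbers on that trade-off, a back-of-the-envelope check using the figures from the log above; the 5 GB file size is approximate, so the counts are illustrative only:

```
# ~5 GB split into 820 parts works out to roughly 6.5 MB per part:
echo $(( 5 * 1024**3 / 820 ))                              # 6547206 bytes

# With min-part-size=1368709120, the part count is ceil(fileSize/partSize);
# a truly single-part (non-multipart) upload needs partSize > fileSize.
echo $(( (5 * 1024**3 + 1368709120 - 1) / 1368709120 ))    # 4 parts
```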

Also - this issue is related to hosting the dvwebloader on another server. If it is actually served from your test-docker.dataverse.no machine, CORS wouldn't be involved.

Louis-wr commented 1 year ago

I don't know enough about CORS to answer why. This is an error I have seen many times when uploading a multipart file and one of the parts fails (usually due to too many small parts).

Thank you for your suggestion; I will definitely look into it. Maybe that is the reason why there does not seem to be a retry when a part fails. However, uploading multipart files works just fine if the parts are kept to a more reasonable number (i.e. not 800).

qqmyers commented 1 year ago

Hmm - odd. The only difference I can see is between having one part versus several. With any number of parts, you have to make an extra PUT call to tell Dataverse to tell the bucket to finish the multipart upload and create one object.
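
For context, a sketch of what that final call looks like, based on the URL in the console log above; the JSON body mapping part numbers to ETags is an assumption about the completion API's shape, so check the Dataverse native API docs before relying on it:

```
# Hypothetical completion request: PUT the part ETags back to Dataverse so
# it can ask the bucket to assemble the parts into one object. This is the
# request that was blocked by CORS in the log above.
curl -X PUT -H "X-Dataverse-key: $API_TOKEN" \
  -d '{"1":"<etag-of-part-1>","2":"<etag-of-part-2>"}' \
  "https://test-docker.dataverse.no/api/datasets/mpupload?globalid=doi:10.21337/BGMM6J&uploadid=...&storageidentifier=..."
```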