IQSS / dataverse

Open source research data repository software
http://dataverse.org

File Upload - Scalable upload options, AKA "dual-mode" #4610

Closed pameyer closed 6 years ago

pameyer commented 6 years ago

As a repository operator, I'd like to be able to support Dataverse native HTTP (browser and API) uploads and DCM uploads in the same installation. Similarly, I'd like to be able to support Dataverse native HTTP and S3 redirect (and possibly Swift?) download methods alongside RSAL downloads in the same installation.

mheppler commented 6 years ago

This includes:

[Screenshot: 2018-05-30, 12:32:24 PM]

mheppler commented 6 years ago

Initial commit for UI prototype added to branch 4610-upload-dual-mode. Changes include:

Tasks to tackle in the next phase of development:

mheppler commented 6 years ago

This probably needs its own issue, but the documentation for how to set up Dropbox for file upload is severely lacking. It took me far too long to find these required steps in the Dropbox documentation. We should link to that page from our guides to make it easier to set up. Currently, we provide no instructions.

pdurbin commented 6 years ago

3db4486 was so far behind develop that I couldn't even run the code, so I merged the latest in fb4e4f3. Heads up to @dlmurphy that I went with the version of @mheppler's branch, which will undo fe59aef, so you two should get on the same page about that tooltip.

dlmurphy commented 6 years ago

@pdurbin, I just talked with @mheppler and we're good with that merge. That tooltip will likely be further edited in the near future as part of the 5.0 redesign process anyway.

pdurbin commented 6 years ago

I took a quick look at pull request #4862 and noticed that there are merge conflicts in the bundle.

pdurbin commented 6 years ago

@sekmiller fixed the merge conflict in b5f81e3 and I offered to help with weaving this new feature into the guides somehow. He's going to take a look at the render logic for installations that haven't been configured with Dropbox or DCM. Right now it's all visible, including some placeholder text that @dlmurphy or others might want to weigh in on:

[Screenshot: 2018-07-17, 4:37:00 PM]

Question for @pameyer (and others): Is it ok if it's not possible to turn off "native" or "traditional" file upload? I believe you have it turned off in your fork, but it's not possible in the pull request as of this writing. I checked with @sekmiller
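The render logic mentioned in this comment (hide panels for upload methods the installation has not configured) can be sketched roughly as follows. This is only an illustration: the class `UploadPanels`, its parameters, and the panel names are hypothetical and not taken from the Dataverse code base, and treating the three panels as independent of one another is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the Dataverse code base. */
final class UploadPanels {

    private UploadPanels() {
    }

    /**
     * Returns the upload panels that should be rendered, in display order.
     * Assumption: each panel depends only on its own configuration.
     */
    static List<String> visiblePanels(boolean httpEnabled, String dropboxKey, boolean dcmConfigured) {
        List<String> panels = new ArrayList<>();
        if (httpEnabled) {
            panels.add("http");
        }
        // Show the Dropbox panel only when a non-blank key has been configured.
        if (dropboxKey != null && !dropboxKey.trim().isEmpty()) {
            panels.add("dropbox");
        }
        // Show the rsync panel only when a DCM is configured.
        if (dcmConfigured) {
            panels.add("rsync");
        }
        return panels;
    }
}
```

With a check like this, an installation without a Dropbox key or a DCM would see only the HTTP panel instead of all three.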

pameyer commented 6 years ago

Yes, it does make sense to retain the ability for different upload methods to be configurable (at minimum at the installation level, possibly at the dataverse level).

pdurbin commented 6 years ago

@pameyer ok, thanks. @sekmiller we can't just put !settingsWrapper.rsyncUpload back in to disable "Upload with HTTP via your browser". (Also, I just noticed that we need to move that English out of editFilesFragment.xhtml and into the bundle.) I guess we need a setting or something to disable "Upload with HTTP via your browser". @pameyer in your mind when "Upload with HTTP via your browser" is disabled does that mean that APIs that use HTTP are disabled as well? Or would that be a different setting?

pameyer commented 6 years ago

I think it's better to be consistent between the APIs and the HTTP UI (i.e., if HTTP browser uploads are disabled, HTTP upload APIs should also be disabled).

pdurbin commented 6 years ago

@sekmiller and I spoke with @djbrooke @pameyer @landreev and @oscardssmith this morning. In 3ba0036 I adjusted the FileUploadMethods enum to rename "NATIVE" to "native/http" in preparation for making it possible to disable http upload for both GUI and API. @sekmiller and I believe it will be better to have "native/http" explicitly set as a database setting out of the box rather than having implicit logic in the code. So we'll be updating setup scripts to add it. I also added a TODO to add "native/dropbox" if we feel like it helps make the code and user experience for sysadmins better. There was some discussion about whether it would be better to move the Dropbox key from a JVM option to a database setting. (I'd probably leave it alone but I don't have a strong opinion about this.)
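A minimal sketch of the idea in this comment: upload methods identified by string keys, enabled only when explicitly listed in a setting, with one switch governing HTTP upload for both the GUI and the API. The names `UploadMethod` and `UploadMethodsSetting` are hypothetical stand-ins, the comma-separated format is an assumption, and the key `dcm/rsync+ssh` is assumed for illustration; only `native/http` is named in the thread.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for the FileUploadMethods enum discussed above. */
enum UploadMethod {
    NATIVE("native/http"),
    // Assumed key for DCM/rsync uploads; the thread does not spell it out.
    RSYNC("dcm/rsync+ssh");

    private final String key;

    UploadMethod(String key) {
        this.key = key;
    }

    String key() {
        return key;
    }

    /** Looks a method up by its setting key (case-sensitive). */
    static Optional<UploadMethod> fromKey(String key) {
        return Arrays.stream(values())
                .filter(m -> m.key.equals(key))
                .findFirst();
    }
}

/** Hypothetical parser for a comma-separated upload-methods setting. */
final class UploadMethodsSetting {
    private final Set<UploadMethod> enabled;

    private UploadMethodsSetting(Set<UploadMethod> enabled) {
        this.enabled = enabled;
    }

    /** No implicit default: a missing or empty setting enables nothing. */
    static UploadMethodsSetting parse(String raw) {
        Set<UploadMethod> methods = EnumSet.noneOf(UploadMethod.class);
        if (raw != null) {
            for (String token : raw.split(",")) {
                // Unknown keys are ignored in this sketch.
                UploadMethod.fromKey(token.trim()).ifPresent(methods::add);
            }
        }
        return new UploadMethodsSetting(methods);
    }

    boolean isEnabled(UploadMethod method) {
        return enabled.contains(method);
    }

    // GUI and API consult the same flag, so they cannot drift apart.
    boolean httpUploadAllowedInGui() {
        return isEnabled(UploadMethod.NATIVE);
    }

    boolean httpUploadAllowedViaApi() {
        return isEnabled(UploadMethod.NATIVE);
    }
}
```

Because nothing is enabled by default in this sketch, setup scripts would need to write `native/http` into the setting for a stock installation to accept HTTP uploads, which matches the "explicit rather than implicit" preference stated above.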

pdurbin commented 6 years ago

Pull request #4862 is looking good, I think, so I moved it to QA. Heads up to @mheppler and @dlmurphy that in 4a7fcf5 I removed some help text that I didn't find very helpful. If someone can suggest better text, please advise and it can be put back in.

kcondon commented 6 years ago

Issues so far:

kcondon commented 6 years ago

Rsync is now working, refer to checklist above for issues.

Native and Dropbox work in basic UI testing. Could not get Rsync to work by following dcm_mock instructions. Actually noticed a server log error: "Caused by: java.io.IOException: Dataset doi:10.5072/FK2/7L67Q5 is not locked for DCM upload". The batch-jobs log says: "SEVERE: Dataset {0} is not locked for DCM upload. Exiting"

Server log error:

[2018-07-20T18:35:19.966-0400] [glassfish 4.1] [SEVERE] [] [job-73] [tid: _ThreadID=342 _ThreadName=concurrent/__defaultManagedExecutorService-managedThreadFactory-Thread-2] [timeMillis: 1532126119966] [levelValue: 1000] [[ Dataset doi:10.5072/FK2/VFIW5V is not locked for DCM upload. Exiting]]

[2018-07-20T18:35:19.966-0400] [glassfish 4.1] [WARNING] [] [com.ibm.jbatch.container.impl.JobThreadRootControllerImpl] [tid: _ThreadID=342 _ThreadName=concurrent/__defaultManagedExecutorService-managedThreadFactory-Thread-2] [timeMillis: 1532126119966] [levelValue: 900] [[
  Caught throwable in main execution loop with Throwable message: java.io.IOException: Dataset doi:10.5072/FK2/VFIW5V is not locked for DCM upload, and stack trace:
com.ibm.jbatch.container.exception.BatchContainerRuntimeException: java.io.IOException: Dataset doi:10.5072/FK2/VFIW5V is not locked for DCM upload
    at com.ibm.jbatch.container.artifact.proxy.JobListenerProxy.beforeJob(JobListenerProxy.java:45)
    at com.ibm.jbatch.container.impl.JobThreadRootControllerImpl.jobListenersBeforeJob(JobThreadRootControllerImpl.java:303)
    at com.ibm.jbatch.container.impl.JobThreadRootControllerImpl.originateExecutionOnThread(JobThreadRootControllerImpl.java:103)
    at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.glassfish.enterprise.concurrent.internal.ManagedFutureTask.run(ManagedFutureTask.java:141)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
Caused by: java.io.IOException: Dataset doi:10.5072/FK2/VFIW5V is not locked for DCM upload
    at edu.harvard.iq.dataverse.batch.jobs.importer.filesystem.FileRecordJobListener.beforeJob(FileRecordJobListener.java:177)
    at com.ibm.jbatch.container.artifact.proxy.JobListenerProxy.beforeJob(JobListenerProxy.java:43)
    ... 10 more ]]

[2018-07-20T18:35:19.966-0400] [glassfish 4.1] [SEVERE] [] [job-73] [tid: _ThreadID=342 _ThreadName=concurrent/__defaultManagedExecutorService-managedThreadFactory-Thread-2] [timeMillis: 1532126119966] [levelValue: 1000] [[ Job Failed. See Log for more information.]]

Need to test API enable/disable, mixed mode, rsync only mode. Need to follow up on above questions.
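The failure in the log above comes from a precondition checked before the import job starts: the dataset must hold a DCM upload lock. The following sketch shows that kind of check under stated assumptions. `DcmImportPrecondition` and the lock reason string `DcmUpload` are hypothetical names for illustration, and this is not the actual `FileRecordJobListener` code. It also substitutes the `{0}` placeholder, which the batch-jobs log quoted above printed unformatted.

```java
import java.io.IOException;
import java.text.MessageFormat;
import java.util.Optional;
import java.util.Set;

/** Hypothetical precondition check; not the actual FileRecordJobListener. */
final class DcmImportPrecondition {

    // Message pattern modeled on the log lines quoted above.
    static final String NOT_LOCKED = "Dataset {0} is not locked for DCM upload";

    // Assumed name of the lock reason a DCM upload places on a dataset.
    static final String DCM_UPLOAD_LOCK = "DcmUpload";

    private DcmImportPrecondition() {
    }

    /** Returns a formatted error message, or empty if the job may proceed. */
    static Optional<String> violation(String datasetId, Set<String> lockReasons) {
        if (lockReasons.contains(DCM_UPLOAD_LOCK)) {
            return Optional.empty();
        }
        // Fill in the {0} placeholder so the log names the dataset.
        return Optional.of(MessageFormat.format(NOT_LOCKED, datasetId));
    }

    /** Throws if the dataset is not locked for DCM upload. */
    static void requireDcmLock(String datasetId, Set<String> lockReasons) throws IOException {
        Optional<String> problem = violation(datasetId, lockReasons);
        if (problem.isPresent()) {
            throw new IOException(problem.get());
        }
    }
}
```

Under this reading, an import triggered against a dataset that never requested an rsync upload script would fail in exactly the way the log shows.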

dlmurphy commented 6 years ago

Some notes from conversation about Add New Dataset page with Phil, Kevin, and Tania:

rsync allows you to run a script that transfers your files into or out of Dataverse via SSH. It is useful for extremely large files (>1GB), or packages containing a large number of files. Once you have saved this dataset, you can upload your data using rsync via the "Upload Files" button on the dataset page.

TaniaSchlatter commented 6 years ago

Sketch that hints at render logic for Dropbox and rsync panels: [image: img_6949]

pdurbin commented 6 years ago

Replace the text in the rsync window

Fixed in 7452e66

Properly indent the messaging in the rsync window

Fixed in a867f5b

collapsible

My understanding is that there is still work to be done in this area. @djbrooke will do housekeeping on the issue.

djbrooke commented 6 years ago

Will discuss with @mheppler and other interested parties when he's back next week.

djbrooke commented 6 years ago

Thanks @sekmiller and @mheppler for working together to generate a checklist of remaining items here. I'll be happy to help with questions of scope, etc.

mheppler commented 6 years ago

Talked to @sekmiller. Outline of cleanup:

pdurbin commented 6 years ago

At standup this morning I volunteered to deploy the code to the dev1 server in various configurations:

As of ec8aa56, native + dropbox only has been deployed: https://dev1.dataverse.org

mheppler commented 6 years ago

Latest and greatest to-do checklist based on review of the new feature summary doc with @pameyer and @pdurbin and @dlmurphy.

Still to consider, which might ultimately get added to the to-do list:

dlmurphy commented 6 years ago

Document with in-app messaging for this feature: https://docs.google.com/document/d/14mkaH_YW1jP-xlFMtLuW-xDuAx_3g3FbIuVJ_0YhLAE/edit

My new revisions to the messaging are highlighted in green.

Next step for me is to work on the documentation.

dlmurphy commented 6 years ago

I just committed the bulk of the Dual Mode documentation, but on Monday morning I want to briefly check around the rest of the guides and make sure other sections about rsync are still accurate.

pdurbin commented 6 years ago

I'm ready for code review as of cf041d5 so I dragged this issue over in https://waffle.io/IQSS/dataverse

dlmurphy commented 6 years ago

Hold up, I have one more doc commit to make, then I'm ready too.

dlmurphy commented 6 years ago

OK, docs are done too now, ready for code review.

pdurbin commented 6 years ago

@matthew-a-dunlap I addressed your code review feedback in 65407cd . Please check it out and let me know if there's anything else.

kcondon commented 6 years ago
mheppler commented 6 years ago

Discussed briefly with @sekmiller and @kcondon and here is what I have as the outstanding issues:

Happy to discuss in more detail if needed to get this to done.

kcondon commented 6 years ago
kcondon commented 6 years ago

rsync upload, unpublished: the message should use an email address rather than the name of the root dataverse, per DMurphy:

kcondon commented 6 years ago
kcondon commented 6 years ago
kcondon commented 6 years ago

rsync upload, published

kcondon commented 6 years ago

rsync upload, published

pameyer commented 6 years ago

On c7c6b7e

DCM only mode:

Dual mode:

Initial thinking is that it makes sense to defer metrics-related issues to a separate issue (which intersects with multiple storage locations).

kcondon commented 6 years ago

rsync upload, unpublished

Messages need to be updated to use support email addresses:

pameyer commented 6 years ago

adb73a4000f42a8ffca8e8dbb24754a31e14cd00 ; in DCM only mode

mheppler commented 6 years ago

As @pameyer suggested above, the button added for "Redownload DCM Script" when the dataset is locked for upload, needs some attention. This issue was moved back into develop in order to solve this. The button currently links to the File Upload pg which is the same page the disabled "Upload Files" btn links to.

In an attempt to deliver something closer to what was initially suggested, I have managed to get this Frankenstein-looking button and message block solution mocked up. (It is only mocked up because when you click on the button you don't get the script, but instead the page refreshes and the dataset lock warning msg is no longer displayed.)

[Screenshot: 2018-09-05, 1:48:21 PM]

Will be working to get something similar to this solution, or a text link -- if possible, but there were concerns about the onclick nature of this link, as opposed to a URL you can navigate to.

mheppler commented 6 years ago

After reviewing options with the team, it was determined the easiest solution would be to make the "Download DCM Script" button on the dataset pg actually download the script, instead of linking to the Upload Files pg. (What made it even easier was that the DatasetPage.java bean already had the DatasetPage.downloadRsyncScript() code in it!)

[Screenshot: 2018-09-05, 3:37:08 PM]

mheppler commented 6 years ago

Moving back to QA with the blessing of @pameyer. Will address his last "jumping up and down" UX issue in another issue or later in this QA cycle.

pameyer commented 6 years ago

On 92ec40559e37dfd3c795f15b0732ff4165de460a

big-data mode:

dual-mode:

[Screenshot: 2018-09-18, 5:48:40 PM]

pameyer commented 6 years ago

A little more investigation on the mixed dataset; native upload APIs are blocked when a DCM upload lock is present, but not blocked when there's already a package file in the dataset.
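The inconsistency described here suggests a single guard that native upload endpoints could consult. The sketch below assumes the intended behavior is that both conditions block native uploads; `DatasetUploadState` and `NativeUploadGuard` are hypothetical names, not classes from the Dataverse code base.

```java
import java.util.Optional;

/** Hypothetical snapshot of the dataset facts relevant to native upload. */
final class DatasetUploadState {
    final boolean lockedForDcmUpload;
    final boolean containsPackageFile;

    DatasetUploadState(boolean lockedForDcmUpload, boolean containsPackageFile) {
        this.lockedForDcmUpload = lockedForDcmUpload;
        this.containsPackageFile = containsPackageFile;
    }
}

/** Hypothetical guard shared by every native (HTTP) upload entry point. */
final class NativeUploadGuard {

    private NativeUploadGuard() {
    }

    /** Returns why a native upload must be rejected, or empty if allowed. */
    static Optional<String> reasonToReject(DatasetUploadState state) {
        if (state.lockedForDcmUpload) {
            return Optional.of("dataset is locked for a DCM upload");
        }
        // Assumed rule: a dataset holding a package file takes no native uploads.
        if (state.containsPackageFile) {
            return Optional.of("dataset already contains a package file");
        }
        return Optional.empty();
    }
}
```

Routing both the browser upload page and the upload APIs through one check like this would keep the two paths from diverging.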

sekmiller commented 6 years ago

Checked in fixes for both of @pameyer's notes above. Moving back to QA and assigning back to Pete.

pameyer commented 6 years ago

@sekmiller Thanks; fixes look good.

It looks like I've run out of ways to break this; the closest I came was inconsistent UI state:

One way native upload can be inactive: [screenshot from 2018-09-20 17-39-36]

The other way native upload can be inactive: [screenshot from 2018-09-20 17-44-31]

There were suggestions from the UI review yesterday (https://docs.google.com/document/d/1Cshqx10BnYKy8HWEPFgI7Dho8ObXSN8zo48mP3t9Egk); @djbrooke are we considering those (chevron icons; red error -> yellow warning; link in "please contact support") in-scope for this issue, or splitting them off?

djbrooke commented 6 years ago

Thanks @pameyer - we should do those suggestions as part of this issue.

@dlmurphy can you put together a checklist from the notes doc and add it here?

dlmurphy commented 6 years ago

Checklist from UX Review:

mheppler commented 6 years ago

Added chevron icons to upload method headers. Fixed warning msg styling. Added new render logic for missing msgs when HTTP file is added. de33aec8f88981ffcea0f7f0286326dea99165eb

[Screenshot: 2018-09-24, 2:54:25 PM]

mheppler commented 6 years ago

Had to remove !empty EditDatafilesPage.fileMetadatas render logic from the dataTable to get the "There are no selected files to display." empty table msg on the Edit Files pg to display again. Unfortunately that also returned the "Files you upload will appear here." empty table msg on the Create Dataset and Upload Files pgs.

[Screenshot: 2018-09-24, 3:50:44 PM] [Screenshot: 2018-09-24, 3:51:09 PM]