The File Upload Wizard, shortened to FUW from here on, is an Internet-based software system designed to:
Help authors upload the files of a GigaDB dataset associated with a manuscript submitted to either GigaScience or GigaByte Journals
Help GigaDB curators manage the dataset file submission by authors
Give reviewers access to the files of a pre-publication dataset via the private dataset (mockup) page
Enable actors involved in the above workflow to receive email notifications of updates
Enable curation of the uploaded files into the GigaDB dataset for publication
Manage the IT resources involved in the above workflow
Key subsystems
The system has 6 pillars (or tent poles) that are integrated together:
The Uppy file uploader component (https://uppy.io), to batch upload files and associated metadata (generic and custom)
The open protocol for resumable file upload (https://tus.io) and its reference server (https://github.com/tus/tusd), because uploading many files or very large files can take a very long time and can fail part-way, so the system needs both resumability and post-processing
A message queue subsystem, Beanstalkd (https://beanstalkd.github.io), to allow commands triggered from the UI to be sent to backend workers that execute jobs asynchronously (as batch upload and processing of many files can take a very long time)
A list of stages in the dataset publishing workflow, defined by GigaScience curators and editors, that determine a dataset's state
The existing GigaDB website, which provides the dashboard for the various actors to orchestrate their part of the workflow, and which provides user account management for authors and curators, as well as state management and a viewing gallery for datasets
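To illustrate why resumability matters, here is a minimal sketch of how an interrupted upload could be continued under the tus 1.0.0 core protocol: the client learns the server's current offset (normally via a HEAD request that returns `Upload-Offset`) and then sends only the remaining bytes in PATCH requests. The function name `planResume`, the upload URL, and the file and chunk sizes are assumptions made for illustration. In the FUW itself, Uppy and Tusd are expected to handle this exchange, so this is not code from the project.

```typescript
// Sketch only: the header names follow the tus 1.0.0 core protocol, while the
// upload URL, file size, and chunk size are hypothetical.
interface PatchRequest {
  method: "PATCH";
  url: string;
  headers: Record<string, string>;
  byteRange: [number, number]; // [start, endExclusive) slice of the local file
}

// Given the offset reported by the server, list the PATCH requests needed
// to send the bytes that have not been received yet.
function planResume(
  uploadUrl: string,
  fileSize: number,
  serverOffset: number,
  chunkSize: number
): PatchRequest[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (serverOffset < 0 || serverOffset > fileSize) {
    throw new Error("server offset is outside the file");
  }
  const requests: PatchRequest[] = [];
  for (let start = serverOffset; start < fileSize; start += chunkSize) {
    const end = Math.min(start + chunkSize, fileSize);
    requests.push({
      method: "PATCH",
      url: uploadUrl,
      headers: {
        "Tus-Resumable": "1.0.0",
        "Upload-Offset": String(start),
        "Content-Type": "application/offset+octet-stream",
      },
      byteRange: [start, end],
    });
  }
  return requests;
}

// A 10-byte file whose upload broke after 4 bytes, resumed in 4-byte chunks.
const resumePlan = planResume("https://example.org/files/abc", 10, 4, 4);
```

The first 4 bytes are never sent again, which is the property that makes very large dataset uploads practical over unreliable connections.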
How it works
The UI for the workflow is centered on a Vue.js 2 application embedded within the GigaDB website. It communicates with a REST API (deployed as two standalone Yii 2 applications called fuw-backend and fuw-public) and sends jobs to a message queue. Uppy is a third-party component that fulfills the main functionality, resumable file upload, in tandem with the backend server Tusd (both Uppy and Tusd are from the same company). Additional third-party components come from the Element-UI component library and are used for some UI interactions.
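The hand-off from the UI to asynchronous backend workers can be sketched with an in-memory stand-in for the message queue. The method names below mirror Beanstalkd commands (put, reserve, delete, bury), but the class is not a Beanstalkd client, and the tube name `moveFiles`, the command shape, and the dataset identifiers are all hypothetical.

```typescript
// In-memory stand-in for a work queue, for illustration only.
type JobState = "ready" | "reserved" | "buried";

interface QueuedJob {
  id: number;
  tube: string;
  payload: string; // JSON-encoded command
  state: JobState;
}

class InMemoryQueue {
  private jobs: QueuedJob[] = [];
  private nextId = 1;

  // Producer side: enqueue a command and return its job id.
  put(tube: string, command: object): number {
    const id = this.nextId++;
    this.jobs.push({ id, tube, payload: JSON.stringify(command), state: "ready" });
    return id;
  }

  // Worker side: take the oldest ready job on a tube, if any.
  reserve(tube: string): QueuedJob | undefined {
    for (let i = 0; i < this.jobs.length; i++) {
      const job = this.jobs[i];
      if (job !== undefined && job.tube === tube && job.state === "ready") {
        job.state = "reserved";
        return job;
      }
    }
    return undefined;
  }

  // Remove a job that was processed successfully.
  delete(id: number): void {
    this.jobs = this.jobs.filter(job => job.id !== id);
  }

  // Park a failed job so it can be inspected later instead of being lost.
  bury(id: number): void {
    for (let i = 0; i < this.jobs.length; i++) {
      const job = this.jobs[i];
      if (job !== undefined && job.id === id) {
        job.state = "buried";
      }
    }
  }

  count(state: JobState): number {
    return this.jobs.filter(job => job.state === state).length;
  }
}

interface MoveFilesCommand {
  datasetId: string; // hypothetical dataset identifier
}

// Drain a tube: delete jobs that succeed, bury jobs whose handler throws.
function runWorker(
  queue: InMemoryQueue,
  tube: string,
  handler: (command: MoveFilesCommand) => void
): void {
  let job = queue.reserve(tube);
  while (job !== undefined) {
    try {
      handler(JSON.parse(job.payload));
      queue.delete(job.id);
    } catch (err) {
      queue.bury(job.id);
    }
    job = queue.reserve(tube);
  }
}

// The UI enqueues two commands; the worker processes them later.
const demoQueue = new InMemoryQueue();
demoQueue.put("moveFiles", { datasetId: "100001" });
demoQueue.put("moveFiles", { datasetId: "" });

const processedDatasets: string[] = [];
runWorker(demoQueue, "moveFiles", command => {
  if (command.datasetId === "") {
    throw new Error("missing dataset id");
  }
  processedDatasets.push(command.datasetId);
});
```

The request that enqueues a job returns immediately, and a failed job stays visible in the buried state, which gives a natural place to hook in the notifications to the user who triggered it.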
By the time this project starts (preferably on 2nd January 2024), the FUW system will have been re-enabled in the same state as when it was disabled, as shown in the user-centric video, and placed behind a proper feature flag.
The work the tech team needs to perform, driven by curators' feedback and by observations made by the tech team during the current work to re-enable the system, is listed as:
Update how the system works to align with current publishing workflow and data curation practices, and to mitigate the delayed integration with the manuscript system
Add behaviours to the system to handle error or conflict scenarios, for example by making all subsystems idempotent (what if drop-boxes already exist, what if the dataset already exists, ...) or by handling users who try to upload files that are too big
Make the system production-ready by fixing security issues, usability issues and outdated dependencies
Adapt the Linux file system integration to work with Amazon EFS and Wasabi object storage
Adapt the system to a multi-server deployment (for security, redundancy and performance reasons)
Ensure that users are eventually notified of the completion or failure of every backend job they triggered
Ensure housekeeping/reset tools are finished and work
Update the inline help, the system documentation, and the diagrams linked above
Run load testing and performance testing at each iteration towards the goal
Conduct deliberate and organised user testing of the system
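Two of the error-handling items above, idempotent drop-box creation and rejection of oversized files, can be sketched as follows. This is a sketch under stated assumptions rather than the system's implementation: the names `DropBoxRegistry`, `ensure` and `findOversized`, the drop-box path layout, and the size limit used in the example are all hypothetical.

```typescript
interface DropBox {
  datasetId: string;
  path: string; // hypothetical path layout
}

// Idempotent creation: asking for the same drop-box twice returns the
// existing one instead of failing or creating a duplicate.
class DropBoxRegistry {
  private boxes: { [datasetId: string]: DropBox } = {};
  created = 0;

  ensure(datasetId: string): DropBox {
    const existing = this.boxes[datasetId];
    if (existing !== undefined) {
      return existing;
    }
    const box: DropBox = { datasetId, path: "/dropboxes/" + datasetId };
    this.boxes[datasetId] = box;
    this.created++;
    return box;
  }
}

interface CandidateFile {
  name: string;
  sizeBytes: number;
}

// Return the names of files above the limit so the UI can report them
// before any upload starts.
function findOversized(files: CandidateFile[], maxBytes: number): string[] {
  return files.filter(file => file.sizeBytes > maxBytes).map(file => file.name);
}

const registry = new DropBoxRegistry();
const firstCall = registry.ensure("100001");
const secondCall = registry.ensure("100001"); // repeated request, e.g. a retried job

// Hypothetical 20-byte limit, chosen only to keep the example small.
const tooBig = findOversized(
  [
    { name: "reads.fastq", sizeBytes: 30 },
    { name: "readme.txt", sizeBytes: 10 },
  ],
  20
);
```

Because `ensure` can safely be called again, a backend job that is retried after a failure does not need to know whether its first attempt got as far as creating the drop-box.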
The frontend effort to accomplish the above work is listed as:
Redevelopment of File Upload Wizard
Rationale
The current (and initial) implementation of the system is illustrated in this user-centric video and it implements the stages described in this user workflow diagram.
The code and infrastructure for the system are already in the gigadb-website repo but were disabled two years ago.
The architecture of the system is described in this architecture diagram.
Work to do