[!IMPORTANT]
Based on the assumption that uploaded files may be of any size
AS A Data Analyst
I WANT TO use external files as input for data classification
SO THAT I have flexibility in choosing input for further processing
Acceptance Criteria
[ ] The source of a file to be imported can be the local computer
[ ] Using an external URL as source to import files is out of scope of this issue
[ ] Datasets can only be imported to pre-defined dataset groups - see #83
GUI
[ ] GUI to import external datasets
DSL
Transporting files
[ ] S3 Ferry is used to transfer files from source to S3 bucket
[ ] S3 Ferry is used to transfer files from S3 bucket to destination
[ ] CronManager is used to validate the state of current file upload and initiate appropriate triggers
E.g. to update the upload status, trigger moving the file from the S3 bucket to the destination, and others, if necessary
The goal is to avoid overwhelming Ruuter by tying up its resources with the processing of large files
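The periodic check described above could be sketched roughly as follows. This is only an illustration in plain Python, not CronManager configuration: the status values and trigger names are hypothetical, since this issue does not define them.

```python
from enum import Enum
from typing import Optional


class UploadStatus(Enum):
    # Hypothetical status values; the issue does not specify them.
    UPLOADING = "uploading"    # source -> S3 bucket transfer in progress
    IN_BUCKET = "in_bucket"    # file has landed in the S3 bucket
    MOVING = "moving"          # S3 bucket -> destination transfer in progress
    DONE = "done"              # file has reached the destination
    FAILED = "failed"          # a transfer step failed


def next_trigger(status: UploadStatus) -> Optional[str]:
    """Return the (hypothetical) trigger a scheduled check would fire, if any."""
    if status is UploadStatus.IN_BUCKET:
        # The file is waiting in the bucket, so start the second transfer.
        return "move_to_destination"
    if status is UploadStatus.FAILED:
        return "report_failure"
    # Transfers still running or already finished need no action.
    return None
```

Keeping the decision in a small scheduled check like this means the heavy file transfer itself never has to pass through Ruuter.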
Endpoints
[ ] REST POST endpoint /classifier/datasets/accept to import new datasets
[ ] REST POST endpoint /classifier/datasets/status to change the status of importing a dataset in a Postgres database
[ ] REST GET endpoint /classifier/datasets/settings to fetch settings when importing a dataset
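A client-side view of the three endpoints might look like the sketch below. Only the paths and HTTP methods come from this issue; the base URL and the JSON field names (`datasetGroupId`, `fileName`, `datasetId`, `status`) are assumptions made for illustration. The functions only build the requests and do not send them.

```python
import json
from urllib.request import Request

# Hypothetical base URL; the real host and port are not given in this issue.
BASE_URL = "http://localhost:8080"


def _json_post(path: str, payload: dict) -> Request:
    """Build a POST request with a JSON body for the given path."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_accept_request(dataset_group_id: str, file_name: str) -> Request:
    # Field names are assumptions; the payload schema is not defined here.
    return _json_post(
        "/classifier/datasets/accept",
        {"datasetGroupId": dataset_group_id, "fileName": file_name},
    )


def build_status_request(dataset_id: str, status: str) -> Request:
    # Field names are assumptions; the payload schema is not defined here.
    return _json_post(
        "/classifier/datasets/status",
        {"datasetId": dataset_id, "status": status},
    )


def build_settings_request() -> Request:
    return Request(BASE_URL + "/classifier/datasets/settings", method="GET")
```

A built request could then be sent with `urllib.request.urlopen(request)` once the real base URL and payload schema are known.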
Supported data formats
[ ] ~Input CSV files for further processing by the Classifier~
[ ] Input XLSX files for further processing by the Classifier
[ ] Input JSON files for further processing by the Classifier
[ ] Input YAML files for further processing by the Classifier
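One simple way to enforce the format list above is an extension check before a file is accepted. The sketch below is a minimal illustration under two assumptions: that the struck-through CSV item means CSV is not accepted, and that `.yml` counts as YAML alongside `.yaml`. The function and constant names are hypothetical.

```python
from pathlib import Path

# Extension-to-format map. Treating CSV as unsupported and accepting ".yml"
# are assumptions, not something this issue states explicitly.
SUPPORTED_FORMATS = {
    ".xlsx": "XLSX",
    ".json": "JSON",
    ".yaml": "YAML",
    ".yml": "YAML",
}


def detect_format(file_name: str) -> str:
    """Return the format for a file name, or raise ValueError if unsupported."""
    suffix = Path(file_name).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {file_name!r}")
    return SUPPORTED_FORMATS[suffix]
```

An extension check is only a first filter; the file content would still need to be parsed and validated before classification.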