Can't duplicate dataset

shawnyama commented 1 year ago

Problem

In this PR I am trying to duplicate a dataset within a project. I created the function copyDataset in dataset.ts to do so. I know to strip properties such as id and timestamp. However, I also had to remove the columns property for the call to work (I was using the endpoint for creating a dataset). Otherwise, I get the following error in my TDS Docker container:

2023-08-21 14:14:05 INFO: 192.168.65.4:60468 - "POST /datasets HTTP/1.1" 422 Unprocessable Entity
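
For reference, the copy step described above can be sketched roughly as follows. This is a minimal illustration, not the actual copyDataset from dataset.ts: the Dataset shape and the buildCopyPayload helper are hypothetical, and only the id, timestamp, columns and file_names fields are named in this thread (name is assumed).

```typescript
// Hypothetical dataset shape. Only id, timestamp, columns and file_names
// are mentioned in this thread; `name` is an assumption for illustration.
interface Dataset {
  id?: string;
  timestamp?: string;
  name: string;
  file_names?: string[];
  columns?: Record<string, unknown>[];
}

// Build the body for a "create dataset" call from an existing dataset.
// The source object is left untouched; the copy drops the server-assigned
// fields (id, timestamp) and the columns array that triggered the 422.
function buildCopyPayload(source: Dataset, newName: string): Dataset {
  const copy: Dataset = { ...source, name: newName };
  delete copy.id;
  delete copy.timestamp;
  delete copy.columns;
  return copy;
}
```

The result would then be sent to POST /datasets, the endpoint shown in the log line above.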

Swagger attempt:

Task

It'd probably be ideal to make a TDS endpoint just for copying a dataset. I could just send it the id of the dataset I want to copy and the new name. I could send the project id as well so the copied dataset can be added to the project. Let me know if you'd prefer me to add it to the project using a separate call or if your architecture is cool with doing it all in one.
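
To make the proposal concrete, here is a rough sketch of what the client side of such a call might look like. Everything in it is an assumption: TDS has no copy endpoint, so the /datasets/{id}/copy path, the name and project_id body fields, and the buildCopyRequest helper are all made up for illustration.

```typescript
// Hypothetical request description for a dedicated copy endpoint.
interface CopyDatasetRequest {
  method: "POST";
  path: string;
  body: { name: string; project_id?: string };
}

// Describe the proposed call: the id of the dataset to copy, the new name,
// and optionally the project the copy should be attached to.
function buildCopyRequest(
  datasetId: string,
  newName: string,
  projectId?: string
): CopyDatasetRequest {
  const body: { name: string; project_id?: string } = { name: newName };
  if (projectId !== undefined) {
    // Only included in the "do it all in one call" variant.
    body.project_id = projectId;
  }
  return {
    method: "POST",
    path: `/datasets/${encodeURIComponent(datasetId)}/copy`, // hypothetical path
    body,
  };
}
```

Leaving projectId out corresponds to the "separate call" option, where the copy is added to the project afterwards.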

toddroper commented 1 year ago

@shawnyama The swagger error is related to the file_names array. Does it succeed when you add that? The columns array should be a list of objects that contain the column data outlined in this class: https://github.com/DARPA-ASKEM/data-service/blob/main/tds/modules/dataset/model.py#L25

The columns are optional so removing the array is fine.

The real question comes down to where we want to do the S3 duplication. TDS has been architected to be a dumb service, so it is agnostic to what the HMI client/server want to do with the data. If we want to change that design, we can add a copy flag to the dataset creation endpoint and duplicate the file in S3. The asset question ties back into the continuing discussion on how we want to handle passing the project/user data to TDS. We can add logic based on this data (whether it's in the header or POST body) to add the asset relationship when the project ID is present. That way we only need the asset endpoint for explicit operations.
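
As a rough sketch of that branching (written in TypeScript only to illustrate the decision logic, not as TDS code): the copyFromId and projectId fields, the Step type, and planCreate are all hypothetical names, and no S3 or database calls are made here.

```typescript
// Hypothetical create-dataset request with an optional copy flag and
// optional project ID (which could equally arrive in a header).
interface CreateDatasetRequest {
  name: string;
  file_names: string[];
  copyFromId?: string; // hypothetical "copy" flag: id of the dataset to duplicate
  projectId?: string;  // hypothetical: present when the caller wants an asset link
}

// The operations the service would carry out, in order.
type Step =
  | { kind: "createDataset"; name: string }
  | { kind: "copyS3Files"; fromDatasetId: string; fileNames: string[] }
  | { kind: "addProjectAsset"; projectId: string };

function planCreate(req: CreateDatasetRequest): Step[] {
  const steps: Step[] = [{ kind: "createDataset", name: req.name }];
  if (req.copyFromId !== undefined) {
    // Copy flag set: duplicate the source dataset's files in S3.
    steps.push({
      kind: "copyS3Files",
      fromDatasetId: req.copyFromId,
      fileNames: req.file_names,
    });
  }
  if (req.projectId !== undefined) {
    // Project ID present: add the asset relationship implicitly.
    steps.push({ kind: "addProjectAsset", projectId: req.projectId });
  }
  return steps;
}
```

With neither optional field set, the plan is a plain create, which matches the current "dumb service" behaviour.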

@YohannParis @brandomr Thoughts on this?

shawnyama commented 1 year ago

It's been decided that we kill the duplication feature.

DARPA-ASKEM / data-service

Can't duplicate dataset #315
