Proposal Summary
I'd like a new method to be added to Dataset so that we can add files via a list of paths (as opposed to a directory or a single path).
Motivation
With the structure of our data and the current implementation of add_files, we are forced to either use symlinks to create a new directory of paths or call _add_files once for each path we have. Neither approach is performant, so it would be useful if we could instead simply pass a list of paths to add to the dataset.
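To make the request concrete, here is a minimal sketch of the per-path workaround described above, wrapped in the call shape I'd like to have. The name `add_files_from_list` is hypothetical and not part of ClearML; the sketch only assumes that the dataset object exposes `add_files(path=..., dataset_path=...)`. It still pays the per-call cost, so it illustrates the desired interface rather than a fix.

```python
from pathlib import Path
from typing import Iterable, Optional, Union


def add_files_from_list(
    dataset,
    paths: Iterable[Union[str, Path]],
    dataset_path: Optional[str] = None,
) -> int:
    """Hypothetical helper: add every path in `paths` to `dataset`.

    Assumes `dataset` has an `add_files(path=..., dataset_path=...)` method.
    Duplicate paths are skipped; the number of distinct paths submitted
    is returned.
    """
    seen = set()
    unique = []
    for p in paths:
        key = str(Path(p))  # normalize so "a/b" and Path("a/b") compare equal
        if key not in seen:
            seen.add(key)
            unique.append(key)

    # One add_files call per path: this is the slow part the proposal
    # would like a native list-based method to avoid.
    for p in unique:
        dataset.add_files(path=p, dataset_path=dataset_path)

    return len(unique)
```

With a native method, the loop would presumably collapse into a single call that receives the whole list, so the dataset could batch the work internally.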
Related Discussion
If this continues a slack thread, please provide a link to the original slack thread.