ORNL / DataFed

A Federated Scientific Data Management System
https://ornl.github.io/DataFed/

System - Refactor data storage location tracking #60

Open dvstans opened 5 years ago

dvstans commented 5 years ago

Currently the storage path of raw data is tracked in the core DB. This may lead to future issues with data repo maintenance, load balancing, and object storage. Instead, the database should store no information except which repo the raw data is stored on. When starting transfers, the core should ask the repo for the path (for both the raw data and the metadata file).
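A minimal sketch of the proposed split, assuming a hypothetical `RepoServer` class with a `resolve_paths` call (neither is DataFed's actual API). The core's record holds only a repo identifier, and the core asks that repo for the raw-data and metadata-file paths when a transfer starts. The file names `data` and `metadata.json` are placeholders.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Record:
    """Core DB entry: only the owning repo is stored, never a path."""
    record_id: str
    repo_id: str


@dataclass(frozen=True)
class TransferPaths:
    raw_data: str
    metadata_file: str


class RepoServer:
    """Hypothetical repo service; its storage layout is private to it."""

    def __init__(self, repo_id: str, root: str) -> None:
        self.repo_id = repo_id
        self._root = PurePosixPath(root)  # repo-local configuration

    def resolve_paths(self, record_id: str) -> TransferPaths:
        # Placeholder layout: <root>/<record_id>/{data, metadata.json}
        base = self._root / record_id
        return TransferPaths(raw_data=str(base / "data"),
                             metadata_file=str(base / "metadata.json"))


def start_transfer(record: Record, repos: dict) -> TransferPaths:
    """Core side: find the owning repo, then ask it where the data lives."""
    repo = repos[record.repo_id]
    return repo.resolve_paths(record.record_id)


repos = {"repo-a": RepoServer("repo-a", "/mnt/datafed/repo-a")}
paths = start_transfer(Record("1234", "repo-a"), repos)
print(paths.raw_data)       # /mnt/datafed/repo-a/1234/data
print(paths.metadata_file)  # /mnt/datafed/repo-a/1234/metadata.json
```

Because the path is computed on the repo side, moving or rebalancing data only requires the repo to answer `resolve_paths` differently; the core record is unchanged.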

dvstans commented 2 years ago

Need to research / discuss S3 storage under Globus and whether we need to support it. If not, the current approach is fine and should be left alone.

JoshuaSBrown commented 2 years ago

Low priority. If we want to include additional storage systems, e.g. S3 (an object store), note that once data is in an object store it cannot be opened or read like a file. Checksumming on an object store is also expensive.

The core server assumes that the repo server's storage is a POSIX file system. This is a separation-of-concerns problem.
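One way to remove the POSIX assumption, sketched under stated assumptions: the core depends only on a hypothetical `StorageBackend` interface that returns an opaque location and a checksum. `PosixBackend` and `InMemoryObjectStore` are illustrative names, not DataFed code; the in-memory class merely stands in for an S3-like store.

```python
import hashlib
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Interface the core talks to; it makes no POSIX assumptions."""

    @abstractmethod
    def locate(self, record_id: str) -> str:
        """Return an opaque location (path or URI) for a transfer."""

    @abstractmethod
    def checksum(self, record_id: str) -> str:
        """Return a hex SHA-256 digest of the stored bytes."""


class PosixBackend(StorageBackend):
    def __init__(self, root: str) -> None:
        self.root = root.rstrip("/")

    def locate(self, record_id: str) -> str:
        return f"{self.root}/{record_id}"

    def checksum(self, record_id: str) -> str:
        # A local file can be opened and streamed in 1 MiB chunks.
        h = hashlib.sha256()
        with open(self.locate(record_id), "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()


class InMemoryObjectStore(StorageBackend):
    """Stand-in for an S3-like store: whole blobs keyed by name."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket
        self._objects = {}

    def put(self, record_id: str, data: bytes) -> None:
        self._objects[record_id] = data

    def locate(self, record_id: str) -> str:
        return f"s3://{self.bucket}/{record_id}"

    def checksum(self, record_id: str) -> str:
        # The digest covers the object's full bytes; on a real remote
        # object store this typically means transferring them first.
        return hashlib.sha256(self._objects[record_id]).hexdigest()


store = InMemoryObjectStore("datafed-repo")
store.put("1234", b"raw bytes")
for backend in (PosixBackend("/mnt/repo-a/"), store):
    print(backend.locate("1234"))
# /mnt/repo-a/1234
# s3://datafed-repo/1234
```

The core only ever calls `locate` and `checksum`, so adding an object store means adding a backend on the repo side rather than changing the core.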

JoshuaSBrown commented 1 year ago

To clarify the problem: repository-specific configuration should stay on the machine where the repository exists.
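A minimal sketch of keeping that configuration local, assuming the repo server parses its own JSON config and derives paths from it. The keys `storage_root` and `shard_width` and the sharding rule are hypothetical. If an admin changes the layout, only this host's config changes; the core still stores just the repo ID.

```python
import json
from pathlib import PurePosixPath


class RepoConfig:
    """Settings that live only on the repository host (hypothetical keys)."""

    def __init__(self, text: str) -> None:
        cfg = json.loads(text)
        self.root = PurePosixPath(cfg["storage_root"])
        self.shard_width = int(cfg.get("shard_width", 2))

    def path_for(self, record_id: str) -> str:
        # Shard by the leading characters of the ID; the core never sees this rule.
        shard = record_id[: self.shard_width]
        return str(self.root / shard / record_id)


# Two layouts for the same record: only the repo-local config differs.
old = RepoConfig('{"storage_root": "/data/v1", "shard_width": 2}')
new = RepoConfig('{"storage_root": "/data/v2", "shard_width": 3}')
print(old.path_for("12345"))  # /data/v1/12/12345
print(new.path_for("12345"))  # /data/v2/123/12345
```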