bsc-dom / dataclay

Python distributed data store that enables remote access and method execution.
https://dataclay.bsc.es/
BSD 3-Clause "New" or "Revised" License

ObjectMetadata is not fully available for COMPSs workers #11

Closed: marcmonfort closed this issue 1 year ago

marcmonfort commented 1 year ago

When COMPSs executes tasks, it relies on object.getID() to obtain the object's ID and on storage.api.getById() to resolve that ID back into an object on the worker.

Currently, both the object id and the backend_id (what we used to call the "hint") are included, so the worker does not need to ask the metadata service for information and can execute methods on the object directly.
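To illustrate why no metadata lookup is needed, here is a minimal sketch assuming the ID string simply concatenates the object id and the backend_id. The `object_id:backend_id` format and the helpers `encode_id`, `decode_id` and `ObjectRef` are hypothetical, not dataClay's actual encoding.

```python
import uuid
from typing import NamedTuple


class ObjectRef(NamedTuple):
    """What a worker can recover from the ID alone (hypothetical)."""
    object_id: uuid.UUID
    backend_id: uuid.UUID


def encode_id(object_id: uuid.UUID, backend_id: uuid.UUID) -> str:
    # Hypothetical format "<object_id>:<backend_id>"; UUID strings contain no colons.
    return f"{object_id}:{backend_id}"


def decode_id(ref: str) -> ObjectRef:
    # The worker parses the string locally: no round trip to the metadata service.
    object_part, backend_part = ref.split(":", 1)
    return ObjectRef(uuid.UUID(object_part), uuid.UUID(backend_part))


if __name__ == "__main__":
    oid, bid = uuid.uuid4(), uuid.uuid4()
    decoded = decode_id(encode_id(oid, bid))
    # The worker now knows which backend holds the object.
    print(decoded.backend_id == bid)
```

The trade-off is that only these two fields travel with the reference; everything else about the object stays unknown to the worker.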

However, the ObjectMetadata is not fully populated on the worker (e.g., we don't know the object's dataset or its replicas). This feels wrong. Querying the metadata service should be avoided (it is on the critical path, and this is HPC, you know the drill), so serializing the full ObjectMetadata might be a solution. That should be checked properly against the COMPSs and Storage API integration, though.
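A minimal sketch of the "serialize the whole ObjectMetadata" idea, assuming a JSON payload shipped with the task parameter. The field set of this `ObjectMetadata` dataclass and the helpers `serialize_metadata`/`deserialize_metadata` are hypothetical; the real class in dataClay may look different.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ObjectMetadata:
    # Hypothetical field set, for illustration only.
    object_id: str
    backend_id: str
    dataset_name: str
    replica_backend_ids: list[str] = field(default_factory=list)


def serialize_metadata(meta: ObjectMetadata) -> str:
    # Ship every field alongside the task so the worker gets a complete view.
    return json.dumps(asdict(meta))


def deserialize_metadata(payload: str) -> ObjectMetadata:
    # Rebuild the metadata on the worker without contacting the metadata service.
    return ObjectMetadata(**json.loads(payload))
```

The cost is a larger task parameter, and the serialized copy is a snapshot: if replicas change after the task is submitted, the worker sees stale data. Both points are worth checking in the COMPSs integration.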

For now, having only partial ObjectMetadata should not be a problem for most use cases: tasks typically need little information about the object, and knowing where to execute active methods is usually enough.
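A minimal sketch of why partial metadata is usually enough, assuming the worker only knows the object id and backend_id. `PartialMetadata`, `target_backend`, `replicas` and `MetadataUnavailable` are hypothetical names: routing an active method works, while asking for replicas fails instead of silently querying the metadata service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class MetadataUnavailable(LookupError):
    """Raised when a field would require asking the metadata service (hypothetical)."""


@dataclass
class PartialMetadata:
    # Hypothetical: what the worker has after resolving an ID.
    object_id: str
    backend_id: str
    dataset_name: Optional[str] = None  # unknown on the worker
    replica_backend_ids: Optional[list[str]] = None  # unknown on the worker

    def is_complete(self) -> bool:
        return self.dataset_name is not None and self.replica_backend_ids is not None


def target_backend(meta: PartialMetadata) -> str:
    # Enough to route an active method: the backend that holds the object.
    return meta.backend_id


def replicas(meta: PartialMetadata) -> list[str]:
    # Replica information is not available locally in the partial case.
    if meta.replica_backend_ids is None:
        raise MetadataUnavailable("replicas unknown on worker")
    return meta.replica_backend_ids
```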