bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Data Catalog for Bacalhau #4187

Open · wdbaruni opened this issue 3 days ago

wdbaruni commented 3 days ago

Summary: Implement a Data Catalog feature to allow compute nodes to publish metadata about the data they have or can access. This will enable users to submit jobs by defining the data they want to access, and the system will route the job to the most suitable nodes based on data access capabilities, proximity, and cost.

Description: In a distributed computing environment, efficiently managing and accessing data is crucial. This feature aims to create a Data Catalog that indexes metadata about available data across the Bacalhau network. The catalog will facilitate job submissions by enabling users to specify the data they need, and the system will automatically select the optimal nodes for job execution.
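To make the routing idea concrete, here is a minimal Go sketch under stated assumptions. The catalog is modeled as a hypothetical in-memory index keyed by dataset URI. `DataEntry`, `Catalog`, `Publish`, and `RankNodes` are illustrative names, not existing Bacalhau APIs. A node counts as "able to access" a dataset only if it has published an entry for it. Candidates are then ordered by proximity (same region first) and then by advertised cost.

```go
package main

import (
	"fmt"
	"sort"
)

// DataEntry is a hypothetical catalog record: one compute node advertising
// that it can access a dataset. Field names are assumptions for illustration,
// not Bacalhau's actual schema.
type DataEntry struct {
	NodeID    string
	Dataset   string  // dataset identifier, e.g. an s3:// or ipfs:// URI
	Region    string  // where the node runs, used as a proxy for proximity
	CostPerGB float64 // advertised cost to read the data
}

// Catalog indexes published entries by dataset so a lookup is one map access.
type Catalog struct {
	byDataset map[string][]DataEntry
}

func NewCatalog() *Catalog {
	return &Catalog{byDataset: make(map[string][]DataEntry)}
}

// Publish records that a node can access a dataset.
func (c *Catalog) Publish(e DataEntry) {
	c.byDataset[e.Dataset] = append(c.byDataset[e.Dataset], e)
}

// RankNodes returns the nodes that advertised access to dataset, ordered with
// same-region nodes first and, within each group, by ascending cost.
// It returns an empty (nil) slice when no node has published the dataset.
func (c *Catalog) RankNodes(dataset, region string) []DataEntry {
	// Copy so sorting does not reorder the catalog's own slice.
	candidates := append([]DataEntry(nil), c.byDataset[dataset]...)
	sort.SliceStable(candidates, func(i, j int) bool {
		iLocal := candidates[i].Region == region
		jLocal := candidates[j].Region == region
		if iLocal != jLocal {
			return iLocal // proximity outranks cost
		}
		return candidates[i].CostPerGB < candidates[j].CostPerGB
	})
	return candidates
}

func main() {
	cat := NewCatalog()
	cat.Publish(DataEntry{NodeID: "node-a", Dataset: "s3://logs/2024", Region: "us-east", CostPerGB: 0.02})
	cat.Publish(DataEntry{NodeID: "node-b", Dataset: "s3://logs/2024", Region: "eu-west", CostPerGB: 0.01})
	cat.Publish(DataEntry{NodeID: "node-c", Dataset: "s3://logs/2024", Region: "us-east", CostPerGB: 0.01})
	cat.Publish(DataEntry{NodeID: "node-d", Dataset: "ipfs://other", Region: "us-east", CostPerGB: 0.0})

	// A job that asks for s3://logs/2024 and prefers us-east.
	for _, e := range cat.RankNodes("s3://logs/2024", "us-east") {
		fmt.Println(e.NodeID)
	}
}
```

Running it prints `node-c`, `node-a`, `node-b`: the two us-east nodes come first, ordered by cost, ahead of the eu-west node even though it is cheaper than `node-a`. `node-d` is never considered because it did not publish the requested dataset. The strict "proximity, then cost" ordering is just one choice. A weighted score combining the two would let a much cheaper remote node win, which may be preferable depending on data transfer costs.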

Key Features:

Benefits:

Integration:

wdbaruni commented 3 days ago

For consideration: https://github.com/bacalhau-project/bacalhau/issues/1010#issue-1434555585

aronchick commented 2 days ago

HOLY cow this would be great