VertaAI / modeldb

Open Source ML Model Versioning, Metadata, and Experiment Management
Apache License 2.0

What gets stored in the database and the artifact store? #1590

Open · adhikari23 opened this issue 3 years ago

adhikari23 commented 3 years ago

Your documentation says that the ModelDB backend stores the information from the requests it receives in a relational database, and that out of the box ModelDB is configured and verified to work against PostgreSQL. It also says that the relational database and the artifact store in the backend need volumes attached to enable persistent storage.

What gets stored in the database, and what exactly goes into the artifact store? Also, why do we need two separate storage systems (database and artifact store)? Would it be possible to use two logical instances of a single database for both purposes?

ravishetye commented 3 years ago

Hey @adhikari23, one rough way to think about it: things that ModelDB interprets are stored in the database, and things it does not (or should not) interpret are stored in the artifact store.

Most of the metadata related to Projects, Experiments, ExperimentRuns, Datasets, DatasetVersions, Repositories, and Commits is stored in the database. This metadata includes, but is not limited to, the name, description, creation and update times, tags, metrics, hyperparameters, and observations.
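To make "things ModelDB interprets" concrete, here is a minimal sketch of why this kind of metadata fits a relational database: once it is stored as typed rows, the backend can filter, join, and sort on it. The schema and the `log_run` helper are hypothetical illustrations, not ModelDB's actual tables or API, and `sqlite3` stands in for PostgreSQL only so the example runs without a server.

```python
import json
import sqlite3
import time

# Hypothetical, simplified schema (NOT ModelDB's real tables). It only shows
# that interpreted metadata lives in queryable relational rows.
SCHEMA = """
CREATE TABLE experiment_run (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    created_at  REAL NOT NULL,
    updated_at  REAL NOT NULL
);
CREATE TABLE run_tag (run_id INTEGER REFERENCES experiment_run(id), tag TEXT);
CREATE TABLE run_metric (run_id INTEGER REFERENCES experiment_run(id), key TEXT, value REAL);
CREATE TABLE run_hyperparameter (run_id INTEGER REFERENCES experiment_run(id), key TEXT, value TEXT);
"""


def log_run(conn, name, description, tags, metrics, hyperparameters):
    """Hypothetical helper: store one run's metadata as relational rows."""
    now = time.time()
    cur = conn.execute(
        "INSERT INTO experiment_run (name, description, created_at, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (name, description, now, now),
    )
    run_id = cur.lastrowid
    conn.executemany("INSERT INTO run_tag VALUES (?, ?)", [(run_id, t) for t in tags])
    conn.executemany(
        "INSERT INTO run_metric VALUES (?, ?, ?)",
        [(run_id, k, v) for k, v in metrics.items()],
    )
    # Hyperparameters are serialized to JSON text so mixed types fit one column.
    conn.executemany(
        "INSERT INTO run_hyperparameter VALUES (?, ?, ?)",
        [(run_id, k, json.dumps(v)) for k, v in hyperparameters.items()],
    )
    conn.commit()
    return run_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_id = log_run(conn, "baseline", "first try", ["v1"], {"accuracy": 0.91}, {"lr": 0.01, "epochs": 10})

# Because these values are interpreted, the database can answer questions
# like "which run has the best accuracy?" directly.
best = conn.execute(
    "SELECT r.name, m.value FROM experiment_run r "
    "JOIN run_metric m ON m.run_id = r.id "
    "WHERE m.key = 'accuracy' ORDER BY m.value DESC"
).fetchone()
print(best)  # ('baseline', 0.91)
```

The relational layout is the design point: small, structured values are cheap to index and query, which is exactly what experiment comparison needs.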

The artifact store is meant to hold large BLOBs/CLOBs. It can store the actual dataset the user logs, any artifacts that need to be associated with an experiment run (for example, an image of a chart the user wants to attach to the run), the actual model, and any files that need to be tracked to get the model running (for example, requirements.txt).
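The split can be sketched as follows: the artifact store keeps opaque bytes it never parses, and the database records only a small pointer to them. `LocalArtifactStore` and the `run_artifact` table are hypothetical; a local temporary directory stands in for whatever backend a real deployment uses (for example an object store or a mounted volume), and keying blobs by their SHA-256 hash is an illustrative choice, not necessarily what ModelDB does.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class LocalArtifactStore:
    """Hypothetical artifact store backed by a local directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # Content-addressed key: the store treats the bytes as opaque.
        key = hashlib.sha256(data).hexdigest()
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_artifact (run_id INTEGER, name TEXT, artifact_key TEXT)")

store = LocalArtifactStore(tempfile.mkdtemp())
requirements = b"numpy==1.26.4\nscikit-learn==1.4.2\n"
key = store.put(requirements)

# The database stores only the pointer (run id, artifact name, key), never the blob.
conn.execute("INSERT INTO run_artifact VALUES (?, ?, ?)", (1, "requirements.txt", key))

(stored_key,) = conn.execute(
    "SELECT artifact_key FROM run_artifact WHERE name = 'requirements.txt'"
).fetchone()
restored = store.get(stored_key)
```

Keeping large, uninterpreted payloads out of the relational database keeps its rows small and its queries fast, while the artifact store can be sized and backed up independently.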

The client APIs that cause the backend to interact with the artifact store can be found here.