VertaAI / modeldb

Open Source ML Model Versioning, Metadata, and Experiment Management
Apache License 2.0

Creating local dataset versions not working as expected #2717

Open Aid91 opened 2 years ago

Aid91 commented 2 years ago

Hi,

I am currently using the open source version of ModelDB, with the latest Docker images for all components.

When I try basic local dataset versioning, no metadata about the files/directories is shown in the frontend and, probably for the same reason, the dataset version never increments (a data version of 1 is always returned).

Code example:

from verta import Client
from verta.dataset import Path
import os

client = Client("http://localhost:3000")
proj = client.set_project("Test project", desc="Test project")
expt = client.set_experiment("Test experiment", desc="Test experiment")

run = client.set_experiment_run(desc="Test experiment run", attrs={})
dataset = client.set_dataset(name="Test dataset")
dataset_version = dataset.create_version(Path("data.csv"))

Result:

connection successfully established
got existing Project: Test project
got existing Experiment: Test experiment
created new ExperimentRun: Run 551637130906217477
created new Dataset: Test dataset in workspace: personal
created new Dataset Version: 1 for Test dataset

When I change the data.csv file and run the same code again, I get dataset version 1 again (no version increment):

created new Dataset Version: 1 for Test dataset
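To rule out an unchanged file as the cause, the content of data.csv can be checksummed before each run. The following is a minimal standard-library sketch and does not use the verta client at all; the helper names file_sha256 and content_changed are hypothetical and exist only for this illustration.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_changed(path, previous_digest):
    """Return True if the file's current digest differs from `previous_digest`."""
    return file_sha256(path) != previous_digest
```

Recording file_sha256("data.csv") before the first run and calling content_changed("data.csv", recorded_digest) before the second confirms that the file really differs, so a repeated "Dataset Version: 1" cannot be explained by identical content.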

If I downgrade the Python client to verta==0.15.*, dataset versioning works again, but some methods, such as dataset.get_latest_version(), throw an exception: HTTPError: 501 Server Error: Method ai.verta.modeldb.DatasetVersionService/getDatasetVersionById is unimplemented for url: ...

This leads to my final question: Does the latest open source version of ModelDB support local dataset versioning? If so, which component versions (modeldb-backend, modeldb-proxy, etc.) and which Python client version are compatible?

Thanks in advance!

convoliution commented 2 years ago

Hi @Aid91, thank you for your continued interest in ModelDB!

verta==0.16.0 did involve an overhaul of how dataset versions are captured, and our OSS platform may not fully support its operations. A client version below 0.16.0 would be the best bet for core functionality, though a few methods (such as get_latest_version()) may also be absent from OSS.
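For anyone scripting against an OSS deployment, a guard like the following can flag a client version at or above 0.16.0 before any dataset calls are made. It is only a sketch: the helper names parse_release, is_before_0_16, and installed_verta_version are hypothetical, the version parsing is deliberately simplified (pre-release suffixes such as "rc1" are ignored), and it assumes the client is installed under the distribution name "verta".

```python
from importlib import metadata


def parse_release(version_str):
    """Parse the leading numeric components of a version string into a tuple of ints."""
    parts = []
    for piece in version_str.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at the first non-digit, e.g. the "rc1" in "0rc1"
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_before_0_16(version_str):
    """Return True if the numeric release is lower than 0.16."""
    return parse_release(version_str) < (0, 16)


def installed_verta_version():
    """Return the installed verta version string, or None if it is not installed."""
    try:
        return metadata.version("verta")
    except metadata.PackageNotFoundError:
        return None
```

A script could call installed_verta_version() at startup and warn when the result is not None and is_before_0_16(...) returns False.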

In the meantime, I shall file a ticket for us at Verta to follow up on.

Atharex commented 2 years ago

Hi @convoliution

I am seeing a similar error and would like to know whether there will be any new OSS releases past 2.0.8.1.

I've tried building the server-side components from the master branch several times, but the builds never succeeded.

Atharex commented 2 years ago

I am also asking because the 2.0.8.1 release contains a vulnerable log4j version and really needs an update: https://github.com/VertaAI/modeldb/blob/v2.0.8.1/backend/pom.xml#L22