cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Backup not working following the guides #7983

mmoyano-sigmacognition opened this issue 2 months ago (status: open)

mmoyano-sigmacognition commented 2 months ago


Steps to Reproduce

Hi,

I need to upgrade my CVAT instance (currently running 2.11.2) to the newest version as soon as possible. Before doing so, I tried to back up the data following the guides and ran these commands:

$ mkdir backup
$ docker run --rm --name temp_backup --volumes-from cvat_db -v $(pwd)/backup:/backup ubuntu tar -czvf /backup/cvat_db.tar.gz /var/lib/postgresql/data
$ docker run --rm --name temp_backup --volumes-from cvat_server -v $(pwd)/backup:/backup ubuntu tar -czvf /backup/cvat_data.tar.gz /home/django/data
$ docker run --rm --name temp_backup --volumes-from cvat_elasticsearch -v $(pwd)/backup:/backup ubuntu tar -czvf /backup/cvat_events.tar.gz /usr/share/elasticsearch/data

This threw the following error:

tar: Removing leading `/' from member names
tar: /usr/share/elasticsearch/data: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
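The `Cannot stat` error means that `/usr/share/elasticsearch/data` did not exist inside the temporary `ubuntu` container, so none of the volumes taken from `cvat_elasticsearch` provided that path. The following is a minimal pre-flight sketch. It assumes bash and the container names from the commands above, and it checks every container:path pair before any archive is written. `check_path`, `preflight` and the `DOCKER` override are hypothetical helpers for illustration and are not part of CVAT:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not CVAT tooling): confirm each path exists
# inside the volumes of its container before running the tar backups.
# DOCKER can be overridden, e.g. for testing; it defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

# check_path CONTAINER PATH
# Mounts the volumes of CONTAINER into a throwaway ubuntu container and runs
# `test -e PATH` there. Returns 0 if the path exists, non-zero otherwise
# (including when the container itself does not exist).
check_path() {
  local container="$1" path="$2"
  "$DOCKER" run --rm --volumes-from "$container" ubuntu test -e "$path"
}

# preflight CONTAINER:PATH...
# Checks every pair and reports all missing ones, not just the first.
# Returns 0 only if every path exists.
preflight() {
  local pair container path missing=0
  for pair in "$@"; do
    container="${pair%%:*}"   # text before the first ':'
    path="${pair#*:}"         # text after the first ':'
    if check_path "$container" "$path"; then
      echo "ok       $container:$path"
    else
      echo "MISSING  $container:$path" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage with the paths from the original post: `preflight cvat_db:/var/lib/postgresql/data cvat_server:/home/django/data cvat_elasticsearch:/usr/share/elasticsearch/data`. On a setup like the one described here, the last pair would be expected to show up as `MISSING` before any `tar` runs.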

How can I make the backup? Are these three commands still valid for the newer versions?

Also, I'd like to take the opportunity to ask how I can migrate the data from a previous version to the newest one.

Thanks in advance, Manuel

bsekachev commented 2 months ago

@azhavoro, could you please look at this issue?

ccampbell9 commented 2 months ago

@bsekachev I think the problem is that the backup guide documentation has not been updated since elasticsearch was replaced by clickhouse. I am similarly trying to create a backup and the only reference I can find to elasticsearch is in the changelog.md for version 1.3.0.

@mmoyano-sigmacognition You might be able to back up the events with the following command. Before deleting your existing instance, though, I recommend spinning up a fresh CVAT instance and attempting a restore there first, to confirm that you can recreate the events data:

docker run --rm --name temp_backup --volumes-from cvat_clickhouse -v $(pwd)/backup:/backup ubuntu tar -czvf /backup/cvat_events.tar.gz /var/lib/clickhouse/
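Before testing the restore on a fresh instance as suggested above, you could check the archive itself. The sketch below assumes bash and a tar that supports `--strip-components` (GNU tar does). It verifies that the archive contains files under `var/lib/clickhouse/`, because tar stores paths without the leading `/`, as the warning in the original post shows. It then prints a matching restore command for review instead of running it. The helper names and the `temp_restore` container name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a backup archive and deriving a
# restore command. They are illustrations, not part of CVAT.

# archive_has_prefix ARCHIVE DIR
# Succeeds if ARCHIVE is a readable gzip tarball with at least one member
# under DIR. DIR is given as an absolute path, e.g. /var/lib/clickhouse.
archive_has_prefix() {
  local archive="$1" prefix="${2#/}"   # tar stores paths without leading '/'
  prefix="${prefix%/}/"                # normalise to exactly one trailing '/'
  tar -tzf "$archive" 2>/dev/null |
    awk -v p="$prefix" 'index($0, p) == 1 { found = 1 } END { exit !found }'
}

# strip_count DIR
# Number of leading path components to drop on extraction so that files land
# directly inside DIR, e.g. /var/lib/clickhouse -> 3.
strip_count() {
  local p="${1#/}"
  p="${p%/}"
  local IFS=/
  local -a parts
  read -ra parts <<< "$p"
  echo "${#parts[@]}"
}

# restore_cmd CONTAINER DIR ARCHIVE_NAME
# Prints (does not run) a restore command; review it before executing it
# against a fresh instance.
restore_cmd() {
  local container="$1" dir="$2" name="$3"
  printf 'docker run --rm --name temp_restore --volumes-from %s -v "%s/backup:/backup" ubuntu bash -c "cd %s && tar -xvf /backup/%s --strip-components=%s"\n' \
    "$container" "$PWD" "$dir" "$name" "$(strip_count "$dir")"
}
```

For example, `archive_has_prefix backup/cvat_events.tar.gz /var/lib/clickhouse` should succeed for the archive created above. `restore_cmd cvat_clickhouse /var/lib/clickhouse cvat_events.tar.gz` prints a command ending in `--strip-components=3`, because the three stored components `var/lib/clickhouse` have to be dropped for the files to land directly in the mounted `/var/lib/clickhouse`.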

It might also be a good idea to back up the volumes of the cvat_vector and cvat_grafana services, because, per the release notes for version 2.4.0 (the first reference to ClickHouse I was able to find):

"Improved analytics and enriched it with the following features: log collection from UI and server, exceptions collection, user activity visualization, aggregation of user and job working time, and log filtering for debugging purposes. Analytics are now based on ClickHouse + Vector + Grafana."

To create archives for those services' volumes, run the following two commands:

docker run --rm --name temp_backup --volumes-from cvat_vector -v $(pwd)/backup:/backup ubuntu tar -czvf /backup/cvat_vector.tar.gz /etc/vector/vector.toml

And

docker run --rm --name temp_backup --volumes-from cvat_grafana -v $(pwd)/backup:/backup ubuntu tar -czvf /backup/cvat_grafana.tar.gz /var/lib/grafana/dashboards/
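To run every backup discussed in this thread in one go, here is a bash sketch. It resolves the absolute host directory for the `-v` mount once, so a mistyped `$(pwd)/backup` cannot send archives to the wrong place, and it stops at the first failure so a broken step is not buried in later output. The target list is copied from the commands in this thread, with ClickHouse replacing Elasticsearch. Verify the container names on your host with `docker ps --format '{{.Names}}'`. `backup_all`, `BACKUP_TARGETS` and the `DOCKER` override are hypothetical names, not CVAT tooling:

```shell
#!/usr/bin/env bash
# Hypothetical all-in-one backup sketch based on the commands in this thread.
# DOCKER can be overridden (e.g. for testing); it defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

# container:path-inside-container:archive-name (assumed from this thread)
BACKUP_TARGETS=(
  "cvat_db:/var/lib/postgresql/data:cvat_db.tar.gz"
  "cvat_server:/home/django/data:cvat_data.tar.gz"
  "cvat_clickhouse:/var/lib/clickhouse/:cvat_events.tar.gz"
  "cvat_vector:/etc/vector/vector.toml:cvat_vector.tar.gz"
  "cvat_grafana:/var/lib/grafana/dashboards/:cvat_grafana.tar.gz"
)

# backup_all [BACKUP_DIR]
# Archives each target into BACKUP_DIR (default ./backup, created if needed).
backup_all() {
  local dir entry container path name
  mkdir -p "${1:-backup}" || return 1
  dir="$(cd "${1:-backup}" && pwd)" || return 1   # absolute path for -v
  for entry in "${BACKUP_TARGETS[@]}"; do
    IFS=: read -r container path name <<< "$entry"
    echo "==> $container:$path -> $dir/$name"
    if ! "$DOCKER" run --rm --name temp_backup --volumes-from "$container" \
        -v "$dir:/backup" ubuntu tar -czvf "/backup/$name" "$path"; then
      echo "backup of $container failed, stopping" >&2
      return 1
    fi
  done
}
```

Calling `backup_all` from the directory that should hold `backup/` runs the same five `docker run ... tar -czvf` commands as above, one after another. Dropping an entry from `BACKUP_TARGETS` skips that service.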

Those last two might not be necessary, but they can't hurt. I think the Grafana archive in particular would preserve any dashboard settings you have configured, but that is just speculation on my part. I have no idea what vector.toml contains or whether it is required to properly restore the events data. Maybe one of the devs can chime in, as I am by no means an expert, but I am quite confident that cvat_elasticsearch is a deprecated component that is no longer used in recent releases.