BlueBrain / Search

Blue Brain text mining toolbox for semantic search and structured information extraction
https://blue-brain-search.readthedocs.io
GNU Lesser General Public License v3.0

Test DVC pipelines of "data_and_models/" with CI #377

Open FrancescoCasalegno opened 3 years ago

FrancescoCasalegno commented 3 years ago

Currently, our CI never tests the contents of data_and_models/, so code changes in src/ can break data_and_models/ without us noticing.

It is not clear yet how the DVC pipelines could be tested.
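One possible starting point, sketched here as an assumption rather than an agreed approach: a CI step could locate every dvc.yaml under data_and_models/ and run `dvc repro --dry` in its directory. A dry run only prints the commands DVC would execute, so it validates the pipeline definitions but does not prove that the stages still work after a change in src/. The helper names `find_pipelines` and `check_pipelines` are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def find_pipelines(root):
    """Return the directories under `root` that contain a dvc.yaml file."""
    return sorted(p.parent for p in Path(root).rglob("dvc.yaml"))


def check_pipelines(root, run=subprocess.run):
    """Dry-run each pipeline and return the directories whose dry run failed.

    `run` is injectable so the logic can be exercised without DVC installed.
    """
    failed = []
    for pipeline_dir in find_pipelines(root):
        # `dvc repro --dry` prints the stage commands without executing them.
        result = run(["dvc", "repro", "--dry"], cwd=pipeline_dir)
        if result.returncode != 0:
            failed.append(pipeline_dir)
    return failed
```

In CI, this could be called as `check_pipelines("data_and_models")`, failing the job when the returned list is not empty. Actually executing the stages, perhaps on a reduced dataset, would be needed to catch breakage that a dry run cannot see.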

pafonta commented 3 years ago

Additional context

See also the section on DVC elements in https://github.com/BlueBrain/Search/pull/351#issuecomment-843037012, which describes how to handle DVC while working on data_and_models/.

pafonta commented 3 years ago

Hello @FrancescoCasalegno,

Did we know about Studio, a tool from the makers of DVC and CML?

I have tried Studio (https://dvc.org/doc/studio). This user interface on top of DVC + CML is very interesting.

As we will train and tune more and more models, this could be very helpful.

Indeed, it lets us manage DVC experiments and CML reports in a comprehensive and integrated way.

For example, all the experiments and plots I made for #356 could have been compared, shared, and visualized in this tool.