This repository contains the codebase for the competition, as well as the experiment version-control setup, for the "UCU dropouts" team.
On a fresh system, `pip` and the `venv` module might be absent. First of all, you will need to initialize your own `.env` file, create a Python virtual environment, and install the necessary packages:
```
python3 -m venv venv
source venv/bin/activate
pip install -e . --no-deps
pip install -r requirements.txt
```
To train locally on your own host machine, you first need to download the Kaggle competition dataset. It is suggested to store it in the `dataset` folder.
To use the Kaggle CLI, you have to set up your credentials according to this instruction.
To download the dataset, use the following commands:

```
kaggle competitions download -c vesuvius-challenge-ink-detection --force --path dataset
unzip dataset/vesuvius-challenge-ink-detection.zip -d dataset
```
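After extraction, a quick sanity check that the expected top-level folders exist can save a failed run later. A minimal sketch (the `train`/`test` folder names are an assumption about the competition archive layout, not verified against the repo):

```python
from pathlib import Path

def check_dataset(root: str = "dataset") -> list:
    """Return the list of expected entries missing under the dataset root."""
    expected = ["train", "test"]  # assumed layout of the competition archive
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).exists()]

missing = check_dataset("dataset")
if missing:
    print(f"Missing dataset entries: {missing}")
```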
Note: downloading takes ~15 minutes at 25 MB/s (UCU WiFi), and extracting the archives takes ~27 minutes on an HDD.
It's important to manage your environment variables. The easiest way to set up all the needed variables is to create a `.env` file:

```
cp .env-example .env
```
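If you prefer not to depend on extra tooling, the variables from `.env` can be loaded with a few lines of standard-library Python. A minimal sketch (projects commonly use the `python-dotenv` package for this instead):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_dotenv()
```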
Note: the `WANDB_API` entry with the API key from your wandb account is not present in `.env-example` but is important for experiment tracking!
Work in progress. The idea is to have a notebook that lets you specify parameters, clone this repo, and execute trainings, possibly with some local changes. A good hint is to download the dataset to Google Drive to decrease the time needed to initialize the dataset. The credentials might also be stored as a Google Drive file.
The Kaggle notebook is available via the link and in `notebooks/kaggle_training_notebook.ipynb`. Simply follow the instructions in the notebook to start training.
To start training, make sure to add `WANDB_API` with the API key from your wandb account to your environment variables, either through `export` or the `.env` file.
Before training, ensure you have correctly configured the `EXPERIMENT_NAME`, `MODEL`, and `FOLD_IDX` environment variables to ease experiment tracking.
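A small guard at the top of the training script can fail fast when one of these variables is missing. A hedged sketch, not part of the current `train.py`:

```python
import os

REQUIRED_VARS = ["EXPERIMENT_NAME", "MODEL", "FOLD_IDX"]

def read_experiment_config() -> dict:
    """Collect the required tracking variables, failing fast if any is unset."""
    missing = [name for name in REQUIRED_VARS if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    config = {name: os.environ[name] for name in REQUIRED_VARS}
    config["FOLD_IDX"] = int(config["FOLD_IDX"])  # fold index is numeric
    return config
```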
You can add a custom model to the `source.models` package and register it in the `source.models.__init__.MODELS` dict, so it can be selected via the `MODEL` env variable.
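A registration could look roughly like this (a sketch: `UNetBaseline` and the `"unet_baseline"` key are illustrative names, not the repo's actual contents):

```python
# source/models/__init__.py (sketch)
MODELS = {}

def register(name):
    """Decorator that adds a model class to the MODELS dict under `name`."""
    def wrapper(cls):
        MODELS[name] = cls
        return cls
    return wrapper

@register("unet_baseline")  # hypothetical model name
class UNetBaseline:
    def __init__(self, in_channels: int = 1):
        self.in_channels = in_channels

# train.py would look the key up from the MODEL env variable instead:
model_cls = MODELS["unet_baseline"]
model = model_cls()
```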
At the moment the training is as simple as:
```
python source/train.py
```
Work in progress
The main idea behind it is to have a Jupyter notebook (`.ipynb`) synchronized with this repo and Kaggle and shared across the team, so everyone can update models and run evaluation on the Kaggle hidden dataset.
All experiments should be tracked in some system; the proposed one is Weights & Biases. With this system, all artifacts can be stored directly on their cloud servers.