R1chrdson / vesuvius_challenge

The Vesuvius Challenge - Ink Detection kaggle competition from the "UCU dropouts" team
0 stars 0 forks source link

The Vesuvius Challenge

This repository contains codebase for the competition as well as setup for experiments version control for "UCU dropouts" team

Setup

Local setup

Prerequisites

Environment setup

First of all, you will need to initialize your own .env file, create python virtual environment and install necessary packages:

python3 -m venv venv
source venv/bin/activate
pip install -e . --no-deps
pip install -r requirements.txt

To train locally, on your own host machine, it's required to download kaggle competition dataset first. It's suggested to store it in the dataset folder. To use kaggle cli, you have to setup your credentials according to this instruction. To download dataset use the following command:

kaggle competitions download -c vesuvius-challenge-ink-detection --force --path dataset
unzip dataset/vesuvius-challenge-ink-detection.zip -d dataset

Note: Takes 15 mins to download with speed 25 MB/s (UCU WIFI) and 27 mins to extract archives on HDD.

It's important to manage your env variables. The easiest way to set up all variables needed, is to create .env file:

cp .env-example .env

Note: The WANDB_API entry with API key from your wandb account is not present in .env-exmaple but is important for experiment tracking!

Colab setup

Work in progress The idea is to have a notebook with possibility to specify params and clone this repo and execute trainings with possibly some local changes. The credentials Good hint is to download the dataset to Google Drive to decrease time needed to initialize the dataset. The credentials might be stored as Google Drive file.

Kaggle setup

The kaggle notebook is available by the link and in (notebooks/kaggle_training_notebook.ipynb). Simply follow the instructions from the notebook to start training.

Training

To start training, make sure to add WANDB_API with API key from your wandb account to your environment variables through export or .env file

Before training ensure your have correctly configured your environment variables with EXPERIMENT_NAME, MODEL, FOLD_IDX to ease experiment tracking.

You can add custom model into source.models package and add it into source.models.__init__.MODELS dict to ease usage along with MODEL env variable.

At the moment the training is as simple as:

python source/train.py

Submission

Work in progress

The main idea behind it is to have Jupyter notebook .ipynb synchronized with this repo and kaggle, shared across team, so everyone could update models and run evaluation on kaggle hidden dataset.

Experiments control

All the experiments should be tracked in some system, the proposed one is Weights&Biases. According to this system, all the artifacts might be stored directly on their cloud servers.