This repository contains the solution for the Sketch Image Classification competition, where the task is to develop a model that can classify objects depicted in sketch images. The primary goal of this project is to build a robust classification model that can recognize abstract shapes and forms, which are often simplified and lack color and texture.
Below is the tree structure of the repository. The code is organized to easily run training, inference, and model evaluation.
.
├── data/ # Dataset directory
│ ├── train/ # Training images
│ ├── test/ # Test images
│ ├── train.csv # Training labels
│ └── test.csv # Test file names
├── notebooks/ # Jupyter notebook files for EDA and training
├── src/ # Source code directory
│ ├── dataset.py # Custom dataset implementation
│ ├── transforms.py # Data augmentation and transformation logic
│ ├── model.py # Model selection and initialization
│ ├── loss.py # Custom loss functions (e.g., Focal Loss)
│ ├── trainer.py # Training and evaluation logic
│ └── utils.py # Utility functions
├── main.py # Training script
├── inference.py # Inference script
├── requirements.txt # Python dependencies
└── README.md # This readme file
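As one example of the components listed above, `loss.py` mentions Focal Loss. Below is a minimal sketch of what such a loss could look like in PyTorch; the class name, constructor signature, and default `gamma` are assumptions for illustration, not the repository's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Hypothetical focal loss sketch for multi-class classification.

    Down-weights easy examples so training focuses on hard, misclassified
    sketches; gamma controls how strongly easy examples are suppressed.
    """

    def __init__(self, gamma: float = 2.0, reduction: str = "mean"):
        super().__init__()
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-sample cross-entropy, re-weighted by (1 - p_t)^gamma.
        ce = F.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)  # probability assigned to the true class
        loss = (1.0 - pt) ** self.gamma * ce
        if self.reduction == "mean":
            return loss.mean()
        if self.reduction == "sum":
            return loss.sum()
        return loss
```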
The dataset used in this competition is derived from the ImageNet Sketch dataset, which originally contains 50,889 hand-drawn images across 1,000 classes. For this competition, the dataset has been curated and refined to focus on 500 classes, resulting in 25,035 images, which are split into training and test sets.
Each image represents a simplified, hand-drawn version of an object from one of the 500 categories.
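As a rough illustration of how this data might be loaded, the sketch below reads image paths and labels from `train.csv`; the class name and the column names `image_path` and `target` are assumptions, not necessarily the exact interface in `src/dataset.py`.

```python
import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class SketchDataset(Dataset):
    """Hypothetical dataset sketch: images listed in a CSV, one label per row."""

    def __init__(self, csv_path, image_dir, transform=None):
        self.df = pd.read_csv(csv_path)  # assumed columns: image_path, target
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(os.path.join(self.image_dir, row["image_path"])).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, int(row["target"])
```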
The goal is to develop a model that can learn to recognize and classify objects based on their simplified sketches. The project focuses on capturing the core structure and form of objects without relying on color or texture.
Key steps include data loading and augmentation, model selection and initialization, loss design (e.g., Focal Loss), training and evaluation, and inference on the test set.
The performance of the model is evaluated using Accuracy, which measures the proportion of correctly predicted images compared to the total number of predictions. The formula for accuracy is:
Accuracy = (Number of Correct Predictions / Total Predictions) * 100
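Equivalently, as a minimal code sketch:

```python
def accuracy(predictions, labels):
    """Percentage of predictions that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels) * 100
```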
Our goal is to maximize this metric by fine-tuning the model and improving its ability to generalize to unseen sketch images.
To run this project, you'll need the dependencies listed in requirements.txt.
You can install the necessary libraries by running:
pip install -r requirements.txt
Clone the repository:
git clone https://github.com/boostcampaitech7/level1-imageclassification-cv-19.git
cd level1-imageclassification-cv-19
Place the training and evaluation images in the appropriate directories (data/train, data/test). Ensure that the training labels are stored in train.csv and test file names in test.csv.
You can train the model using the provided script:
python main.py --epochs 50 --batch-size 32 --lr 1e-4
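A minimal sketch of the argument parsing such a command implies is shown below; the exact flags, defaults, and help text in `main.py` may differ.

```python
import argparse

def parse_args():
    """Hypothetical CLI definition matching the training command above."""
    parser = argparse.ArgumentParser(description="Train the sketch classifier")
    parser.add_argument("--epochs", type=int, default=50, help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=32, help="mini-batch size")
    parser.add_argument("--lr", type=float, default=1e-4, help="initial learning rate")
    return parser.parse_args()
```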
After training, generate predictions for the test set by running:
python inference.py --model-path saved_model.pth --output output.csv
The output.csv file will contain the class probabilities for each image in the test set, ready for submission.
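For reference, here is a hedged sketch of how such predictions might be written out; the helper name, the `ID` column, and the per-class column names are assumptions, so check the actual submission format expected by the competition.

```python
import pandas as pd
import torch
import torch.nn.functional as F

def write_submission(model, loader, file_names, output_path="output.csv", device="cuda"):
    """Hypothetical helper: save per-class probabilities for each test image."""
    model.eval()
    all_probs = []
    with torch.no_grad():
        for images in loader:
            logits = model(images.to(device))
            all_probs.append(F.softmax(logits, dim=1).cpu())
    probs = torch.cat(all_probs).numpy()
    df = pd.DataFrame(probs, columns=[f"class_{i}" for i in range(probs.shape[1])])
    df.insert(0, "ID", file_names)  # assumed identifier column taken from test.csv
    df.to_csv(output_path, index=False)
```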
After training the model on the provided dataset, we achieved an accuracy of 0.9360 (93.60%) on the evaluation dataset. Our final submission ranked 3rd place on the competition leaderboard.