Kaskus Image Classification

Image classification using Keras and a Convolutional Neural Network (CNN). The training script and model architecture are based on nsfw_model, with some modifications to the default folder values.

Current Status

The pre-compiled model in the models/ folder was trained with the following classification:

However, you can use this code to train an image classifier with your own classes. Just follow the instructions below.

Model architecture

[Figure: model architecture]

Dataset Sources

The dataset was collected from various sources, totaling ~10GB of image files:

Unfortunately, the accuracy for the 'sensitive' class is quite low, which may be caused by the following:

Setup

Prerequisites

Installation

Training

Prepare sample data

For example, to classify images as nonsensitive (subclasses: flowers and animals) or sensitive (subclasses: porn and gore), arrange your directory structure as follows:

data/
  └─ raw/
     ├─ nonsensitive/
     │  ├─ flowers/
     │  │  ├─ flower_1.jpg
     │  │  ├─ flower_2.jpg
     │  │  ├─ flower_3.jpg
     │  │  └─ ...
     │  │
     │  └─ animals/
     │     ├─ cat_1.jpg
     │     ├─ dog_2.jpg
     │     ├─ bird_3.jpg
     │     └─ ...
     │
     └─ sensitive/
        ├─ porn/
        │  ├─ porn_1.jpg
        │  ├─ porn_2.jpg
        │  ├─ porn_3.jpg
        │  └─ ...
        │
        └─ gore/
           ├─ gore_1.jpg
           ├─ gore_2.jpg
           ├─ gore_3.jpg
           └─ ...

Note: If you need sample data for testing, you can download some from the TensorFlow Dataset Collection.
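
Before preprocessing, it can help to check that the raw tree matches the layout above and that the classes are reasonably balanced. The following is a minimal sketch under stated assumptions: the function name count_raw_images and the list of image extensions are hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def count_raw_images(raw_dir):
    """Return {class_name: {subclass_name: image_count}} for a data/raw/ style tree."""
    counts = {}
    for class_dir in sorted(Path(raw_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = {
            sub.name: sum(
                1
                for f in sub.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for sub in sorted(class_dir.iterdir())
            if sub.is_dir()
        }
    return counts
```

Calling count_raw_images("data/raw") on the example layout would return one entry per class (nonsensitive, sensitive), each mapping its subclasses to the number of images found.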

Preprocess image data

Run the following command to flatten the subdirectory contents of each class into a single folder:

python training/preprocess.py

Images are converted to RGB JPEG. The command also checks for corrupted or invalid images and moves them into the invalid/ folder. Optionally, you can set crop=True in preprocess.py to crop images to a square during processing.

The result of this process will look like this:

data/
  ├─ invalid/
  │  ├─ nonsensitive/
  │  │  ├─ corrupted_1.jpg
  │  │  └─ ...
  │  │
  │  └─ sensitive/
  │     ├─ invalid_1.jpg
  │     └─ ...
  │
  └─ processed/
     ├─ nonsensitive/
     │  ├─ flower_1.jpg
     │  ├─ flower_2.jpg
     │  ├─ flower_3.jpg
     │  ├─ cat_1.jpg
     │  ├─ dog_2.jpg
     │  ├─ bird_3.jpg
     │  └─ ...
     │
     └─ sensitive/
        ├─ porn_1.jpg
        ├─ porn_2.jpg
        ├─ porn_3.jpg
        ├─ gore_1.jpg
        ├─ gore_2.jpg
        ├─ gore_3.jpg
        └─ ...
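
The flattening step shown above can be sketched with the standard library alone. This is an illustration, not the code in preprocess.py: the function name flatten_classes is hypothetical, the numeric-suffix handling of duplicate file names is an assumption, and the RGB JPEG conversion and invalid-image check are left out because they would need an image library.

```python
import shutil
from pathlib import Path


def flatten_classes(raw_dir, processed_dir):
    """Copy every file under raw_dir/<class>/** into processed_dir/<class>/.

    Returns {class_name: number_of_files_copied}.
    """
    copied = {}
    for class_dir in sorted(Path(raw_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        target_dir = Path(processed_dir) / class_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        copied[class_dir.name] = 0
        for src in sorted(class_dir.rglob("*")):
            if not src.is_file():
                continue
            dest = target_dir / src.name
            # Two subclasses may hold the same file name. This sketch adds a
            # numeric suffix instead of overwriting (an assumption).
            n = 1
            while dest.exists():
                dest = target_dir / f"{src.stem}_{n}{src.suffix}"
                n += 1
            shutil.copy2(src, dest)
            copied[class_dir.name] += 1
    return copied
```

For the example layout, flatten_classes("data/raw", "data/processed") would place the flowers and animals images side by side in data/processed/nonsensitive/.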

Split training and test data

Split the files into training and test data*:

python training/split.py

The default split ratio is 70:30 (training:test). To change it, set test_size (default 0.3) in split.py to the desired value.

*) Not to be confused with validation data. The validation set is generated internally during training, while the test data is used to assess your model.
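
To make the effect of test_size concrete, here is a minimal sketch of a shuffled split. It is not the code in split.py: the function name split_files and the seed parameter are hypothetical. The parameter name test_size also appears in scikit-learn's train_test_split, but whether split.py uses that function is not confirmed here.

```python
import random


def split_files(files, test_size=0.3, seed=42):
    """Shuffle file names and split them into (train, test) lists."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = sorted(files)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]
```

Applying the function once per class folder, rather than once over all files, keeps the same ratio inside every class.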

Train the model

Run the training. For details on the available parameters, consult the nsfw_model documentation.

python training/train.py

Generated model files will be saved to the models/ folder.

Training Report

To generate graphs and a prediction report, run:

python training/report.py

This will generate models/confusion_matrix.png and models/classification_report.txt.
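
For readers unfamiliar with these two outputs, the quantities behind them can be sketched in plain Python. This shows the standard definitions only and is not how report.py computes them; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix whose rows are true labels and columns are predicted labels."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def precision_recall(matrix, labels):
    """Per-class precision and recall computed from a confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total: predicted as label
        actual = sum(matrix[i])                    # row total: truly label
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report
```

A low recall for one class means that many images truly belonging to it were predicted as something else, which is one way to quantify a class the model handles poorly.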

Running classification on images

Run a one-time prediction

Classify a single image:

python prediction/predict.py --image_source /path/to/image.jpg

or multiple images in a directory:

python prediction/predict.py --image_source /path/to/directory/image/

Sample output:

{
  "/path/to/directory/image/14jp9.jpg": {
    "safe": 0.3952472507953644,
    "sensitive": 0.604752779006958
  },
  "/path/to/directory/image/sensitive_3578.jpg": {
    "safe": 0.7168499827384949,
    "sensitive": 0.28314998745918274
  }
} 
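
The output maps each image path to a score per class. A minimal sketch for reducing it to one label per image follows. The function name top_labels is hypothetical, and entries whose value is not a dictionary (such as the "__time__" field that the service adds) are skipped.

```python
def top_labels(predictions):
    """Map each image path to (best_label, score) from predict.py-style output."""
    summary = {}
    for path, scores in predictions.items():
        if not isinstance(scores, dict):
            continue  # skip metadata entries such as "__time__"
        label = max(scores, key=scores.get)
        summary[path] = (label, scores[label])
    return summary
```

The input is the parsed JSON, for example the result of json.loads on the text printed by predict.py.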

Run as a service

To start the TCP service, run:

python prediction/server.py

The server listens on port 1235.

You can connect with a telnet client and enter the full path of the target image (on the server's storage):

$ telnet localhost 1235
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Welcome! Enter image path to scan
# Path: /path/to/image/directory/
> Scanning: /path/to/image/directory/
> Result for : /path/to/image/directory/

{
  "/path/to/image/directory/14jp9.jpg": {
    "safe": 0.3952472507953644,
    "sensitive": 0.604752779006958
  },
  "/path/to/image/directory/sensitive_3578.jpg": {
    "safe": 0.7168499827384949,
    "sensitive": 0.28314998745918274
  },
  "__time__": 0.631289
}

# Path: /path/to/image/directory/14jp9.jpg
> Scanning: /path/to/image/directory/14jp9.jpg
> Result for : /path/to/image/directory/14jp9.jpg

{
  "/path/to/image/directory/14jp9.jpg": {
    "safe": 0.3952471911907196,
    "sensitive": 0.604752779006958
  },
  "__time__": 0.089046
}
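
The telnet session above can also be driven from a script. The following is a minimal client sketch, assuming the server accepts a newline-terminated path and replies with a JSON object that starts on its own line, as in the transcript. The wire format is inferred from that transcript only, and the function names scan and extract_result are hypothetical.

```python
import json
import socket


def extract_result(text):
    """Return the JSON object that starts on its own line in text, or None."""
    if text.startswith("{"):
        start = 0
    else:
        pos = text.find("\n{")
        if pos == -1:
            return None
        start = pos + 1
    try:
        # raw_decode parses one JSON value and ignores anything after it.
        obj, _ = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None  # the object is incomplete so far
    return obj


def scan(path, host="localhost", port=1235, timeout=10.0):
    """Send one path to the service and return the parsed result (or None)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: the server reads one line per request.
        sock.sendall(path.encode("utf-8") + b"\n")
        received = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            received += chunk
            result = extract_result(received.decode("utf-8", errors="replace"))
            if result is not None:
                return result
    return extract_result(received.decode("utf-8", errors="replace"))
```

With the server running, scan("/path/to/image/directory/") would return the same dictionary that the telnet session prints, including the "__time__" field.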

Debugging

If you want to debug or just play around, you can run the following commands (requires Jupyter):

cd notebooks

# run Jupyter Lab or Jupyter Notebook:
jupyter-lab &> logs/log.txt & 
# or
jupyter notebook &> logs/log.txt & 

# and then open file tester.ipynb