Image classification using Keras and a Convolutional Neural Network (CNN). The training script and model architecture are based on nsfw_model, with some modifications to set default folder values.
The pre-compiled model in the models/ folder is trained with the following classes:
However, you can train any image classification task with your own defined classes using this code. Just follow the instructions below.
This dataset was collected from various sources, totaling ~10 GB of image files:
Unfortunately, the accuracy of the 'sensitive' class is quite low, which may be caused by the following reasons:
git clone git@github.com:anggara-kaskus/image-classification.git
cd image-classification
python -m pipenv update
python -m pip install -r requirements.txt
Put your image files into the data/raw/ folder according to the specified classes. For example, if you want to classify nonsensitive (subclasses: flowers and animals) and sensitive (subclasses: porn and gore) images, arrange your directory structure as below:
data/
└─ raw/
   ├─ nonsensitive/
   │  ├─ flowers/
   │  │  ├─ flower_1.jpg
   │  │  ├─ flower_2.jpg
   │  │  ├─ flower_3.jpg
   │  │  └─ ...
   │  │
   │  └─ animals/
   │     ├─ cat_1.jpg
   │     ├─ dog_2.jpg
   │     ├─ bird_3.jpg
   │     └─ ...
   │
   └─ sensitive/
      ├─ porn/
      │  ├─ porn_1.jpg
      │  ├─ porn_2.jpg
      │  ├─ porn_3.jpg
      │  └─ ...
      │
      └─ gore/
         ├─ gore_1.jpg
         ├─ gore_2.jpg
         ├─ gore_3.jpg
         └─ ...
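The layout above can be scaffolded with a few lines of Python. This is only a convenience sketch, not part of the repository; the class and subclass names are the examples from this README, and the `scaffold` helper is hypothetical:

```python
from pathlib import Path

# Example class/subclass layout from this README.
LAYOUT = {
    "nonsensitive": ["flowers", "animals"],
    "sensitive": ["porn", "gore"],
}

def scaffold(root: str = "data/raw") -> list:
    """Create the data/raw/<class>/<subclass>/ folders and return them."""
    created = []
    for cls, subclasses in LAYOUT.items():
        for sub in subclasses:
            d = Path(root) / cls / sub
            d.mkdir(parents=True, exist_ok=True)
            created.append(d)
    return created
```

After running it, you only need to drop your image files into the matching subclass folders.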
Note: If you want sample data for testing, you can download a dataset from the TensorFlow Datasets collection.
Run the following command to flatten all subdirectory contents of each class into one folder:
python training/preprocess.py
Images will be converted to RGB JPEG. This command also checks for corrupted or invalid images and moves them into the invalid/
folder.
Optionally, you can set crop=True
in preprocess.py
to crop images to a square during processing.
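To illustrate the invalid-image quarantine step, here is a stdlib-only sketch. It is an assumption of how preprocess.py behaves (the real script most likely uses Pillow to decode images); this version only inspects magic bytes, and the `quarantine_invalid` helper is hypothetical:

```python
import shutil
from pathlib import Path

# Magic bytes for the two formats this sketch accepts.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_image(path: Path) -> bool:
    """Cheap header check; a real pipeline would fully decode the file."""
    head = path.read_bytes()[:8]
    return head.startswith(JPEG_MAGIC) or head.startswith(PNG_MAGIC)

def quarantine_invalid(raw_dir: Path, invalid_dir: Path) -> list:
    """Move files failing the check into invalid/, mirroring the class folders."""
    moved = []
    for f in sorted(raw_dir.rglob("*")):
        if f.is_file() and not looks_like_image(f):
            target = invalid_dir / f.relative_to(raw_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))
            moved.append(target)
    return moved
```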
The result of this process will look like this:
data/
├─ invalid/
│  ├─ nonsensitive/
│  │  ├─ corrupted_1.jpg
│  │  └─ ...
│  │
│  └─ sensitive/
│     ├─ invalid_1.jpg
│     └─ ...
│
└─ processed/
   ├─ nonsensitive/
   │  ├─ flower_1.jpg
   │  ├─ flower_2.jpg
   │  ├─ flower_3.jpg
   │  ├─ cat_1.jpg
   │  ├─ dog_2.jpg
   │  ├─ bird_3.jpg
   │  └─ ...
   │
   └─ sensitive/
      ├─ porn_1.jpg
      ├─ porn_2.jpg
      ├─ porn_3.jpg
      ├─ gore_1.jpg
      ├─ gore_2.jpg
      ├─ gore_3.jpg
      └─ ...
Split the files into training and test data*:
python training/split.py
The default split ratio is 70:30 for training and test data.
To change it, set test_size=0.3
in split.py
to the desired value.
*) Not to be confused with validation data. The validation set is generated internally during the training session, while the test data is used to assess your model.
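The 70:30 split can be pictured with a small sketch. This is not the repository's split.py (which plausibly uses sklearn's train_test_split); the `split_files` helper and its seeded shuffle are illustrative assumptions:

```python
import random

def split_files(files, test_size=0.3, seed=42):
    """Shuffle file paths deterministically, then cut off the last
    test_size fraction as the test set."""
    files = list(files)
    rng = random.Random(seed)
    rng.shuffle(files)
    cut = int(len(files) * (1 - test_size))
    return files[:cut], files[cut:]
```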
Run the training. For more detailed parameters, please consult the nsfw_model documentation.
python training/train.py
Generated model files will be saved to the models/ folder.
To generate graphs and prediction report, run:
python training/report.py
This will generate models/confusion_matrix.png
and models/classification_report.txt.
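To show what the confusion matrix in that report represents, here is a minimal stdlib computation. It is an assumption of what report.py does internally (the real script presumably uses sklearn and matplotlib to produce the PNG and TXT files); the `confusion_matrix` helper here is illustrative:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    idx = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

# Example: one 'safe' image misclassified as 'sensitive'.
m = confusion_matrix(
    ["safe", "safe", "sensitive"],
    ["safe", "sensitive", "sensitive"],
    ["safe", "sensitive"],
)
```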
Classify a single image:
python prediction/predict.py --image_source /path/to/image.jpg
or multiple images in a directory:
python prediction/predict.py --image_source /path/to/directory/image/
Output sample:
{
"/path/to/directory/image/14jp9.jpg": {
"safe": 0.3952472507953644,
"sensitive": 0.604752779006958
},
"/path/to/directory/image/sensitive_3578.jpg": {
"safe": 0.7168499827384949,
"sensitive": 0.28314998745918274
}
}
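If you consume this JSON output from another script, reducing it to the top label per image is straightforward. The `top_labels` helper below is not part of the repository, just a small sketch over the output shape shown above (the `__time__` key appears in the server output, so it is filtered defensively):

```python
import json

def top_labels(output_json: str) -> dict:
    """Map each image path to the class with the highest score,
    skipping metadata keys like '__time__'."""
    scores = json.loads(output_json)
    return {
        path: max(classes, key=classes.get)
        for path, classes in scores.items()
        if not path.startswith("__")
    }
```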
To start TCP service, run:
python prediction/server.py
The server will listen on port 1235.
You can connect as a telnet client and enter the full path of the target image (on the server's storage):
$ telnet localhost 1235
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Welcome! Enter image path to scan
# Path: /path/to/image/directory/
> Scanning: /path/to/image/directory/
> Result for : /path/to/image/directory/
{
"/path/to/image/directory/14jp9.jpg": {
"safe": 0.3952472507953644,
"sensitive": 0.604752779006958
},
"/path/to/image/directory/sensitive_3578.jpg": {
"safe": 0.7168499827384949,
"sensitive": 0.28314998745918274
},
"__time__": 0.631289
}
# Path: /path/to/image/directory/14jp9.jpg
> Scanning: /path/to/image/directory/14jp9.jpg
> Result for : /path/to/image/directory/14jp9.jpg
{
"/path/to/image/directory/14jp9.jpg": {
"safe": 0.3952471911907196,
"sensitive": 0.604752779006958
},
"__time__": 0.089046
}
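Instead of telnet, you could query the service programmatically. The sketch below is a hypothetical client, not part of the repository: it assumes a one-shot exchange where the server closes the connection after responding, whereas the interactive server shown above keeps the connection open (there you would read until the next "# Path:" prompt instead):

```python
import socket

def query(path, host="localhost", port=1235, timeout=10.0):
    """Send an image path to the TCP service and read the full response.
    Assumes the server closes the connection after replying."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(path.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```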
If you want to debug or just play around, you can run the following commands (requires Jupyter):
cd notebooks
# run Jupyter Lab or Jupyter Notebook:
jupyter-lab &> logs/log.txt &
# or
jupyter notebook &> logs/log.txt &
# and then open file tester.ipynb