IStego100K: Large-scale Image Steganalysis Dataset, mixed with various steganographic algorithms, embedding rates, and quality factors.
To promote the development of image steganalysis, we construct and release IStego100K, a large-scale, multivariable image steganalysis dataset. It contains 208,104 images, all of size 1024×1024. Of these, 200,000 images (100,000 cover-stego pairs) form the training set and the remaining 8,104 form the test set. Because we hope IStego100K will help researchers develop universal image steganalysis algorithms, we impose as few constraints on the images as possible. For each image, the quality factor is set randomly in the range 75-95, the steganographic algorithm is selected randomly from three well-known algorithms (J-uniward, nsF5, and UERD), and the embedding rate is set randomly in the range 0.1-0.4. In addition, to account for the mismatch between training and test samples that can occur in real-world settings, we add a second test set (DS-Test) whose images come from a different source than the training set. We hope this test set helps evaluate the robustness of steganalysis algorithms. We tested several recent steganalysis algorithms on IStego100K; the results and analysis are given in the experimental section. We hope that IStego100K will further promote the development of universal image steganalysis technology.
If you use this dataset in your work, please consider citing it in the following format:
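As a rough illustration of the randomized setup described above, the following sketch draws one parameter set for a stego image. The discrete quality factors (75, 80, 85, 90, 95) and payloads (0.1, 0.2, 0.3, 0.4) are assumptions based on the values that appear in the result tables, and `sample_stego_parameters` is a hypothetical helper, not the script used to build the dataset.

```python
import random

# The three embedding algorithms named in the dataset description.
ALGORITHMS = ("J-uniward", "nsF5", "UERD")
# Assumption: discrete steps, as they appear in the result tables.
QUALITY_FACTORS = (75, 80, 85, 90, 95)
PAYLOADS = (0.1, 0.2, 0.3, 0.4)


def sample_stego_parameters(rng):
    """Draw one random (quality, rate, algorithm) combination."""
    return {
        "quality": rng.choice(QUALITY_FACTORS),
        "rate": rng.choice(PAYLOADS),
        "steg_algorithm": rng.choice(ALGORITHMS),
    }


rng = random.Random(0)  # fixed seed so the draw is reproducible
params = sample_stego_parameters(rng)
print(params)
```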
@inproceedings{yangzl2019IStego100K,
title = {IStego100K: Large-scale Image Steganalysis Dataset},
author = {Yang, Zhongliang and Wang, Ke and Ma, Sai and Huang, Yongfeng and Kang, Xiangui and Zhao, Xianfeng},
booktitle = {International Workshop on Digital Watermarking},
year = {2019},
organization = {Springer}
}
The full PDF can be downloaded from arXiv.
Training set. 100,000 pairs of cover and stego images (200,000 in total). The original images were downloaded from Unsplash.
Marked as SS-Test in the paper. 8,104 images with cover/stego labels (not paired). The original images were downloaded from Unsplash.
Marked as DS-Test in the paper. 10,000 images with cover/stego labels (not paired). The original images were shot on different mobile devices.
Note: The paper reports 11,809 images, but we removed some low-quality images before uploading.
For those who cannot access Google in Mainland China, try this Baidu Cloud Disk link:
We also provide detailed parameters for each image here.
The parameter files are organized as follows:
parameters = {
    "000001.jpg": {                  # parameters for a stego file
        "quality": 95,               # quality factor
        "rate": 0.4,                 # embedding rate (payload)
        "steg_algorithm": "nsf5"     # steganographic algorithm
    },
    "000002.jpg": {                  # parameters for a cover file
        "quality": 90                # quality factor
    }
}
Note: In the training set, cover and stego files come in pairs that share the same quality factor, so we omitted the parameter file for the training-set cover files.
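The structure above can be consumed with a few lines of Python. This is a minimal sketch that assumes the parameters are available as JSON (the on-disk format is not specified here) and that an entry without a `steg_algorithm` key is a cover image, as in the example. `is_stego` and `summarize` are hypothetical helper names.

```python
import json
from collections import Counter


def is_stego(entry):
    """Assumption: only stego entries carry a 'steg_algorithm' key."""
    return "steg_algorithm" in entry


def summarize(parameters):
    """Count cover/stego files and stego files per algorithm."""
    labels = Counter()
    algorithms = Counter()
    for entry in parameters.values():
        if is_stego(entry):
            labels["stego"] += 1
            algorithms[entry["steg_algorithm"]] += 1
        else:
            labels["cover"] += 1
    return labels, algorithms


# In-memory copy of the example above (comments removed, since JSON has none).
example = json.loads("""
{
  "000001.jpg": {"quality": 95, "rate": 0.4, "steg_algorithm": "nsf5"},
  "000002.jpg": {"quality": 90}
}
""")

labels, algorithms = summarize(example)
print(dict(labels), dict(algorithms))
```

For a real parameter file, `json.load` on the opened file would replace the inline string, provided the file is valid JSON.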
We use the following steganographic algorithms for our dataset: J-uniward, nsF5, and UERD.
For more details, including code and a tutorial, please refer to our Steganography page.
We apply the following steganalysis algorithms for dataset evaluation: DCTR, GFR, SRNet, and XuNet.
For more details, including code and a tutorial, please refer to our Steganalysis page.
Overall Results
Dataset | Methods | Acc(%) | P(%) | R(%) | F1(%) |
---|---|---|---|---|---|
SS-Test | DCTR | 71.34 | 79.72 | 57.23 | 66.63 |
SS-Test | GFR | 66.26 | 69.58 | 57.97 | 63.25 |
SS-Test | SRNet | - | - | - | - |
SS-Test | XuNet | - | - | - | - |
DS-Test | DCTR | 56.95 | 55.50 | 70.11 | 61.95 |
DS-Test | GFR | 59.12 | 61.61 | 48.42 | 54.22 |
DS-Test | SRNet | - | - | - | - |
DS-Test | XuNet | - | - | - | - |
Note: We trained SRNet and XuNet on a single GPU (GTX 1080Ti) and found that they hardly converge on IStego100K.
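For readers reproducing the table columns, the sketch below computes accuracy, precision, recall, and F1 from binary labels. It assumes the standard definitions with stego as the positive class (label 1), which is an assumption rather than something stated here; `binary_metrics` and `f1_from_pr` are hypothetical helpers.

```python
def binary_metrics(y_true, y_pred):
    """Return Acc, P, R and F1 in percent; 1 = stego (positive), 0 = cover."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_pr(precision, recall)
    return {"acc": 100 * acc, "p": 100 * precision,
            "r": 100 * recall, "f1": 100 * f1}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy labels: 3 stego and 3 cover images, with one error of each kind.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(metrics)

# Consistency check against the DCTR row on SS-Test (P = 79.72, R = 57.23).
print(round(f1_from_pr(79.72, 57.23), 2))
```

Under these definitions, the F1 recomputed from the reported precision and recall of DCTR on SS-Test agrees with the reported 66.63.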
Results for Different Steganography Algorithms
Test Set | Steganalysis | Steganography | Acc(%) | P(%) | R(%) | F1(%) |
---|---|---|---|---|---|---|
SS-Test | DCTR | UERD | 71.77 | 79.75 | 58.36 | 67.40 |
SS-Test | DCTR | nsF5 | 84.44 | 85.10 | 83.51 | 84.30 |
SS-Test | DCTR | J-uniward | 57.73 | 67.58 | 29.71 | 41.27 |
SS-Test | GFR | UERD | 68.47 | 71.34 | 61.75 | 66.20 |
SS-Test | GFR | nsF5 | 71.61 | 72.72 | 69.18 | 70.91 |
SS-Test | GFR | J-uniward | 58.81 | 62.91 | 42.92 | 51.02 |
DS-Test | DCTR | UERD | 53.96 | 53.35 | 63.06 | 57.80 |
DS-Test | DCTR | nsF5 | 62.28 | 60.56 | 87.59 | 71.61 |
DS-Test | DCTR | J-uniward | 51.67 | 51.43 | 59.83 | 55.31 |
DS-Test | GFR | UERD | 56.05 | 58.40 | 42.09 | 48.92 |
DS-Test | GFR | nsF5 | 67.24 | 68.21 | 64.58 | 66.35 |
DS-Test | GFR | J-uniward | 54.59 | 56.62 | 39.26 | 46.37 |
Results for Different Payloads
Test Set | Steganalysis | Payload | Acc(%) | P(%) | R(%) | F1(%) |
---|---|---|---|---|---|---|
SS-Test | DCTR | 0.1 | 58.55 | 67.84 | 32.51 | 43.96 |
SS-Test | DCTR | 0.2 | 71.43 | 80.19 | 56.90 | 66.57 |
SS-Test | DCTR | 0.3 | 76.30 | 82.22 | 67.11 | 73.90 |
SS-Test | DCTR | 0.4 | 79.55 | 83.74 | 73.35 | 78.20 |
SS-Test | GFR | 0.1 | 55.87 | 59.40 | 37.10 | 45.67 |
SS-Test | GFR | 0.2 | 63.51 | 67.98 | 51.08 | 58.33 |
SS-Test | GFR | 0.3 | 70.83 | 72.04 | 67.89 | 69.95 |
SS-Test | GFR | 0.4 | 75.71 | 74.89 | 76.75 | 72.05 |
DS-Test | DCTR | 0.1 | 52.86 | 52.42 | 61.90 | 56.77 |
DS-Test | DCTR | 0.2 | 56.21 | 54.99 | 68.40 | 60.97 |
DS-Test | DCTR | 0.3 | 58.56 | 56.53 | 74.11 | 64.13 |
DS-Test | DCTR | 0.4 | 60.17 | 57.72 | 76.05 | 65.63 |
DS-Test | GFR | 0.1 | 52.29 | 53.42 | 35.79 | 42.86 |
DS-Test | GFR | 0.2 | 56.66 | 58.87 | 44.19 | 50.49 |
DS-Test | GFR | 0.3 | 62.15 | 64.65 | 53.65 | 58.63 |
DS-Test | GFR | 0.4 | 65.40 | 67.18 | 60.22 | 63.51 |
Results for Different Quality Factors on SS-Test
Steganalysis | QF | Acc(%) | P(%) | R(%) | F1(%) |
---|---|---|---|---|---|
DCTR | 75 | 75.23 | 85.63 | 60.64 | 71.00 |
DCTR | 80 | 71.50 | 86.48 | 61.56 | 71.82 |
DCTR | 85 | 74.09 | 84.34 | 59.18 | 69.55 |
DCTR | 90 | 69.04 | 76.09 | 55.54 | 64.21 |
DCTR | 95 | 62.12 | 66.41 | 49.05 | 56.43 |
GFR | 75 | 70.08 | 75.06 | 60.15 | 66.78 |
GFR | 80 | 69.91 | 74.98 | 59.75 | 66.50 |
GFR | 85 | 68.42 | 71.54 | 61.17 | 65.95 |
GFR | 90 | 64.67 | 67.02 | 57.75 | 62.04 |
GFR | 95 | 58.30 | 59.76 | 50.82 | 64.93 |
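A per-quality-factor breakdown like the one above can be derived by grouping predictions on the `quality` field of the parameter files. The sketch below is one way to do it, assuming predictions are keyed by file name, label 1 means stego, and an entry with a `steg_algorithm` key is a stego image. `accuracy_by_quality` is a hypothetical helper and the sample data is made up for illustration.

```python
from collections import defaultdict


def accuracy_by_quality(parameters, predictions):
    """Return {quality factor: accuracy in percent} over predicted files."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for name, entry in parameters.items():
        if name not in predictions:
            continue  # skip files without a prediction
        truth = 1 if "steg_algorithm" in entry else 0
        qf = entry["quality"]
        total[qf] += 1
        correct[qf] += int(predictions[name] == truth)
    return {qf: 100.0 * correct[qf] / total[qf] for qf in sorted(total)}


# Made-up example: two images at QF 75 and two at QF 95.
sample_parameters = {
    "a.jpg": {"quality": 75, "rate": 0.4, "steg_algorithm": "nsf5"},
    "b.jpg": {"quality": 75},
    "c.jpg": {"quality": 95, "rate": 0.1, "steg_algorithm": "uerd"},
    "d.jpg": {"quality": 95},
}
sample_predictions = {"a.jpg": 1, "b.jpg": 0, "c.jpg": 0, "d.jpg": 0}

per_qf = accuracy_by_quality(sample_parameters, sample_predictions)
print(per_qf)
```

The same grouping works for payload or algorithm by swapping the key, although cover entries carry neither field, so those breakdowns need a separate rule for which cover images to include.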
For more details, such as pre-processing, data distribution, and steganalysis baselines, please see the arXiv paper.