using the dataset

DenisSouth commented 5 years ago

I downloaded this dataset https://www.kaggle.com/sophatvathana/casia-dataset but it has no description and a strange folder tree:

├───CASIA1
│   ├───Au
│   ├───ela
│   └───Sp
├───CASIA2
│   ├───Au
│   └───Tp
└───__MACOSX
    ├───CASIA1
    │   ├───Au
    │   └───Sp
    └───CASIA2
        ├───Au
        └───Tp

Which one should I use for training and which for testing? Which one holds the original pictures and which the modified ones? I also know the CSV format:

file_name,1 or 0 (fake or real image)
Example for a real image: 'datasets/train/real/Au_ani_00001.jpg',0

But I have no idea which folder I should use as the source...

I appreciate your great work, and I want to reproduce it myself :-)

=========================================

So I made this.

I uploaded the zip to Google Drive, unzipped it to '/content/gdrive/My Drive/casia_dataset/', and generated the CSV in Google Colab with the following code.

Is it right?

import os

# Folders on the mounted Google Drive
path_orig = '/content/gdrive/My Drive/casia_dataset/CASIA2/Au/'   # authentic
path_modif = '/content/gdrive/My Drive/casia_dataset/CASIA2/Tp/'  # tampered

strings = []

# Authentic images: .jpg files larger than 10000 bytes get label 1
for file in os.listdir(path_orig):
    try:
        if file.endswith('jpg') and os.stat(path_orig + file).st_size > 10000:
            strings.append(path_orig + file + ',1\n')
    except OSError:
        # Report files that could not be read
        print(path_orig + file)

# Tampered images: .jpg files larger than 10000 bytes get label 0
for file in os.listdir(path_modif):
    try:
        if file.endswith('jpg') and os.stat(path_modif + file).st_size > 10000:
            strings.append(path_modif + file + ',0\n')
    except OSError:
        print(path_modif + file)

# Append all rows to the CSV, opening the file once
with open('/content/gdrive/My Drive/casia_dataset/dataset.csv', 'a') as f:
    f.writelines(strings)
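
For reference, here is a minimal sketch of the same CSV generation written with pathlib and the csv module. The function names (build_rows, write_dataset_csv) and their parameters are hypothetical and not part of the FakeImageDetector repo. The label values are parameters because the script above writes 1 for Au and 0 for Tp, while the format description earlier in this post gives 0 for a real image, so check which convention the training code expects. Unlike the script above, this sketch also matches upper-case .JPG extensions and overwrites the CSV instead of appending to it.

```python
import csv
from pathlib import Path


def build_rows(folder, label, min_bytes=10000):
    """Return (path, label) rows for .jpg files in `folder` larger than `min_bytes`."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        is_jpg = path.is_file() and path.suffix.lower() == ".jpg"
        if is_jpg and path.stat().st_size > min_bytes:
            rows.append((str(path), label))
    return rows


def write_dataset_csv(au_dir, tp_dir, out_csv, au_label=1, tp_label=0):
    """Write one `file_name,label` row per image and return the row count.

    The default labels mirror the script in this thread (Au -> 1, Tp -> 0);
    pass other values if the training code uses the opposite convention.
    """
    rows = build_rows(au_dir, au_label) + build_rows(tp_dir, tp_label)
    # Mode "w" overwrites an existing file, so re-running does not duplicate rows.
    with open(out_csv, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
    return len(rows)
```

It could be called as write_dataset_csv('casia/CASIA2/Au', 'casia/CASIA2/Tp', 'casia/dataset.csv'), with the paths adjusted to wherever the archive was unzipped.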

agusgun commented 5 years ago

Yup, I think that is correct.

For the datasets, I think Au stands for Authentic, while Tp stands for Tampered. Hope this helps.

If you have already solved this issue, please close it :). Thank you very much.

agusgun commented 5 years ago

Sure :). I have already added the LICENSE too.

On Fri, Mar 8, 2019 at 3:35 PM DenisSouth wrote:


Thanks. May I fork it, change it, and add an MIT license?



pidugusundeep commented 4 years ago

What are the images with Sp??

DenisSouth commented 4 years ago

What are the images with Sp??

Au is authentic pics, Tp is tampered pics.

Make the CSV for training:

import os

path_orig = 'casia/CASIA2/Au/'   # authentic
path_modif = 'casia/CASIA2/Tp/'  # tampered

strings = []

# Authentic images: .jpg files larger than 10000 bytes get label 1
for file in os.listdir(path_orig):
    if file.endswith('jpg') and os.stat(path_orig + file).st_size > 10000:
        strings.append(path_orig + file + ',1\n')

# Tampered images: .jpg files larger than 10000 bytes get label 0
for file in os.listdir(path_modif):
    if file.endswith('jpg') and os.stat(path_modif + file).st_size > 10000:
        strings.append(path_modif + file + ',0\n')

# Append all rows to the CSV, opening the file once
with open('casia/dataset.csv', 'a') as f:
    f.writelines(strings)
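
The thread does not settle which images go to training and which to testing, and the folder tree shows no predefined split. One common option, assumed here and not confirmed by the repo, is a random split done separately for each label so both sets keep the same authentic/tampered ratio. The names read_rows and stratified_split, the 20% test fraction, and the seed are all hypothetical choices for illustration.

```python
import csv
import random
from collections import defaultdict


def read_rows(csv_path):
    """Read `file_name,label` rows from the CSV as (path, label) tuples."""
    with open(csv_path, newline="") as f:
        return [tuple(row) for row in csv.reader(f) if row]


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (path, label) rows into train and test lists, label by label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = int(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

For example, train, test = stratified_split(read_rows('casia/dataset.csv')) gives two lists of rows that can be written out as separate CSV files.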

pidugusundeep commented 4 years ago

@DenisSouth What are the images with Sp ?? what kind of images are they?

DenisSouth commented 4 years ago

@DenisSouth What are the images with Sp ?? what kind of images are they?

https://www.kaggle.com/sophatvathana/casia-dataset

They are modified JPG images.

agusgun / FakeImageDetector

using the dataset #2
