PatrykChrabaszcz / Imagenet32_Scripts

Scripts for Imagenet 32 dataset
MIT License

Trouble Using Data #1

Closed by pGit1 6 years ago

pGit1 commented 6 years ago

When I download the data from the website, I get a tar file back. I've tried your unpickle code, but it doesn't work when I give it the tar file path. How exactly should I process this data to get to the images and labels? I can't follow the scripts because I don't see any code that processes or extracts tar files.

PatrykChrabaszcz commented 6 years ago

Once you unpack the downloaded files, you should see files that can be loaded. For the code that reads those files, please look at: https://patrykchrabaszcz.github.io/Imagenet32/

pGit1 commented 6 years ago

That's where I am having the issue. How exactly do I "unpack the files" on Windows?

PatrykChrabaszcz commented 6 years ago

You probably need to install some software to unpack the files. The files I uploaded are in .zip format. Once you unzip them, they have the same format as the CIFAR files. For code that visualizes what is inside those files, look at: https://github.com/PatrykChrabaszcz/Imagenet32_Scripts/blob/master/test.py
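If installing extra software is inconvenient, Python's standard `zipfile` module can also unpack a .zip archive, including on Windows. The following is a minimal sketch, not part of the repository; the archive and folder names are hypothetical placeholders for whatever file was actually downloaded.

```python
import zipfile
from pathlib import Path


def unpack_archive(archive_path, target_dir):
    """Extract a .zip archive into target_dir and return the archived names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical archive name; substitute the file you downloaded.
    archive = Path("Imagenet32_train.zip")
    if archive.exists():
        for name in unpack_archive(archive, "Imagenet32_train"):
            print(name)
```

The printed names show at a glance whether the archive contains loadable batch files or only images.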

pGit1 commented 6 years ago

Ok thank you!!

PatrykChrabaszcz commented 6 years ago

Does it work? Also, using Python 2 might be problematic. As the website says: "You need Python3 to unpickle files. Note that its default encoding differs from Python2."

pGit1 commented 6 years ago

Hi, I am using Python 3. Right now I am using 7zip to unzip the files to a directory. I can confirm that images are indeed being unzipped to the directory. What I cannot see is where the labels are. It looks like all that gets extracted are the .png files, although I have a long way to go before the archive is done extracting. Once the images are loaded into a directory I will be OK from there.

I will be able to visualize on my own and start experimenting on my own. My concern was just getting the images extracted in the first place. But now it appears I will need the labels as well. From what I can tell in the unzipping process it doesn't appear that my download contains the label file associated with each .png file name.

I downloaded from here... http://image-net.org/small/download.php

PatrykChrabaszcz commented 6 years ago

Please take a look at the website: https://patrykchrabaszcz.github.io/Imagenet32/ It has code to extract the images and the labels.

If it is still problematic, try googling the "CIFAR" dataset; it is quite popular. If you write code that can read CIFAR files, the same code can be used to read "Imagenet 32x32".
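As a rough illustration of what CIFAR-style reading code looks like: in the Python version of CIFAR, each batch file is a pickled dictionary, and each row of `data` stores all red values first, then all green, then all blue. The sketch below assumes Imagenet 32x32 batches follow that same layout with `data` and `labels` keys; `row_to_pixels` is a hypothetical helper name, and the file path is a placeholder.

```python
import os
import pickle


def unpickle(path):
    """Load one pickled batch file and return the dictionary stored in it."""
    with open(path, "rb") as fo:
        return pickle.load(fo)


def row_to_pixels(row, img_size=32):
    """Turn one flat row (all red, then green, then blue) into (r, g, b) tuples.

    Assumes the CIFAR row layout; pixels come back in row-major order.
    """
    n = img_size * img_size
    if len(row) != 3 * n:
        raise ValueError(f"expected {3 * n} values, got {len(row)}")
    red, green, blue = row[:n], row[n:2 * n], row[2 * n:]
    return list(zip(red, green, blue))


if __name__ == "__main__":
    # Placeholder path to one unpacked batch file.
    batch_path = "train_data_batch_1"
    if os.path.exists(batch_path):
        d = unpickle(batch_path)
        pixels = row_to_pixels(d["data"][0])
        print("first label:", d["labels"][0], "first pixel:", pixels[0])
```

With NumPy available, reshaping the row to `(3, img_size, img_size)` and moving the channel axis last gives the same result as an image array.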

pGit1 commented 6 years ago

I have. The code below is not clear. What exactly does `file` represent?

import pickle

def unpickle(file):
    # `file` is the path to one unpacked batch file
    with open(file, 'rb') as fo:
        dict = pickle.load(fo)
    return dict

I see this as well:

import os

def load_databatch(data_folder, idx, img_size=32):
    # Batch files are named train_data_batch_<idx> inside data_folder
    data_file = os.path.join(data_folder, 'train_data_batch_')

    d = unpickle(data_file + str(idx))
    x = d['data']
    y = d['labels']

But I still don't see how I can arrive at a dictionary after unzipping the tar files, which is taking a very long time. I can confirm images are being extracted, but there are no files that suggest there is a dictionary to load. Perhaps the download link I used is missing this data. The link to the data on your blog page is behind a registration wall, so I would have to wait for approval.

PatrykChrabaszcz commented 6 years ago

You probably downloaded the wrong dataset. We created Imagenet 32x32 because "small Imagenet" was published with a paper that used it for unsupervised training and therefore did not need the labels. We created a new dataset, showed that different downsampling methods produce similar results, and uploaded it together with the labels. Please use the link provided on the website, and yes, you need to register to get access there.

Sorry I didn't check your link three messages ago; both links look similar and I assumed it was the right one.
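A quick way to tell the two downloads apart after unpacking is to look at what the folder contains. The sketch below is only an illustration: it assumes the labeled dataset unpacks to files whose names start with `train_data_batch_` (the prefix used in `load_databatch`), and `summarize_download` is a hypothetical helper name.

```python
from pathlib import Path


def summarize_download(folder):
    """Count pickled batch files and PNG images found anywhere under folder."""
    root = Path(folder)
    batches = sorted(
        p.name for p in root.rglob("train_data_batch_*") if p.is_file()
    )
    png_count = sum(1 for p in root.rglob("*.png") if p.is_file())
    return {"batch_files": batches, "png_count": png_count}


if __name__ == "__main__":
    # Placeholder folder; point this at the directory you extracted into.
    print(summarize_download("."))
```

A folder with many .png files and no batch files suggests the unlabeled "small Imagenet" download rather than Imagenet 32x32.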

pGit1 commented 6 years ago

Darn. It looks like getting access takes more days than I have. I need immediate access to the labels or I will fall behind :(

PatrykChrabaszcz commented 6 years ago

Strange, I remember getting access quite fast, if not immediately. Maybe you should use a university account if you are a student.