facebookresearch / DomainBed

DomainBed is a suite to test domain generalization algorithms
MIT License

Terra Incognita dataset download #58

Closed: alexrame closed this issue 3 years ago

alexrame commented 3 years ago

Thanks again for the nice work. I am currently working on a new approach for domain generalization, and your code/framework has been of great help. However, I have a problem downloading the Terra Incognita dataset:

```
python3 download.py --data_dir $DATA_DIR
Downloading...
From: http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_images_sm.tar.gz
To: terra_incognita/terra_incognita_images.tar.gz
Traceback (most recent call last):
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1646, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/tarfile.py", line 1094, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/gzip.py", line 287, in read
    return self._buffer.read(size)
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/gzip.py", line 474, in read
    if not self._read_gzip_header():
  File "/home/rame/anaconda3/envs/bias/lib/python3.7/gzip.py", line 422, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'<!')
```

The links "http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_images_sm.tar.gz" and "http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_annotations.tar.gz" seem to be dead. I could not find more information on the original dataset website, https://beerys.github.io/CaltechCameraTraps/.
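The `OSError: Not a gzipped file (b'<!')` in the traceback suggests the server answered with an HTML page (which typically starts with `<!DOCTYPE`) instead of the archive. A minimal sketch, assuming you want to catch this before `tarfile` runs, is to check the two gzip magic bytes (`1f 8b`) of the downloaded file. `check_gzip_download` is a hypothetical helper, not part of DomainBed's `download.py`.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def check_gzip_download(path):
    """Raise a readable error if `path` is not a gzip file (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(GZIP_MAGIC):
        return
    if head.lstrip().startswith(b"<"):
        # Dead or moved links often serve an HTML error or redirect page.
        raise RuntimeError(
            f"{path} is an HTML page, not a .tar.gz; the download URL is probably dead"
        )
    raise RuntimeError(f"{path} is not a gzip file (starts with {head[:8]!r})")
```

Calling it right after the download, before `tarfile.open(..., "r:gz")`, turns the low-level traceback into an explicit "dead link" message.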

I am not sure you can do anything about it, but I wanted to let you know. Sincerely, Alexandre Ramé

alexrame commented 3 years ago

Update: the dataset has moved to http://lila.science/datasets/caltech-camera-traps. However, the "smaller" ECCV18 images (width resized to 1024 pixels) used in DomainBed (eccv_18_all_images_sm.tar.gz) cannot be found there.

beerys commented 3 years ago

We had a server change at Caltech, so the data is now hosted solely on LILA. If you're in a rush, you can absolutely build that resized subset from the full dataset; in the meantime, I'll try to surface that zip to make it easier in the future :)
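If you do need to rebuild the resized subset yourself, here is a minimal sketch. It assumes Pillow is installed, that the full-size images are `.jpg` files under one directory (the suffix match is case-sensitive on Linux), and that "resized" means scaling each image to a width of 1024 px while keeping the aspect ratio. `resized_size` and `resize_tree` are hypothetical names, and the output is not guaranteed to be byte-identical to `eccv_18_all_images_sm.tar.gz`.

```python
from pathlib import Path

TARGET_WIDTH = 1024  # width of the "small" ECCV18 images mentioned in this thread


def resized_size(width, height, target_width=TARGET_WIDTH):
    """Return (w, h) scaled so that w == target_width, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_width / width
    return target_width, max(1, round(height * scale))


def resize_tree(src_dir, dst_dir, target_width=TARGET_WIDTH):
    """Resize every .jpg under src_dir into dst_dir, mirroring the folder layout."""
    from PIL import Image  # Pillow is an assumption; imported lazily

    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    count = 0
    for src in src_dir.rglob("*.jpg"):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # Always scales to exactly target_width (smaller images are upscaled).
            img.resize(resized_size(*img.size, target_width)).save(dst)
        count += 1
    return count
```

For example, `resized_size(2048, 1536)` gives `(1024, 768)`.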

lopezpaz commented 3 years ago

Thank you @beerys, that would be wonderful :)

beerys commented 3 years ago

Ok, the zipped small images and metadata files are uploaded! You can access them here:

small images: https://lilablobssc.blob.core.windows.net/caltechcameratraps/eccv_18_all_images_sm.tar.gz

metadata: https://lilablobssc.blob.core.windows.net/caltechcameratraps/eccv_18_annotations.tar.gz

Let me know if there are any other issues!!

alexrame commented 3 years ago

Thank you very much!

In detail, the small images seem to be fine. Regarding the metadata, I believe the zip file already available at "https://lilablobssc.blob.core.windows.net/caltechcameratraps/labels/caltech_camera_traps.json.zip" is in the right format.

Overall, it worked with the following code in download.py:

```python
import os

# stage_path() and download_and_extract() are helpers defined elsewhere in download.py.


def download_terra_incognita(data_dir):
    # Original URL: https://beerys.github.io/CaltechCameraTraps/
    # New URL: http://lila.science/datasets/caltech-camera-traps

    full_path = stage_path(data_dir, "terra_incognita")

    # Small images (width resized to 1024 px), now hosted on LILA. Old URL:
    # http://www.vision.caltech.edu/~sbeery/datasets/caltechcameratraps18/eccv_18_all_images_sm.tar.gz
    download_and_extract(
        "https://lilablobssc.blob.core.windows.net/caltechcameratraps/eccv_18_all_images_sm.tar.gz",
        os.path.join(full_path, "terra_incognita_images.tar.gz"))

    # Metadata: this archive is a .zip, so the destination name must end in .zip;
    # download_and_extract() picks the extractor from the file extension.
    download_and_extract(
        "https://lilablobssc.blob.core.windows.net/caltechcameratraps/labels/caltech_camera_traps.json.zip",
        os.path.join(full_path, "caltech_camera_traps.json.zip"))

    # The new metadata stores location IDs as strings (previously 38, 46, 100, 43).
    include_locations = ["38", "46", "100", "43"]
    ...
```
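The LILA metadata is a `.zip` while the images are a `.tar.gz`, so an extractor that chooses the format from the destination file name fails whenever the name does not match the content. A minimal sketch, assuming you would rather detect the format from the bytes, uses the standard library's `zipfile.is_zipfile` and `tarfile.is_tarfile`. `extract_archive` is a hypothetical helper, not DomainBed's `download_and_extract`.

```python
import tarfile
import zipfile


def extract_archive(path, dest_dir):
    """Extract a zip or (possibly compressed) tar archive, chosen by content, not name."""
    # Check zip first, then tar.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest_dir)
        return "zip"
    if tarfile.is_tarfile(path):
        # Mode "r" auto-detects gzip/bz2/xz compression.
        with tarfile.open(path, "r") as tar:
            if hasattr(tarfile, "data_filter"):  # extraction filters exist
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
        return "tar"
    raise ValueError(f"{path} is neither a zip nor a tar archive")
```

With this approach, a zip saved under a `.tar.gz` name is still extracted correctly.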
lopezpaz commented 3 years ago

@alexrame, @beerys mind sending a PR?

addtt commented 3 years ago

Thanks for fixing this! There might be a couple more small things to do before you send the PR:

lopezpaz commented 3 years ago

Fixed by #61. Thank you all :)

GA-17a commented 2 years ago

Hi @lopezpaz @alexrame @addtt! I only got 24330 images using the fixed version. Could you check how many images you got?
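One way to compare image counts across machines is to count, straight from the metadata JSON, how many images fall in each included location. A minimal sketch, assuming COCO Camera Traps-style metadata (an `images` list whose entries carry a `location` field) and the four locations from the `download.py` snippet in this thread. It counts every image at those locations before any category filtering `download.py` may apply, so treat the total as an upper bound. `count_images_per_location` and `load_metadata` are hypothetical helpers.

```python
import json
from collections import Counter

# Location IDs kept by the download.py snippet in this thread.
INCLUDE_LOCATIONS = ("38", "46", "100", "43")


def count_images_per_location(metadata, include_locations=INCLUDE_LOCATIONS):
    """Count images per kept location in COCO Camera Traps-style metadata."""
    counts = Counter()
    for image in metadata["images"]:
        location = str(image["location"])  # accept int or str IDs
        if location in include_locations:
            counts[location] += 1
    return counts


def load_metadata(path):
    with open(path) as f:
        return json.load(f)
```

Usage: `counts = count_images_per_location(load_metadata("path/to/metadata.json"))`, then compare `sum(counts.values())` and the per-location numbers.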

zhyhan commented 1 year ago


Thanks for the contributions! However, we cannot open the small-image and metadata links @beerys posted above, so we cannot download the dataset.
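Since these archives have already moved once, a download script could try several candidate URLs in order and reject responses that are not archives. A minimal sketch using only the standard library, under these assumptions: `download_first_available` is a hypothetical helper, the URL list is up to you (this thread only shows that the links above worked at some point), and the archive check looks only at the gzip (`1f 8b`) and zip (`PK\x03\x04`) signatures.

```python
import os
import urllib.request

# Leading bytes of gzip (.tar.gz) and zip archives.
ARCHIVE_SIGNATURES = (b"\x1f\x8b", b"PK\x03\x04")


def looks_like_archive(path):
    with open(path, "rb") as f:
        return f.read(4).startswith(ARCHIVE_SIGNATURES)


def download_first_available(urls, dst, fetch=urllib.request.urlretrieve):
    """Download the first URL that yields a gzip or zip file; return that URL."""
    errors = []
    for url in urls:
        try:
            fetch(url, dst)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
            continue
        if looks_like_archive(dst):
            return url
        errors.append(f"{url}: response is not a gzip/zip archive")
        os.remove(dst)
    raise RuntimeError("no candidate URL worked:\n" + "\n".join(errors))
```

The `fetch` parameter defaults to `urllib.request.urlretrieve` but can be swapped for another downloader.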