Thanks for curating the rare species dataset! I'm trying to load it with a PyTorch DataLoader and got the following error:
SyntaxError: not a TIFF file (header b'IIU\x00\x18\x00\x00\x00' not valid)
I downloaded the dataset from Hugging Face with the following code:
from datasets import load_dataset
ds = load_dataset("imageomics/rare-species")
and used a PyTorch Dataset and DataLoader to load it:
import torch
from torch.utils.data import DataLoader

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, dataset, transform=None):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        # Return only the ID to keep the debugging example minimal
        return self.dataset[idx]["rarespecies_id"]

# `ds` is the object returned by load_dataset above; no transform is needed for IDs
custom_dataset = CustomDataset(ds["train"], transform=None)
dataloader = DataLoader(custom_dataset, batch_size=32, shuffle=False)
for i, data in enumerate(dataloader):
    print(data)
CustomDataset returns only rarespecies_id to keep debugging simple. The IDs were printed for the first 66 batches, then loading the 67th batch raised:
SyntaxError: not a TIFF file (header b'IIU\x00\x18\x00\x00\x00' not valid)
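To narrow the failure down to individual rows, something like the sketch below might help. It assumes that reading a full row is what triggers the image decoding (which would explain why a `__getitem__` that only returns the ID still fails); `find_unreadable_rows` is a hypothetical helper, not part of `datasets` or PyTorch.

```python
def find_unreadable_rows(dataset, start, stop):
    """Try to read each row in [start, stop) and collect the ones that fail.

    `dataset` can be anything indexable by integer, e.g. ds["train"].
    Returns a list of (index, error description) tuples.
    """
    failures = []
    for idx in range(start, min(stop, len(dataset))):
        try:
            # Assumption: accessing the row decodes all columns, including the image
            dataset[idx]
        except Exception as err:  # SyntaxError is a subclass of Exception
            failures.append((idx, f"{type(err).__name__}: {err}"))
    return failures
```

Calling it as `find_unreadable_rows(ds["train"], start, stop)`, with `start` and `stop` bounding the rows of the failing batch, should list the offending indices without stopping at the first error.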
I also checked the Dataset Viewer on Hugging Face and got an error on page 22, where the corrupted file should be located.
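The batch number and the viewer page can be cross-checked with a little arithmetic. This is only a sketch: it relies on `shuffle=False` (so batches follow row order) and assumes the Dataset Viewer shows 100 rows per page with 1-based page numbers. Both function names are made up for illustration.

```python
def batch_index_range(batch_number, batch_size):
    """Return (first, last) 0-based row indices of a 1-based batch number.

    Only valid when the DataLoader uses shuffle=False.
    """
    first = (batch_number - 1) * batch_size
    last = first + batch_size - 1
    return first, last


def viewer_pages(first, last, rows_per_page=100):
    """Return the 1-based viewer pages that contain rows first..last.

    rows_per_page=100 is an assumption about the Dataset Viewer.
    """
    return list(range(first // rows_per_page + 1, last // rows_per_page + 2))


first, last = batch_index_range(67, 32)
print(first, last)                # rows covered by the 67th batch
print(viewer_pages(first, last))  # viewer page(s) holding those rows
```

Under those assumptions the 67th batch covers rows 2112 to 2143, which all fall on page 22, so the DataLoader failure and the viewer error point at the same region of the dataset.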
Thanks for letting us know! I've opened an issue (discussion 8) on the Hugging Face repo to address this. Please feel free to comment and follow the discussion there.
Meanwhile, I can view pages 21 and 23 of the Dataset Viewer without errors.