descriptinc / audiotools

Object-oriented handling of audio data, with GPU-powered augmentations, and more.
https://descriptinc.github.io/audiotools/
MIT License

ConcatDataset Behavior and Equal Dataset Length Requirement: #95

Open lianabagh opened 1 year ago

lianabagh commented 1 year ago

The current implementation of the ConcatDataset class requires all of its datasets to have the same length for __getitem__ to work correctly. Item retrieval interleaves the datasets: idx % len(self.datasets) selects the dataset and idx // len(self.datasets) selects the item within it, while __len__ reports the sum of the individual lengths. If the datasets have different lengths, the computed item index can exceed the length of a shorter dataset, which can lead to errors. This equal-length requirement limits the flexibility of ConcatDataset when combining datasets of different sizes.

    class ConcatDataset(AudioDataset):
        def __init__(self, datasets: list):
            self.datasets = datasets

        def __len__(self):
            return sum([len(d) for d in self.datasets])

        def __getitem__(self, idx):
            dataset = self.datasets[idx % len(self.datasets)]
            return dataset[idx // len(self.datasets)]
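To make the failure mode concrete, here is a minimal sketch that copies the same index arithmetic into a stand-in class and feeds it plain Python lists. The names InterleavedConcat and first_failing_index are hypothetical and exist only for this illustration. Plain lists raise IndexError on an out-of-range index, which a real AudioDataset may or may not do, so treat this as a demonstration of the arithmetic rather than of the library's exact behavior.

```python
class InterleavedConcat:
    """Stand-in that mirrors the index arithmetic quoted above (hypothetical name)."""

    def __init__(self, datasets: list):
        self.datasets = datasets

    def __len__(self):
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        # Round-robin: pick the dataset by remainder, the item by quotient.
        dataset = self.datasets[idx % len(self.datasets)]
        return dataset[idx // len(self.datasets)]


def first_failing_index(concat):
    """Return the first index in range(len(concat)) that raises IndexError, or None."""
    for idx in range(len(concat)):
        try:
            concat[idx]
        except IndexError:
            return idx
    return None


# Equal lengths: every index in range(len(...)) resolves to an item.
equal = InterleavedConcat([["a0", "a1", "a2"], ["b0", "b1", "b2"]])

# Unequal lengths (5 and 1): the quotient outgrows the shorter dataset.
unequal = InterleavedConcat([["a0", "a1", "a2", "a3", "a4"], ["b0"]])

print(first_failing_index(equal))
print(first_failing_index(unequal))
```

With the unequal pair, the reported length is 6, but the lookup asks the one-item dataset for its second item before half of those indices have been visited, and the last items of the longer dataset are never reached by any index below the reported length.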

Default Length in AudioDataset: Additionally, the AudioDataset class has a variable named n_examples that sets the default length of the dataset to 1000. This value may not match the actual number of items in a given dataset instance, so the length that ConcatDataset sums over can differ from what each dataset really contains.
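One possible direction that tolerates unequal lengths is to map a global index through cumulative lengths, which is the approach torch.utils.data.ConcatDataset takes. The sketch below is not the audiotools implementation: OffsetConcat is a hypothetical name, it works on any objects that support len() and integer indexing, and it returns items dataset by dataset instead of interleaving them, which is a change in ordering. It also only helps if each dataset's reported length is accurate, which is why the n_examples default matters here.

```python
import bisect
from itertools import accumulate


class OffsetConcat:
    """Concatenate datasets of any lengths via cumulative offsets (hypothetical sketch)."""

    def __init__(self, datasets: list):
        self.datasets = datasets
        # Running totals of lengths, e.g. lengths [5, 1, 3] -> [5, 6, 9].
        self.cumulative = list(accumulate(len(d) for d in datasets))

    def __len__(self):
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(f"index {idx} out of range for length {len(self)}")
        # Position of the first running total strictly greater than idx.
        which = bisect.bisect_right(self.cumulative, idx)
        # Subtract the total of all earlier datasets to get the local index.
        start = self.cumulative[which - 1] if which > 0 else 0
        return self.datasets[which][idx - start]


mixed = OffsetConcat([["a0", "a1", "a2", "a3", "a4"], ["b0"], ["c0", "c1", "c2"]])
print([mixed[i] for i in range(len(mixed))])
```

If the round-robin ordering of the current implementation is important, for example to balance sources within a batch, it could be recovered by shuffling indices in the sampler rather than in __getitem__.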