Dv04 closed this issue 1 year ago
@Dv04 can you assign me this issue :)
Hey @darshbaxi, glad to have you with us. Assigning you now.
Is the data to be passed as a parameter? What exactly is the job of the folders variable?
The data has to be passed in through the folders variable, as follows:
E.g.
folders = glob.glob("data/*")
which assigns the following list to folders:
['data/Image_Folder1', 'data/Image_Folder2', 'data/Image_Folder3', 'data/Image_Folder4', 'data/Image_Folder5', 'data/Image_Folder6']
This is only needed when the dataset has more than one folder. Otherwise, you can directly assign the path of the main image directory to the folders variable.
Hope that clears things up. Ask away if you have any more questions.
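For anyone following along, here is a minimal sketch of how the folders list could be built for both cases described above. The helper name list_image_folders and its root parameter are hypothetical and not part of the project; the sketch also assumes that only sub-directories (not loose files) should count as image folders.

```python
import glob
import os


def list_image_folders(root="data"):
    """Return the sub-folders of `root`, or [root] if it has none.

    Hypothetical helper for illustration; not part of the project.
    """
    # glob.glob returns matches in arbitrary order, so sort for reproducibility.
    # Assumption: only directories count as image folders, so files are skipped.
    subfolders = sorted(
        path for path in glob.glob(os.path.join(root, "*")) if os.path.isdir(path)
    )
    # Single-folder dataset: fall back to the main image directory itself.
    return subfolders if subfolders else [root]
```

With a layout like the one above, folders = list_image_folders("data") should give the same kind of list as glob.glob("data/*"), just sorted and limited to directories.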
@Dv04 I have created a Pull Request. Please have a look at it.
@darshbaxi, I will reply before tomorrow; I just have to finish up some bits of work.
The project requires a robust mechanism for splitting the dataset into training, validation, and testing sets. The following points should be addressed:
Balanced Distribution: Ensure a balanced distribution of data among these sets to prevent any bias during model training and evaluation.
Stratification: If applicable, stratify the split to maintain the distribution of certain variables.
Randomization: Include randomization to ensure different data points are used across multiple runs, enhancing the model's robustness.
Configurability: Allow for easy configurability of the split ratios and seed for reproducibility.
This mechanism should be encapsulated in a function, making the dataset preparation phase clean and reproducible.
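One possible shape for such a function is sketched below, using only the standard library. It is not the implementation from the Pull Request: the name split_dataset, its parameters, the default ratios, and the choice to stratify on a per-sample label are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_dataset(samples, labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Stratified train/validation/test split (hypothetical helper).

    Returns three lists of (sample, label) pairs.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be non-negative and sum to 1")

    # Local RNG: the same seed reproduces the same split,
    # while a different seed yields a different randomization.
    rng = random.Random(seed)

    # Group samples by label so every class is split with the same ratios.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    train, val, test = [], [], []
    for label in sorted(by_label, key=str):  # fixed class order
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        # Never take more validation items than remain after training.
        n_val = min(round(len(items) * ratios[1]), len(items) - n_train)
        train += [(s, label) for s in items[:n_train]]
        val += [(s, label) for s in items[n_train:n_train + n_val]]
        # Whatever is left over goes to the test set.
        test += [(s, label) for s in items[n_train + n_val:]]

    # Shuffle each split so classes are not grouped together.
    for part in (train, val, test):
        rng.shuffle(part)
    return train, val, test
```

Here the label could be, for example, the name of the folder an image came from. Because the per-class counts are rounded, very small classes may end up with empty validation or test portions, so the ratios are only approximate in that case.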