Closed pphuangyi closed 3 years ago
Hi Ray,
I made the following changes:
- `tpc_dataset.py`: a dataset API that takes the split filename as its only input;
- `data_splitter.py`: given the path to the data files, generates the split files;
- `data_loader.py`: a dataset-independent data loader function.
The `lengths` argument can be given either as a single integer or as a sequence of integers, and the data loader(s) we get will have numbers of examples bounded by `lengths`. For example, with `lengths = [length_train, length_valid]`, the function will return two data loaders, one with `length_train` training examples and one with `length_valid` validation examples. One thing that could be improved in this function is that we may want to use all the examples and split the dataset by ratio instead of by lengths.

A `test` function that works with the TPC dataset API is given; please uncomment it and test.

Thank you so much!
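For the ratio-based split mentioned above, one possible approach is to convert the ratios into integer lengths first and keep the `lengths` interface unchanged. The following is only a minimal sketch under that assumption; `lengths_from_ratios` is a hypothetical helper and is not part of the files listed above.

```python
def lengths_from_ratios(num_examples, ratios):
    """Convert split ratios into integer lengths that sum to num_examples.

    Hypothetical helper for illustration; not part of the code in this thread.
    """
    if num_examples < 0:
        raise ValueError("num_examples must be non-negative")
    if not ratios or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be non-negative with a positive sum")

    total = sum(ratios)
    # Exact (possibly fractional) share of each split.
    exact = [num_examples * r / total for r in ratios]
    # Round every share down, then hand out the examples that are left over
    # to the splits with the largest fractional remainders.
    lengths = [int(x) for x in exact]
    leftover = num_examples - sum(lengths)
    order = sorted(
        range(len(ratios)),
        key=lambda i: exact[i] - lengths[i],
        reverse=True,
    )
    for i in order[:leftover]:
        lengths[i] += 1
    return lengths


# Example: an 80/20 train/validation split of 10 examples.
train_valid_lengths = lengths_from_ratios(10, [4, 1])
```

The resulting list uses every example and could be passed wherever `lengths = [length_train, length_valid]` is accepted today.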
Hi Ray,

Here are the updates:
- `dataset_utils.py`: modified the filenames;
- `tpc_dataloader.py`: handled `seed=None`, added assertions for the existence of manifest files, and corrected a few other bugs;
- `test_tpc_dataloader.py`: created a test module to test `get_tpc_dataloaders` in `tpc_dataloader.py`, and added a readme file.

Please let me know how I can improve it.
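To illustrate the kind of `seed=None` handling and manifest-existence assertion described in the list above, here is a rough sketch. It is not the code in `tpc_dataloader.py`: `read_manifest` is a hypothetical name, the one-filename-per-line manifest format is an assumption, and treating `seed=None` as "shuffle without a fixed seed" is also an assumption.

```python
import random
from pathlib import Path


def read_manifest(manifest_path, seed=None):
    """Return the file names listed in a manifest file, in shuffled order.

    Hypothetical helper: assumes one file name per line in the manifest.
    """
    manifest_path = Path(manifest_path)
    # Fail early with a readable message if the manifest is missing.
    assert manifest_path.is_file(), f"manifest file not found: {manifest_path}"

    with open(manifest_path) as handle:
        filenames = [line.strip() for line in handle if line.strip()]

    # random.Random(None) seeds itself from system entropy, so seed=None
    # gives a different order on each call, while an integer seed gives a
    # reproducible order.
    rng = random.Random(seed)
    rng.shuffle(filenames)
    return filenames
```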
I will be working on the other issues in the meantime.
Thank you!
Hi Ray,
I updated `utils/tpc_dataloader.py` and removed `dataset_utils.py`. I used `shuffle` from `numpy` and `torch.utils.data.Subset` to subsample a dataset, and used `torch.utils.data.random_split` to split a dataset, as you suggested earlier today.
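For reference, the subsample-then-split pattern described above can be sketched roughly as follows. This is not the actual content of `utils/tpc_dataloader.py`: the function names `subsample` and `split_train_valid`, the toy `TensorDataset`, and the use of `numpy.random.default_rng` (rather than `numpy.random.shuffle` directly) are all assumptions made for illustration.

```python
import numpy as np
import torch
from torch.utils.data import Subset, TensorDataset, random_split


def subsample(dataset, length, seed=None):
    """Return a random Subset of `dataset` with at most `length` examples.

    Hypothetical helper; generic, nothing here is specific to TPC data.
    """
    indices = np.arange(len(dataset))
    # seed=None draws fresh entropy; an integer seed makes the shuffle repeatable.
    rng = np.random.default_rng(seed)
    rng.shuffle(indices)
    return Subset(dataset, indices[:length].tolist())


def split_train_valid(dataset, length_train, length_valid, seed=0):
    """Subsample the dataset, then split it into train and validation parts.

    random_split requires the lengths to sum to the size of its input, so
    the dataset must hold at least length_train + length_valid examples.
    """
    subset = subsample(dataset, length_train + length_valid, seed=seed)
    generator = torch.Generator().manual_seed(seed)
    return random_split(subset, [length_train, length_valid], generator=generator)


# Toy stand-in for a real dataset: 100 examples holding the values 0..99.
dataset = TensorDataset(torch.arange(100))
train_set, valid_set = split_train_valid(dataset, 60, 20)
```

Each returned split is itself a `torch.utils.data.Subset`, so it can be wrapped in a `DataLoader` like any other dataset.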
Here is one thing I specifically need your input on: the first function in `utils/tpc_dataloader.py`, which does dataset subsampling, is generic (it has nothing specific to the TPC dataset). Please let me know whether I should move it into another file or whether it is okay to keep it there.
Please let me know what you think!
Hi everyone,
I updated the new 3D dataset API and also added a `.gitignore` file.

Please comment to help me improve!
Best, Yi