Open eddyxu opened 2 years ago
When loading a dataset for training, we want to split it into `train`, `test`, and `eval` splits. It should be easy for a user to load just one of these splits, for example:
```python
from rikai.pytorch.data import Dataset

# Proposed usage: select a single split by filtering on a `split` column.
train_dataset = Dataset("foo.bar", filters=["split = 'train'"])
eval_dataset = Dataset("foo.bar", filters=["split = 'eval'"])
```
We could look into pyarrow's `filters` support for Parquet datasets to see whether we can use it here.