According to the comment in the config file, the size of each dataset (# [50997(alpaca), 155562(llava), 53456(quora), 101466(sharegpt)] 361481) differs from that of the original dataset.
Hi, we did not filter the dataset. We held out some data for validation (~1k for each dataset), so each dataset is smaller than the original one.
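For illustration, a hold-out of this kind could look roughly like the sketch below. This is not the repository's actual script: the function name `hold_out`, the fixed seed, and the exact count of 1000 are assumptions made for the example. Only the four sizes are taken from the config comment.

```python
import random

def hold_out(samples, n_val=1000, seed=0):
    """Split samples into (train, val), holding out n_val items for validation.

    Hypothetical helper: the name, seed, and default n_val are assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    val_idx = set(indices[:n_val])
    # Keep the original order within each split.
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val

# Training-set sizes quoted from the config comment.
train_sizes = {"alpaca": 50997, "llava": 155562, "quora": 53456, "sharegpt": 101466}
total = sum(train_sizes.values())  # equals the 361481 in the config comment
```

With a split like this, each training set ends up about 1k samples smaller than its source, which would account for the difference without any filtering step.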
Is there any code or script to filter the data?