ZhangTP1996 / TapTap


How is the custom dataset loaded for fine-tuning #3

Closed: dionman closed this issue 1 year ago

dionman commented 1 year ago

Thanks for providing the script for reading the data into a dictionary. Could you please extend it to show how this dictionary is actually loaded as a torch dataset that can be used for fine-tuning the base LLM? Have you defined a custom dataset class for this? Do you define a single dataloader that iterates over the full concatenation of datasets, or separate dataloaders per dataset? How is data serialisation implemented?
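For reference, here is one possible way to do what the question describes. It is a sketch under stated assumptions and not TapTap's actual implementation. It assumes (hypothetically) that the reading script returns a dict mapping each dataset name to a list of row dicts. It also uses an illustrative "column is value" text serialization with optional column shuffling. The names `serialize_row`, `SerializedTableDataset` and `ConcatTables` are made up for this example. Each class implements `__len__`/`__getitem__`, which is the map-style protocol that `torch.utils.data.DataLoader` indexes into. `ConcatTables` mirrors what `torch.utils.data.ConcatDataset` does, so torch is not needed to run the sketch.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Optional


def serialize_row(row: Dict[str, object], rng: Optional[random.Random] = None) -> str:
    """Turn one table row into text such as 'age is 39, job is clerk'.

    The 'column is value' template is an illustrative assumption.
    If rng is given, the column order is shuffled before joining.
    """
    items = list(row.items())
    if rng is not None:
        rng.shuffle(items)
    return ", ".join(f"{col} is {val}" for col, val in items)


class SerializedTableDataset:
    """Hypothetical map-style dataset over ONE table (a list of row dicts)."""

    def __init__(self, rows: List[Dict[str, object]], seed: Optional[int] = None):
        self.rows = rows
        # Only shuffle column order when a seed is supplied.
        self.rng = random.Random(seed) if seed is not None else None

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> str:
        return serialize_row(self.rows[idx], self.rng)


class ConcatTables:
    """One index space over several tables (same idea as torch's ConcatDataset)."""

    def __init__(self, datasets: List[SerializedTableDataset]):
        self.datasets = datasets
        self.offsets: List[int] = []  # cumulative sizes, e.g. [2, 3]
        total = 0
        for ds in datasets:
            total += len(ds)
            self.offsets.append(total)

    def __len__(self) -> int:
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx: int) -> str:
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find which table the global index falls into, then the local index.
        which = bisect_right(self.offsets, idx)
        start = self.offsets[which - 1] if which > 0 else 0
        return self.datasets[which][idx - start]


# Usage with a made-up dictionary standing in for the reading script's output.
tables = {
    "adult": [{"age": 39, "job": "clerk"}, {"age": 50, "job": "manager"}],
    "cars": [{"make": "vw", "price": 9000}],
}
full = ConcatTables([SerializedTableDataset(rows) for rows in tables.values()])
print(len(full))  # 3
print(full[2])    # make is vw, price is 9000
```

With torch installed, `full` could be passed to `torch.utils.data.DataLoader(full, batch_size=..., shuffle=True)`, and tokenization would typically go in a `collate_fn`. A single loader over the concatenation mixes rows from different tables within a batch. Separate loaders (one per `SerializedTableDataset`) give explicit control over how often each table is sampled. Which of these TapTap uses is exactly what the question asks.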