TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)
MIT License

Multi-dataset support issue. #17

Closed: tayton42 closed this issue 2 months ago

tayton42 commented 2 months ago

Hello, if I want to use multiple dataset classes during the alignment stage or fine-tuning stage, how should I configure it?

siddk commented 2 months ago

I think the easiest thing to do would be to hack this function.

Right now, it instantiates just a single AlignDataset or FinetuneDataset, but since both inherit from the standard PyTorch Dataset, you can initialize multiple dataset instances in a for loop. Then you can stitch them together with, for example, ConcatDataset!
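For illustration, here's a rough sketch of that for-loop-plus-ConcatDataset idea. It is not the actual prismatic-vlms code: `ToyDataset` and `dataset_specs` are hypothetical stand-ins for AlignDataset / FinetuneDataset and their constructor arguments, and only `Dataset` and `ConcatDataset` from `torch.utils.data` are real PyTorch APIs.

```python
from torch.utils.data import ConcatDataset, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for AlignDataset / FinetuneDataset."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int) -> dict:
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        # A real dataset would return tokenized text, pixel values, etc.
        return {"source": self.name, "index": idx}


# Hypothetical per-dataset settings; in practice each entry would hold the
# arguments that the real dataset class expects.
dataset_specs = [("dataset_a", 3), ("dataset_b", 2)]

# Build one instance per spec, then stitch them into a single dataset.
datasets = [ToyDataset(name, size) for name, size in dataset_specs]
combined = ConcatDataset(datasets)

print(len(combined))  # 5
print(combined[3])    # {'source': 'dataset_b', 'index': 0}
```

The combined object behaves like any other Dataset, so it can be handed to a DataLoader as usual. This assumes every dataset returns items in the same format, so that a single collator can batch them. Note that plain concatenation samples each source in proportion to its size.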