Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
MIT License

WebDataset support #18

Status: Open · rom1504 opened this issue 3 years ago

rom1504 commented 3 years ago

I think it could be pretty useful to add a webdataset loader to this repo, so that datasets in the webdataset format can be used here. This is relevant because large webdataset datasets are starting to become available (one, from Crawling@Home, has 400M samples).

I think https://github.com/lucidrains/DALLE-pytorch/pull/280/files may be a good example of how to do it.

rom1504 commented 3 years ago

Oh, I see that this repo, https://github.com/mlfoundations/open_clip#yfcc-and-other-datasets, supports it; it might be another example.
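Webdataset-based training scripts usually take the training data as a brace pattern over shard files, such as `shard-{000000..000099}.tar`. The `webdataset` library expands these with the `braceexpand` package. The sketch below is a simplified, hypothetical stand-in (`expand_shards` is not a real library function) that handles numeric `{start..end}` ranges only and keeps the zero padding of `start`. It shows how a train-CLIP data module might turn one command-line string into a list of shard paths.

```python
import re
from typing import List

# Matches a numeric brace range such as {000..099}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_shards(pattern: str) -> List[str]:
    """Expand numeric brace ranges, e.g. 'shard-{000..002}.tar'.

    Hypothetical, simplified stand-in for `braceexpand`: only
    {start..end} integer ranges are supported; padding follows `start`.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    start_text, end_text = match.group(1), match.group(2)
    width = len(start_text)
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    expanded: List[str] = []
    for i in range(int(start_text), int(end_text) + 1):
        # Recurse so that any further ranges in the suffix are expanded too.
        expanded.extend(expand_shards(f"{prefix}{i:0{width}d}{suffix}"))
    return expanded


if __name__ == "__main__":
    print(expand_shards("data/shard-{000..002}.tar"))
```

Keeping the expansion as a separate step means the shard list can be shuffled or split across workers and GPUs before any tar file is opened.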

Zasder3 commented 3 years ago

I think this would be a helpful addition to the repo; however, my main short-term focus is a collaboration with the team behind that repo (open_clip).

If you or anyone else reading this is interested in seeing this addition to the repo, I'd be glad to accept a PR!