The overall pipeline implementations of caption generation for VAST-27M

TXH-mercury / VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

https://arxiv.org/abs/2305.18500

MIT License


The overall pipeline implementations of caption generation for VAST-27M #19

Open XuecWu opened 7 months ago

XuecWu commented 7 months ago

Thank you for your great contributions!

As the title says, I noticed that this repo provides only the trained video and audio captioners. Could the authors also release the implementation of the LLM part and the overall scripts for caption generation?

Any reply will be sincerely appreciated. Best regards,