-
# Summary
Training a VLM from scratch while keeping its NLP performance at the level of an LLM is extremely difficult. Research has therefore focused on investigating how to train a VLM starting from a frozen pretrained language model.
### Previous research directions
1. Shallow alignmen…
-
# Proposal: Improved Japanese text support in Closed Captions for media
## Summary
We've gotten a request to improve the way we handle Japanese text for closed captions. Today, we render these ho…
-
Hi,
A neat feature would be the ability to show the transcript of an audio file in the player as well (and to click on a line to jump to that part of the audio).
This w…
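A minimal sketch of the data side of this feature, assuming the transcript already carries a start timestamp (in seconds) for each line, for example from a caption file. The names `Cue`, `Transcript`, `seek_time_for_line` and `active_line` are hypothetical and not part of any existing player API. Clicking a line would seek to `seek_time_for_line(i)`, and on each playback time update the player could highlight `active_line(position)`.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cue:
    """One transcript line and the time (in seconds) at which it starts."""
    start: float
    text: str


class Transcript:
    """Hypothetical helper mapping transcript lines to playback positions."""

    def __init__(self, cues: list[Cue]) -> None:
        # Keep cues sorted by start time so binary search works.
        self.cues = sorted(cues, key=lambda c: c.start)
        self._starts = [c.start for c in self.cues]

    def seek_time_for_line(self, index: int) -> float:
        """Time the player should jump to when line `index` is clicked."""
        return self.cues[index].start

    def active_line(self, position: float) -> Optional[int]:
        """Index of the line being spoken at `position`, or None before the first cue."""
        # bisect_right finds the first cue starting after `position`;
        # the line just before it is the one currently playing.
        i = bisect_right(self._starts, position) - 1
        return i if i >= 0 else None


transcript = Transcript([Cue(0.0, "Hello"), Cue(4.5, "Welcome"), Cue(9.0, "Bye")])
```

Binary search keeps the highlight lookup cheap even for long recordings, since it runs on every time update rather than once per click.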
-
Hi, there is an inconsistency between the name of the figure and how it is referred to in the following explanation.
For example, in the case of Figure 5, it is actively used to describe the content but it is call…
-
Title
-
I cloned the repo last month (before the most recent fix for the evaluation bug was pushed), but I applied the (one-line?) fix locally. I then tried training a model from scratch and the fo…
-
Hi! This file is needed for pretraining on a large corpus, but I can't find it. Could you share it?
Thanks!
-
Note: Restreaming to YouTube with these captions enabled will corrupt VODs.
Note the visual artifacting that appears when, and only when, I speak and my speech is transcribed into captions: https://youtu.be/smXf89skebE?t=2991
-
CogView has these fine-tuning abilities:
* Image to text
* Image text score
* Superresolution
I think they are all pretty cool, and they seem simple enough as described in the paper.
I wonder if we could implemen…
-
How long does it take to train the model?