-
https://aclanthology.org/2022.wmt-1.71/
- [x] sync, pull and merge master first!
- [x] Search for the correct citation on Semantic Scholar
- [x] Make a new branch ("You should always branch out f…
-
```python
# Route the batch through the language-goal encoder when the
# "lang" modality is part of this model's modality scope
if "lang" in self.modality_scope:
    latent_goal = self.language_goal(dataset_batch["lang"])
```
I found that part of the batch data is keyed "vim" and part is keyed "lang". Why is it set up this way?
…
-
Hi!
This is only a draft summarizing the papers and implementations of Mamba.
I will post my feedback here, running on an Orin AGX (64 GB).
Original paper:
(arXiv 2024.01) Vision Mamba: Efficient Visual…
-
# Learning Transferable Visual Models From Natural Language Supervision
In February 2021, OpenAI carried its experience from GPT over to images.
The contrastive objective comes from the pairwise language-image (L, I) pairs formed within a mini-batch.
Finally, pre-training on a large amount of data with a symmetric procedure (two softmax groups, as in the paper's pseudocode) yields the pre-trained model.
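The symmetric objective above (a softmax over each axis of the image-text similarity matrix) can be sketched in NumPy. This is a minimal sketch mirroring the paper's pseudocode, not the actual CLIP implementation; the function name and temperature value are assumptions:

```python
import numpy as np

def clip_symmetric_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize both sets of embeddings
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise cosine-similarity logits, scaled by temperature
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]
    labels = np.arange(n)  # matching (L, I) pairs lie on the diagonal

    def xent(l):
        # Cross-entropy of each row's softmax against the diagonal label
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions (the two softmaxes)
    return (xent(logits) + xent(logits.T)) / 2
```

With perfectly matched embeddings the loss approaches zero; shuffling the texts against the images drives it up.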
…
-
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
We introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary…
-
### Paper
[Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) (a.k.a. CLIP)
### Speaker
@joosun7l
-
I think that stage-1 training, i.e., vision-language representation learning with the three objectives mentioned in the article, is not yet implemented. Am I right?
Not implemented `load_pre…
-
# Summary
Existing VLP models were trained from scratch, but this incurs a very high pre-training cost and makes it hard to leverage already well-trained models (especially LLMs). Therefore, the method connects a frozen vision encoder and a frozen LLM through a Q-Former (Querying Transformer), an approach…
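The connecting idea can be sketched as learned queries cross-attending to frozen image features, compressing them into a fixed-length sequence for the frozen LLM. This is a single-head NumPy sketch with made-up weight shapes, not the real Q-Former (which is a full BERT-style transformer trained with multiple objectives):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def qformer_step(queries, image_feats, Wq, Wk, Wv):
    # queries: (num_queries, d) learned, trainable vectors
    # image_feats: (num_patches, d) outputs of the FROZEN vision encoder
    Q = queries @ Wq
    K = image_feats @ Wk
    V = image_feats @ Wv
    # Each query attends over all image patches...
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    # ...yielding a fixed number of summary vectors fed to the frozen LLM
    return attn @ V  # (num_queries, d)
```

Only the queries and projection weights are trained; both the vision encoder and the LLM stay frozen, which is what keeps the pre-training cost low.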
-
- [ ] Set up Feluda
- [ ] Convert video files to audio
- [ ] Try out Feluda AudioVec (or something you like) and use t-SNE (or other approaches) to evaluate clustering visually
- can do this in a…
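For the visual-clustering step above, here is a minimal sketch. Since t-SNE (e.g. scikit-learn's `TSNE`) needs an extra dependency, this uses a plain PCA projection as one of the "other approaches", in NumPy; the function name and the shape of the embedding input are assumptions:

```python
import numpy as np

def project_2d(embeddings):
    # embeddings: (n_clips, dim) audio-embedding vectors, e.g. from AudioVec.
    # Center, then project onto the top-2 principal components via SVD;
    # the resulting coordinates can be scatter-plotted to eyeball clusters.
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # (n_clips, 2)
```

If clusters are visible in this linear projection they will usually be visible (often more clearly) in a t-SNE plot as well.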
-
[sound-spaces](https://github.com/facebookresearch/sound-spaces)
[Project: RLR-Audio-Propagation](https://github.com/facebookresearch/rlr-audio-propagation)
[Audio Sensor](https://github.com/f…