-
https://aclanthology.org/2022.wmt-1.71/
- [x] sync, pull and merge master first!
- [x] Search for the correct citation on Semantic Scholar
- [x] Make a new branch ("You should always branch out f…
-
```python
# Route the batch through the language-goal encoder when the
# "lang" modality is part of this model's modality scope
if "lang" in self.modality_scope:
    latent_goal = self.language_goal(dataset_batch["lang"])
```
I found that part of the batch data is keyed "vim" and part is keyed "lang". Why is it set up this way?
…
-
Hi!
This is only a draft summarizing the papers and implementations of Mamba.
I will post my feedback here, running on an Orin AGX (64 GB).
Original paper:
(arXiv 2024.01) Vision Mamba: Efficient Visual…
-
# Learning Transferable Visual Models From Natural Language Supervision
In February 2021, OpenAI carried its experience from GPT over to images.
The contrastive objective comes from the pairwise language-image (L, I) pairs formed within a mini-batch.
Finally, pre-training on a large amount of data with a symmetric procedure (two softmax groups, as in the paper's pseudocode) yields the pre-trained model.
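The symmetric objective above (a softmax over each axis of the image-text similarity matrix) can be sketched in NumPy. This is a minimal sketch mirroring the paper's pseudocode, not the actual CLIP implementation; the function name and temperature value are assumptions:

```python
import numpy as np

def clip_symmetric_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize both sets of embeddings
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise cosine-similarity logits, scaled by temperature
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]
    labels = np.arange(n)  # matching (L, I) pairs lie on the diagonal

    def xent(l):
        # Cross-entropy of each row's softmax against the diagonal label
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions (the two softmaxes)
    return (xent(logits) + xent(logits.T)) / 2
```

With perfectly matched embeddings the loss approaches zero; shuffling the texts against the images drives it up.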
…
-
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
We introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary…
-
### Paper
[Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) (a.k.a. CLIP)
### Speaker
@joosun7l
-
I think that stage-1 training, i.e., vision-language representation learning with the three objectives mentioned in the article, is not yet implemented. Am I right?
Not implemented `load_pre…
-
# Summary
Existing VLP models were trained from scratch, but this incurs a very high pre-training cost and makes it hard to leverage already well-trained models (especially LLMs). Therefore, the method connects a frozen vision encoder and a frozen LLM through a Q-Former (Querying Transformer), an approach…
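The connecting idea can be sketched as learned queries cross-attending to frozen image features, compressing them into a fixed-length sequence for the frozen LLM. This is a single-head NumPy sketch with made-up weight shapes, not the real Q-Former (which is a full BERT-style transformer trained with multiple objectives):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def qformer_step(queries, image_feats, Wq, Wk, Wv):
    # queries: (num_queries, d) learned, trainable vectors
    # image_feats: (num_patches, d) outputs of the FROZEN vision encoder
    Q = queries @ Wq
    K = image_feats @ Wk
    V = image_feats @ Wv
    # Each query attends over all image patches...
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    # ...yielding a fixed number of summary vectors fed to the frozen LLM
    return attn @ V  # (num_queries, d)
```

Only the queries and projection weights are trained; both the vision encoder and the LLM stay frozen, which is what keeps the pre-training cost low.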
-
- [ ] Set up Feluda
- [ ] Convert video files to audio
- [ ] Try out Feluda AudioVec (or something you like) and use t-SNE (or other approaches) to evaluate clustering visually
- can do this in a…
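For the visual-clustering step above, here is a minimal sketch. Since t-SNE (e.g. scikit-learn's `TSNE`) needs an extra dependency, this uses a plain PCA projection as one of the "other approaches", in NumPy; the function name and the shape of the embedding input are assumptions:

```python
import numpy as np

def project_2d(embeddings):
    # embeddings: (n_clips, dim) audio-embedding vectors, e.g. from AudioVec.
    # Center, then project onto the top-2 principal components via SVD;
    # the resulting coordinates can be scatter-plotted to eyeball clusters.
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # (n_clips, 2)
```

If clusters are visible in this linear projection they will usually be visible (often more clearly) in a t-SNE plot as well.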
-
[sound-spaces](https://github.com/facebookresearch/sound-spaces)
[Project: RLR-Audio-Propagation](https://github.com/facebookresearch/rlr-audio-propagation)
[Audio Sensor](https://github.com/f…