-
Hi, I was wondering why I see the following log when using `stable_diffusion_2`. I didn't think the training code was supposed to load `openai--clip-vit-large-patch14` — is it?
```
mosaic/0 [0]:[INF…
-
Hey,
What are the pre-trained weights to use for stage 1 multi-modal training, if not the original Llama 2 weights?
My current understanding is the following:
> **inference** uses the checkpoints r…
-
# 🚀 Feature request
Currently `GenerationMixin.generate()` only accepts `input_ids` but not `inputs_embeds`, so this method cannot be used when custom input embeddings are required. In contra…
ymfa updated
10 months ago
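To illustrate the feature request above: `generate()` is tied to token ids because its decoding loop appends each sampled id and re-embeds it. The same loop can be driven by embeddings directly. The sketch below is a toy stand-in, not the `transformers` implementation — `toy_model`, `generate_from_embeds`, and the embedding table are all hypothetical names for illustration.

```python
# Toy sketch of an embeddings-driven greedy decoding loop.
# All names and shapes here are illustrative assumptions.
import random

random.seed(0)
VOCAB, DIM = 10, 4
# Toy embedding table: one DIM-dimensional vector per vocabulary id.
emb = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

def toy_model(inputs_embeds):
    """Stand-in for a decoder: mean-pool the sequence, then score
    every vocabulary entry by dot product (weight-tied output head)."""
    h = [sum(col) / len(inputs_embeds) for col in zip(*inputs_embeds)]
    return [sum(a * b for a, b in zip(row, h)) for row in emb]

def generate_from_embeds(inputs_embeds, max_new_tokens=3):
    """Greedy loop that starts from embeddings instead of input_ids:
    pick the argmax id, re-embed it, append, repeat."""
    ids, x = [], list(inputs_embeds)
    for _ in range(max_new_tokens):
        logits = toy_model(x)
        next_id = max(range(VOCAB), key=lambda i: logits[i])
        ids.append(next_id)
        x.append(emb[next_id])  # re-embed the sampled token
    return ids

prompt_embeds = [emb[1], emb[2]]  # "custom input embeddings"
print(generate_from_embeds(prompt_embeds))
```

The only id-dependent step is the re-embedding of sampled tokens, which the model's own embedding layer can do — so accepting `inputs_embeds` for the prompt is compatible with the rest of the loop.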
-
Hello, how is the input modality determined during inference? Is a classification network used before the unimodal expert transformer?
-
### Week 1 - Get to know the community
- [x] Join the communication channels
- [x] Open a GitHub issue (this one!)
- [x] Install the Ersilia Model Hub and test the simplest model
- [x] Write a motiva…
-
Hi @YIKUAN8, @HanyinWang, @yuanluo,
Is there any rough comparison of speed of extracting features, for [VisualBERT](https://github.com/uclanlp/visualbert/), [LXMERT](https://github.com/airsplay/lxmer…
-
### When did you clone our code?
I cloned the code base after 5/1/23
### Describe the issue
I manually downloaded the pre-trained models to my path, shown here, by clicking the download button for each.
![…
-
https://mp.weixin.qq.com/s/YnO9IeNfvqcJq4gcNWaxeA
ixxmu updated
6 months ago
-
After running the inference command on the DressCode dataset, the result using test_pairs_unpaired.txt shows some unexpected distortion on the body (picture attached), especially missing arms. Maybe I'm…
-
Dear author, first of all, thanks for your great work. After reading your paper, I would really like to know how to calculate the parameters and the runtime of adding Focals Conv to VoxelRCNN, as you mentioned in …
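On the parameter-count part of the question above, a common back-of-envelope check is to count a conv layer's weights directly. This is a minimal sketch with illustrative channel sizes, not the paper's actual VoxelRCNN + Focals Conv configuration; the `conv3d_params` helper is hypothetical.

```python
# Back-of-envelope parameter count for a single k x k x k 3D convolution,
# the kind of block a voxel backbone stacks. Channel sizes are made up.
def conv3d_params(in_ch, out_ch, k=3, bias=True):
    """Weights: out_ch * in_ch * k^3, plus out_ch bias terms if present."""
    return out_ch * (in_ch * k ** 3 + (1 if bias else 0))

# Example: a 16 -> 32 channel 3x3x3 conv
print(conv3d_params(16, 32))   # 32 * (16*27 + 1) = 13856
```

In practice, summing `p.numel()` over `model.parameters()` in PyTorch gives the total for a whole network, and runtime is usually measured by timing the forward pass (e.g. with `time.perf_counter()`) after a few warm-up iterations.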