-
Hi,
Is there a roadmap for monocular SLAM (with a normal camera, not Kinect)?
Or is it relatively easy to use ScaViSLAM for a monocular SLAM scenario, with minimal code changes?
If so, what parts …
-
Hi, thanks for your work on the AV FGC task. I'd like to inquire about some experiment details in your paper:
1. In 4.1-Audio-Modality of your paper, you use logit averaging as the evaluation strategy, but in …
-
### Describe the issue
When attempting to run inference with my fine-tuned LLaVA model using LoRA, I encountered an error. Here's the code snippet I used:
```
from llava.model.builder import load_p…
-
Thank you for this amazing piece of work.
I'm interested in using VILLA or UNITER to do image retrieval.
I'd like to pre-extract features from VILLA for a folder of images and then retrieve them…
-
Post a link for a "possibility" reading of your own on the topic of Digital Doubles & You in the Loop [for week 9], accompanied by a 300-400 word reflection that: 1) briefly summarizes the article (e.…
-
Excellent work on multi-modal action recognition, especially on the MMAct dataset! However, I failed to download the MMAct dataset to conduct the experiment, so can you share the MMAct dat…
-
Hi Sijie @TmacMai,
I just started working on multimodal learning using CMU-MOSEI, and I was wondering how the F1 scores on the CMU-MOSEI dataset were calculated. Was it done by calculating F1 …
-
Hello, many thanks for this great work jointly done by so many researchers!
I am trying out the RT-X dataset by running the code snippet from the colab tutorial. I successfully installed `tfds-nigh…
-
I'm trying to train the MBT model on my own dataset. I get the following error. Any help is appreciated.
Traceback (most recent call last):
File "/home/eftekhar/anaconda3/lib/python3.9/runpy.py"…
-
## ❔Question
Thanks for the repo!
I want to do multimodal fusion on yolov5 to try to get higher prediction accuracy. I have RGB images and depth images which are aligned. Here is my idea:
I …