-
Excellent work on multi-modal action recognition, especially on the MMAct dataset! However, I was unable to download the MMAct dataset to run the experiments, so could you share the MMAct dat…
-
Hi,
Is there a roadmap for monocular SLAM (with a normal camera, not a Kinect)?
Or is it relatively easy to use ScaViSLAM for a monocular SLAM scenario, with minimal code changes?
If so, what parts …
-
Hi, thanks for your work on the AV FGC task. I'd like to ask about some experiment details in your paper:
1. In 4.1-Audio-Modality in your paper, you use logit average as the evaluation strategy, but in …
-
Thank you for your contribution. The c-index value in my experiment does not reach the result reported in your paper. I used ResNet-50 at x20 with 256×256 patches in CLAM for feature extraction. Multimodal…
-
Thank you for this amazing piece of work.
I'm interested in using VILLA or UNITER to do image retrieval.
I'd like to pre-extract features from VILLA for a folder of images and then retrieve them…
-
May I ask whether you have removed the image-text relation classification task from the original RpBERT code?
-
Hi Sijie @TmacMai,
I just started working on multimodal learning using CMU-MOSEI, and I was wondering how the F1 scores on the CMU-MOSEI dataset were calculated. Was it done by calculating F1 …
-
Post a link for a "possibility" reading of your own on the topic of Digital Doubles & You in the Loop [for week 9], accompanied by a 300-400 word reflection that: 1) briefly summarizes the article (e.…
-
**Bug Report Checklist**
- [x] I provided code that demonstrates a minimal reproducible example.
- [?] I confirmed bug exists on the latest mainline of AutoGluon via source install.
- [x] I c…
-
[[Open issues - help wanted!]](https://github.com/vllm-project/vllm/issues/4194#issuecomment-2102487467)
**Update [9/8] - We have finished the majority of the refactoring and made extensive progress fo…