We used the same config as in this repo to train on the AVE task with a 3090, but the accuracy we got was 78.96.
python3 main_trans.py --Adapter_downsample=8 --audio_folder=$PATH/raw_audio --batch_size=2 …
Thanks for sharing this great work and dataset!
I have two questions about the paper.
First of all, I think the authors mainly follow the losses and architecture of ALBEF. However, CTP does not use the ITM loss…
I can't find any research paper corresponding to this work. How can I cite your work in my research paper? I need it in BibTeX form, for example like the one below:
I wonder whether modality A can interact with modality B during training.
I guess that each tokenizer processes its modality separately, and each modality is transformed by the frozen encoder (concat all…
Hi, I am interested in your excellent work!
When I run the test script
"python unidistill/exps/multisensor_fusion/nuscenes/BEVFusion/BEVFusion_nuscenes_centerhead_camera_exp.py -b 1 --gpus 1 -p …
The original data link in the paper "Recipe Recognition with Large Multimodal Food Dataset" has expired, and the original raw data is unavailable. Is it possible for you to release the original raw te…
After I run the code, only the model parameters are printed out, but no fused modes are printed out. Does anyone know where the fused modes go?
### Describe the issue linked to the documentation
Hey there. I just noticed an error when executing the example code in the Best Quality presets description here:
Dear authors,
I found your work on transcriptomics, histology, and multimodal fusion for classification tasks to be quite interesting. I would like to know more about the folder structure you used …
Forum activity: (see [SARS-CoV-2 Timeline by 2020.02.21](https://github.com/agorahub/_meta/blob/agoran/theagora/sari/Memorandum_2020-02-21_SARS-CoV-2-Timeline_Nathan.pdf?raw=true), by Nathan :cloud: )
- Colla…