-
Hello, I would like to ask why you use ViT-B/16 as a text encoder. Why not use an NLP model as the text encoder? Thank you very much.
-
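## In a nutshell
A study that applies the Transformer to image+language tasks such as VQA. On the image side, self-attention is computed using the position vectors of object regions (learning the relationships between objects); on the language side, self-attention is computed as usual. Finally, cross-attention (language-to-image and image-to-language) is applied, followed by another self-attention step that produces the output. SOTA is achieved through pre-training.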
![…
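The data flow described above can be sketched roughly as follows. This is only an illustrative toy in plain Python, not the paper's implementation: it assumes a single attention head with identity projections, no learned weights, no residual connections, layer normalization, or feed-forward layers, and the names `attend` and `two_stream_forward` are hypothetical.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    # Scaled dot-product attention with identity projections (an assumption;
    # a real model uses learned query/key/value matrices).
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(dim)])
    return out


def two_stream_forward(obj_feats, obj_pos, word_feats):
    # Image stream: add region position vectors, then self-attention
    # over objects (relationships between objects).
    img = [[f + p for f, p in zip(fs, ps)] for fs, ps in zip(obj_feats, obj_pos)]
    img = attend(img, img)
    # Language stream: ordinary self-attention over words.
    txt = attend(word_feats, word_feats)
    # Cross-attention in both directions, computed from the same inputs.
    txt_cross = attend(txt, img)  # language-to-image
    img_cross = attend(img, txt)  # image-to-language
    # Final self-attention in each stream before the output.
    return attend(img_cross, img_cross), attend(txt_cross, txt_cross)


random.seed(0)
dim = 4
objs = [[random.random() for _ in range(dim)] for _ in range(3)]   # 3 object regions
pos = [[random.random() for _ in range(dim)] for _ in range(3)]    # their position vectors
words = [[random.random() for _ in range(dim)] for _ in range(5)]  # 5 word embeddings
img_out, txt_out = two_stream_forward(objs, pos, words)
print(len(img_out), len(txt_out))  # 3 5
```

Each stream keeps its own sequence length (3 objects, 5 words); only the content of each vector is mixed across modalities by the cross-attention step.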
-
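Hello, I was greatly inspired by your paper "Mapping Multi-modal Brain Connectome for Brain Disorder Diagnosis via Cross-modal Mutual Learning" and would like to use the code to study it in more depth. Could you tell me the specific Series Description and Subject ID values from the ADNI dataset, to make this easier? Thank you!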
-
Diffusion Deepfake
https://arxiv.org/abs/2404.01579
-
Hello, this is excellent work! I'm doing a study on cross-modal image registration. I would like to ask if your method can be applied to 2D cross-modal image registration, such as infrared an…
-
Hi, Menglan Hu! Could I get the code for the paper "Adversarial-Metric Learning for Audio-Visual Cross-Modal Matching"? I have run into some problems with 'Metric-learning'.
Or can you help me solve these t…
-
Hi,
Thanks for your good work.
For UniXcoder, the example you showed for measuring the similarity between NL-PL pairs uses encoder-only mode.
Is it possible that encoder-decoder mode can …
-
Hello sir, first, thanks for your code. I would like to know whether this is the code for the paper "Deep adversarial metric learning for cross-modal retrieval". Thank you very much.
-
Hello,
Could you please let me know where exactly the "novelty detection" part of the paper "Zero-Shot Learning Through Cross-Modal Transfer" is in your code?
-
### Is this a new feature, an improvement, or a change to existing functionality?
New Feature
### How would you describe the priority of this feature request
Must have (e.g. DALI adoption is imposs…