-
Hello
My name is Suliman Sharif, and I am the author of a Python package called Global-Chem, a dictionary mapping common chemical names to their molecular definitions.
We have been keeping track of y…
-
**0. Summary**
- One of the mPLUG series, targeting text-rich images (documents, webpages, tables, charts, natural images)
- Adaptive Crop (UReader) + Multimodality-Adaptive Module (Owl2) + H-Reducer (Proposed)
- H-Redu…
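The core idea of the H-Reducer is to shorten the visual token sequence by merging horizontally adjacent features, which preserves the left-to-right layout of text in the image. As a rough illustration of the shape arithmetic only, here is a minimal NumPy sketch; the actual module uses a learned convolution rather than the mean pooling used here, and the function name and reduction ratio are illustrative assumptions.

```python
import numpy as np

def h_reduce(features: np.ndarray, ratio: int = 4) -> np.ndarray:
    """Merge groups of `ratio` horizontally adjacent visual tokens.

    `features` has shape (H, W, C). The real H-Reducer applies a learned
    1 x `ratio` convolution; mean pooling here is only a stand-in to show
    how the token grid shrinks along the width.
    """
    h, w, c = features.shape
    assert w % ratio == 0, "width must be divisible by the reduction ratio"
    return features.reshape(h, w // ratio, ratio, c).mean(axis=2)

# A toy 2 x 8 grid of 3-dim visual features -> 16 tokens before reduction.
grid = np.arange(2 * 8 * 3, dtype=float).reshape(2, 8, 3)
reduced = h_reduce(grid, ratio=4)
print(reduced.shape)  # (2, 2, 3): the sequence is 4x shorter
```

Reducing only along the width keeps each output token aligned with a horizontal span of the page, which matters for reading text-rich images.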
hjeun updated 3 months ago
-
While attempting to set up and run the demo notebook from the repository, I encountered multiple issues related to environment setup, package dependencies, and code configurations that significantly h…
-
Hi, I'm trying to run the training.
```
# single modality
python main_alignmif.py -L --workspace kitti360-1908/lidar --enable_lidar --config configs/kitti360_1908.txt
python main_alignmif.py …
```
-
Can you share the training log of t2m_trans?
I found it difficult to train t2m_trans.
-
Can someone please guide me on how to process both audio and .txt data through the Perceiver simultaneously for multimodal learning?
Some example code would be nice.
Thanks
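For context on what such multimodal processing looks like: the Perceiver's design is to flatten every modality into one long input array and let a small latent array cross-attend to it. Below is a minimal NumPy sketch of that data flow under stated assumptions; the real model uses learned projections, multiple heads, and modality-specific position encodings, all omitted here, and the array sizes are arbitrary.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """Single-head cross-attention: latents (L, D) attend to inputs (N, D).

    No learned weight matrices here; this only shows how a fixed-size
    latent array summarizes an input array of any length.
    """
    scores = latents @ inputs.T / np.sqrt(latents.shape[-1])
    return softmax(scores) @ inputs

rng = np.random.default_rng(0)
d = 16
audio = rng.normal(size=(50, d))  # e.g. 50 audio frames embedded to d dims
text = rng.normal(size=(12, d))   # e.g. 12 text tokens embedded to d dims

# Perceiver-style fusion: concatenate modalities into one flat input array.
inputs = np.concatenate([audio, text], axis=0)  # (62, d)
latents = rng.normal(size=(8, d))               # small latent bottleneck
out = cross_attend(latents, inputs)
print(out.shape)  # (8, 16): fixed-size output regardless of input length
```

Because the latent array has a fixed size, the cost of the attention scales linearly with the combined audio and text length, which is what makes this kind of fusion practical.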
-
Could you help add the paper to the list?
Paper (Oral): Boosting 3D Object Detection by Simulating Multimodality on Point Clouds
Paper Link: https://arxiv.org/abs/2206.14971
Thanks!
-
**Is your feature request related to a problem? Please describe.**
I'm frustrated when I can't use multimodal models like "gpt-4-vision-preview" in Cheshire-cat-ai to process and retrieve information…
-
Should we create a new type of instance to handle multimodality (e.g., images, buttons)?
-
From:
- https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-1991730224
1. cleaning up the clip/llava libs and improving the API
2. in the old implementation, there were many internal…