-
Hi!
This is only a draft summary of all papers and implementations of Mamba.
I will post my feedback here, from an Orin AGX 64GB.
Original paper:
(arXiv 2024.01) Vision Mamba: Efficient Visual…
-
I am using the InternVL2-40B-AWQ model and running video inference following the multi-image inference paradigm. Each video is sampled into 24 frames, and the prompt is as follows. My questions…
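For reference, here is a minimal sketch of one common way to pick 24 frames from a video: split it into equal-length segments and take the middle frame of each. The function name `uniform_frame_indices` and the segment-centre strategy are assumptions for illustration; the post does not say which sampling scheme was actually used.

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 24) -> list[int]:
    """Return one frame index per equal-length segment of the video.

    Hypothetical helper: takes the centre frame of each segment.
    If the video has fewer frames than num_samples, indices repeat.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    segment = total_frames / num_samples  # segment length in frames (may be fractional)
    return [
        min(total_frames - 1, int(segment * (i + 0.5)))  # clamp to the last valid index
        for i in range(num_samples)
    ]


# Example: a 240-frame clip sampled into 24 frames
indices = uniform_frame_indices(240, 24)
print(indices)
```

The resulting indices can then be used to read the corresponding frames with whatever video decoder is in use, and the frames passed to the model as a list of images.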
-
### Description
I want to use an RTSP URL.
8.0.112:554/stream1'
Model: models/bodypix_mobilenet_v1_075_1024_768_16_quant_decoder_edgetpu.tflite
Heatmap size: (65, 49)
Stride: 16 (65, 49)
Infere…
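The heatmap size in the log above is consistent with dividing the input resolution by the stride and adding one. A minimal sketch of that arithmetic follows, assuming the `1024_768` in the model filename is the input width and height; the helper name `heatmap_size` is hypothetical and this is inferred from the logged numbers, not taken from the model's code.

```python
def heatmap_size(width: int, height: int, stride: int) -> tuple[int, int]:
    """Heatmap grid size for a given input resolution and output stride.

    Assumed relation (inferred from the log): size = input // stride + 1.
    """
    return (width // stride + 1, height // stride + 1)


# 1024 // 16 + 1 = 65 and 768 // 16 + 1 = 49, matching the logged (65, 49)
print(heatmap_size(1024, 768, 16))
```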
-
# Papers
- Sapiens: Foundation for Human Vision Models
  - A human foundation model from Meta (impressive!)
  - 2D pose estimation, body-part segmentation, depth prediction, and normal prediction in a single model …
-
Hi there!
Thanks for the effort to maintain this amazing repository.
This is a request to add our recent work on evaluation of Video Models. We propose an evaluation benchmark, _VELOCITI_.
Plea…
-
Can your model be fed with multiple images at once, such as different frames of a video? Or can it be modified so that the input to the language model is the tokens of multiple images at once?
-
I have 2 lists with exactly the same fields and code, just different names. I have set them to only appear on item view of the list [videoObject] when they are not null. What happens is that one of th…
-
![WhatsApp Image 2024-05-21 at 19 26 33_4d19396b](https://github.com/samkamau81/AI4Education-Moringa/assets/63351043/2f197862-7289-4880-8e65-58895ebf3a80)
Using Django for our backend, come up …
-
link text: [VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs](https://arxiv.org/pdf/2306.02858)
Actual title of the paper: Video-LLaMA An Instruction-tuned Audi…
-
### OpenVINO Version
2024.0.0
### Operating System
Ubuntu 20.04 (LTS)
### Device used for inference
None
### OpenVINO installation
PyPi
### Programming Language
Python
### Hardware Architect…