-
Hi there, thank you so much for putting this repository together; this implementation is very interesting!
I'm working on implementing this with a custom COCO-instances-formatted dataset rath…
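In case it helps, a minimal sketch of registering a custom COCO-format dataset, assuming the repository builds on Detectron2 (that framework choice, the dataset name, and the paths are all assumptions, not anything stated in the issue):

```python
from detectron2.data.datasets import register_coco_instances

# Hypothetical names/paths; Detectron2 usage is an assumption about the repo.
register_coco_instances(
    "my_custom_train",                   # dataset name to reference in configs
    {},                                  # extra metadata (can stay empty)
    "datasets/custom/annotations.json",  # COCO-instances annotation file
    "datasets/custom/images",            # root directory of the images
)
```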
-
The Inception model I reproduced couldn't match what you did. That model usually takes 299×299 inputs, but here it is 224×224. Does this have any effect? Looking forward to your reply.
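For a quick check, a minimal sketch with torchvision's Inception v3 (assuming that is the variant in question), comparing the 224×224 input used here with an image resized up to the canonical 299×299:

```python
import torch
from torchvision import models, transforms

# torchvision's Inception v3 is designed around 299x299 inputs; feeding
# 224x224 shrinks every feature map and can change accuracy unless the
# model was trained at that resolution.
model = models.inception_v3(weights=None, aux_logits=True)
model.eval()

x_224 = torch.randn(1, 3, 224, 224)           # input at the size used here
x_299 = transforms.Resize((299, 299))(x_224)  # upsampled to the canonical size

with torch.no_grad():
    out = model(x_299)
print(out.shape)  # torch.Size([1, 1000])
```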
-
Following what was done by @ChainYo in Transformers, in the [ONNXConfig: Add a configuration for all available models](https://github.com/huggingface/transformers/issues/16308) issue, the idea is to a…
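For reference, a minimal sketch of what such a configuration looks like with the `transformers.onnx` package; the model name here is hypothetical, and the inputs shown are the typical encoder-only case:

```python
from collections import OrderedDict
from typing import Mapping

from transformers.onnx import OnnxConfig

# Hypothetical config for an encoder-only model: the core of the task is
# declaring the model's input names and their dynamic axes for ONNX export.
class MyModelOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )
```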
-
Hi @YuanGongND,
Thank you for the great work. I am trying to use the AST model to extract embeddings for 1-second audio events.
To begin with, I started to play around with https://github.com/Yu…
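One way to get clip-level embeddings, sketched here with the Hugging Face port of AST rather than this repo's own code (the checkpoint name and the mean-pooling choice are assumptions):

```python
import torch
from transformers import ASTFeatureExtractor, ASTModel

# Checkpoint name is an assumption; any AudioSet-pretrained AST works similarly.
ckpt = "MIT/ast-finetuned-audioset-10-10-0.4593"
extractor = ASTFeatureExtractor.from_pretrained(ckpt)
model = ASTModel.from_pretrained(ckpt)
model.eval()

waveform = torch.randn(16000)  # 1 s of (dummy) audio at 16 kHz
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, num_frames, hidden_dim)

embedding = hidden.mean(dim=1)  # mean-pool into one clip-level embedding
print(embedding.shape)          # torch.Size([1, 768])
```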
-
Hi, I haven't been able to figure out if it's possible to implement object picking (using the mouse). I know that three.js does support mouse picking (e.g. https://threejs.org/docs/#api/en/core/Raycas…
-
Dashboard to track the performance of torchinductor on CPU.
cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @soumith @ngimel @chauhang
-
Hi, I cannot find the GFLOPs reported in the paper for the ViT-Adapter-L model (Mask2Former / BEiT v2 / crop size 896). Could you tell me the relevant numbers or the method to calculate them? Thanks a lot!
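One common way to measure this is fvcore's FLOP counter; a minimal sketch follows (the stand-in model is a placeholder, and in practice you would instantiate ViT-Adapter-L + Mask2Former from the repo's config and pass it in instead):

```python
import torch
from fvcore.nn import FlopCountAnalysis

# Placeholder model: substitute the real ViT-Adapter-L + Mask2Former here.
model = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).eval()

x = torch.randn(1, 3, 896, 896)  # crop size 896, as in the paper
flops = FlopCountAnalysis(model, x)
print(f"{flops.total() / 1e9:.2f} GFLOPs")
```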
-
- https://arxiv.org/abs/2106.08254
- 2021
This paper introduces BEiT (short for Bidirectional Encoder representation from Image Transformers), a self-supervised vision representation model.
Following BERT, which was developed in natural language processing, it proposes a masked image modeling task for pretraining vision Transformers…
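To make the pretraining objective concrete, a minimal sketch with the Hugging Face BEiT port: mask a subset of image patches and predict their visual tokens (the checkpoint name and the 40% mask ratio are illustrative choices, not from the paper summary above):

```python
import torch
from transformers import BeitForMaskedImageModeling

# BERT-style objective on images: predict the visual tokens of masked patches.
model = BeitForMaskedImageModeling.from_pretrained(
    "microsoft/beit-base-patch16-224-pt22k"  # illustrative checkpoint
)
model.eval()

pixel_values = torch.randn(1, 3, 224, 224)
num_patches = (224 // 16) ** 2                      # 14 x 14 = 196 patches
bool_masked_pos = torch.rand(1, num_patches) < 0.4  # mask ~40% of patches

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.logits.shape)  # (1, num_patches, visual-token vocabulary size)
```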
-
I have been running the Swin Transformer and VMamba models on the same A800 GPU, using the same batch size and the COCO 2017 detection dataset.
However, I've observed that VMamba performs at least 5 tim…
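For a fair comparison, it helps to time both models with the same harness; a minimal sketch (assuming each model takes a single batched tensor and runs on CUDA):

```python
import torch

def time_forward(model, x, warmup=10, iters=50):
    """Average forward-pass latency in milliseconds, with CUDA synchronization."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):   # warm-up: kernel compilation, autotuning, caches
            model(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()  # wait for all queued kernels before reading timers
    return start.elapsed_time(end) / iters
```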
-
Hello,
I hope this message finds you well.
I would like to express my admiration for your work. It is truly straightforward and effective. However, I have encountered an issue while attempting t…