-
### Enhancement Type
A completely new feature
### Describe the enhancement
The number of configuration options and environment variables is now quite large and unwieldy.
For more intricate s…
-
### 🚀 Feedback Request
This issue is dedicated to collecting community feedback on the Multi-weight support API. Please review the [dedicated article](https://pytorch.org/blog/introducing-torchvis…
-
## Keyword: sgd
### Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
- **Authors:** Haoyi Xiong, Xuhong Li, Boyang Yu, Zhanxing Zhu, Dongrui Wu, Dejin…
-
“FasterViT: Fast Vision Transformers with Hierarchical Attention”
https://github.com/NVlabs/FasterViT
![image](https://github.com/huggingface/pytorch-image-models/assets/19152032/82aab753-0a7c-…
-
First, thank you for your great work :D
My question is the one in the title.
> Using the centroids, videos are tokenized and text captions are punctuated. Using the timestamps for each caption, video…
-
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f3f32cae6fc in /opt/conda/envs/vrm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xd3e95 (0x7f40271b5e95 in /o…
-
Congratulations on shipping FNA backward! Looking forward to using it.
On another note: would it be possible to support arbitrary masking?
MaskDiT outperformed regular DiT, with a 70% reduction …
-
I am researching fast and memory-efficient self-supervised pre-training compatible with different vision transformer architectures.
In the third section of your paper (3.1. Preparing MViTv2), you st…