-
We should take a look at the [Solid code of conduct](https://github.com/solid/process/blob/main/code-of-conduct.md) to see if we want to emphasize anything inclusion-related and to clarify what role D…
-
Hi author, as I understand it, deit_dge.py uses the Dynamic Grained Encoder. But in dge.py, sparse_train defaults to False, and this parameter is never changed when DynamicGrainedEncoder is initialized, so does training actually go through this loop?
https://github.com/StevenGrove/vtpack/blob/38d5af0c3d91811…
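To make the question concrete, here is a minimal sketch (hypothetical names, not the vtpack code) of how a `sparse_train`-style flag typically gates two code paths: when the flag stays False, every region runs through the dense/coarse loop, and the fine-grained branch is never taken.

```python
# Hypothetical sketch of a flag-gated encoder loop. When sparse_train is
# False (the default), every region takes the dense branch regardless of
# its gate score; only sparse_train=True enables the fine-grained branch.
def encode(regions, gate_scores, sparse_train=False, threshold=0.5):
    outputs = []
    for region, score in zip(regions, gate_scores):
        if sparse_train and score > threshold:
            # fine-grained branch: refine each element (toy stand-in)
            outputs.append([x * 2 for x in region])
        else:
            # dense branch: coarse pooling, taken when sparse_train=False
            outputs.append([sum(region) / len(region)] * len(region))
    return outputs

# With the default sparse_train=False, both regions take the dense branch.
print(encode([[1, 2], [3, 4]], [0.9, 0.1]))
```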
-
Below are the models we still need to implement.
# Will implement
These are models for which we are actively seeking implementations/weights.
## Model implementations
- [x] VGG
- [x] ResN…
-
import torch
import torch_pruning as tp

# model: the network to prune (defined elsewhere)
imp = tp.importance.MagnitudeImportance(p=1)
example_inputs = torch.randn(1, 3, 224, 224)
ignored_layers = []
for m in model.modules():
    if isinstance(m, torch.nn.Linear) and m.…
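For readers unfamiliar with what `MagnitudeImportance(p=1)` scores, here is a plain-Python sketch of the underlying idea: each output channel is ranked by the L1 norm of its weights, and the lowest-scoring channels become pruning candidates. The names below are illustrative, not the torch-pruning API.

```python
# Conceptual sketch of L1 ("p=1") magnitude importance: rank each output
# channel by the sum of the absolute values of its weights.
def l1_channel_importance(weight_rows):
    """weight_rows: list of per-channel weight lists."""
    return [sum(abs(w) for w in row) for row in weight_rows]

weights = [
    [0.5, -0.5],   # channel 0 -> L1 norm 1.0
    [0.1, 0.1],    # channel 1 -> L1 norm 0.2 (weakest, pruned first)
    [-2.0, 1.0],   # channel 2 -> L1 norm 3.0
]
scores = l1_channel_importance(weights)
print(scores)  # [1.0, 0.2, 3.0]
```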
-
### Feature request
Some of our models interpolate their positional embeddings, enabling pretrained checkpoints to be used at different input resolutions. For example, [here in ViT](https://github.co…
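To illustrate the idea being requested: the pretrained grid of patch positional embeddings is resized to the grid implied by the new input resolution. Real implementations use bicubic or bilinear interpolation on the embedding tensor; this toy sketch uses nearest-neighbor on scalar "embeddings" just to show the grid-resizing step, and all names are illustrative.

```python
# Toy sketch of positional-embedding interpolation: reshape the flat
# sequence into its 2D grid, resample to the new grid size, and flatten
# back. Nearest-neighbor stands in for the bicubic used in practice.
def interpolate_pos_embed(pos, old_grid, new_grid):
    """pos: flat list of old_grid*old_grid scalar embeddings."""
    grid = [pos[r * old_grid:(r + 1) * old_grid] for r in range(old_grid)]
    out = []
    for r in range(new_grid):
        src_r = min(old_grid - 1, r * old_grid // new_grid)
        for c in range(new_grid):
            src_c = min(old_grid - 1, c * old_grid // new_grid)
            out.append(grid[src_r][src_c])
    return out

# A 2x2 embedding grid from pretraining, reused for a 4x4 grid.
print(interpolate_pos_embed([1, 2, 3, 4], 2, 4))
```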
-
I cannot find it in the paper. Is it the same as training on ImageNet?
-
Hello, when testing ViT and CrossViT on CIFAR-100, the test accuracy is only about 50% no matter how I tune the settings. Do you have a better training method?
-
Hi, thank you for sharing your excellent work. In Table 6 of your paper, you showed how simply applying HAQ affects the accuracy of several DeiT models compared to the baseline. However, in the supplemental …
-
Hi, it is great that you have made this repo happen and gotten so much done. I notice that the default fp32 model has a different state dict from the Hugging Face and DeiT pretrained versions, but the repo doesn't c…
-
## In a nutshell
A method for training Transformers efficiently on images, using distillation from a CNN model. In addition to the usual image patches plus a token for predicting the teacher label, a distillation token is added that predicts the teacher CNN's output. The label and the teacher output are predicted and learned from separate tokens. Training completes in 2 to 3 days on a single 8-GPU node.
### Paper link
https://arxiv.org/abs…
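The two-token objective summarized above can be sketched as follows: the class token is trained against the ground-truth label while the distillation token is trained against the teacher CNN's predicted (argmax) label, and the two cross-entropies are averaged. This is a plain-Python toy of the hard-distillation variant; the logits and function names are illustrative.

```python
import math

def cross_entropy(logits, target):
    # Numerically stable cross-entropy for a single example.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def deit_hard_distill_loss(cls_logits, dist_logits, label, teacher_logits):
    # Class token learns the true label; distillation token learns the
    # teacher's hard (argmax) prediction; losses are averaged.
    teacher_label = max(range(len(teacher_logits)),
                        key=teacher_logits.__getitem__)
    return 0.5 * cross_entropy(cls_logits, label) \
         + 0.5 * cross_entropy(dist_logits, teacher_label)

loss = deit_hard_distill_loss([2.0, 0.1], [1.5, 0.2], 0, [0.3, 2.2])
print(loss)
```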