knowledge-distillation Search Results

1000+ results
for knowledge-distillation

Best match

samsucik/knowledge-distil-bert #3

an inquiry about your knowledge-distil-bert talk

Hello, I'm Hady, an ECE student at Cairo University School of Engineering. I've been working on a distilled version of a text summarization model called Pegasus. I found your L3-AI talk on YouTube and …

hadywalied updated 3 years ago
4
wangqiangneu/MT-PaperReading #17

19-EMNLP-Multi-agent Learning for Neural Machine Translation

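## Introduction Similar to mutual learning; the difference is that mutual learning trains many-to-many, whereas here an ensemble model is first built from the many agents, and that ensemble is then used to teach the many. The teaching step uses distillation that adapts according to whether the teacher is good enough, which is also a very common technique. ## Paper information * Author: Baidu * [Paper](…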

wangqiangneu updated 5 years ago
4
xcmyz/FastSpeech #88

How to extract alignment from tacotron2?

Hi, I want to try FastSpeech on a different dataset. Could you share how to extract alignments from Tacotron2? I tried this code, but I get bad synthesis results when running inference on long sent…

CanKorkut updated 3 years ago
6
adap/flower #2497

Add Flower Baseline: FedHe

### Paper FedHe: Heterogeneous Models and Communication-Efficient Federated Learning ### Link https://arxiv.org/abs/2110.09910 ### Maybe give motivations about why the paper should be implemented …

yashmaurya01 updated 1 year ago
1
facebookresearch/fairseq #4502

validation loss is not decreasing on NAT with zh-en data

Hi, I want to train a NAT model for zh-en (about 260k). I get about 30 BLEU with the teacher model, but the student model always overfits. Here are the scripts I used: zh-en preprocessing: `fairse…

kkeleve updated 2 years ago
4
Wp-Zhang/Deep-Color-Transfer #44

About Epochs for Effective Color Transfer in Model Training

Thank you for your training code and dataset. I have been using your dataset and training code for training, and it took a few days to train the model up to 30 epochs. However, the train loss and val …

ChesserZS updated 2 weeks ago
1
AkihikoWatanabe/paper_notes #827

Dataset Distillation with Attention Labels for Fine-tuning B…

https://virtual2023.aclweb.org/paper_P5706.html

AkihikoWatanabe updated 2 months ago
2
zma-c-137/VarGFaceNet #10

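The knowledge distillation part converges very slowly

Hello, I re-implemented the model in Keras and trained it, using insightface's resnet100 model as the teacher to extract features, with softmax cross-entropy and an L2 loss on the embeddings. The cross-entropy loss is around 12 and the L2 loss is 0.0038, so I multiplied the L2 loss by 2000 and trained for 10 epochs, but the L2 loss decreases very slowly, dropping only to 0.0028. May I ask, when you trained, how many…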

406747925 updated 2 years ago
3
DrSleep/DenseTorch #3

Training match the paper performance

Hi, I am trying to follow your instructions to match the paper's results using the NYU dataset, but the mIoU and RMSE still do not match. They stabilize after 300 iterations and stop at …

kspeng updated 4 years ago
4
sirius-ai/MobileFaceNet_TF #67

Can it run on ESP-CAM with tensorflow lite?

This may be a bit off-topic, I'm not sure. I am trying to do face recognition on an ESP-CAM with 4MB flash. At the moment the size of the weights file for this model is **8MB**, so I am not able to pu…

manjrekarom updated 4 years ago
5
