IBM / mi-prometheus

Enabling reproducible Machine Learning research
http://mi-prometheus.rtfd.io/
Apache License 2.0

Online and offline trainers aggregate different numbers of samples for MNIST during training #50

Closed tkornuta-ibm closed 5 years ago

tkornuta-ibm commented 5 years ago

It is the same for CIFAR10:

[2018-11-05 18:05:41] - INFO - OnlineTrainer >>> episode 014857; episodes_aggregated 000782; loss 1.7145928144; loss_min 1.3529133797; loss_max 2.1907787323; loss_std 0.1282402277; epoch 18; acc 0.3774976134; acc_min 0.1875000000; acc_max 0.5781250000; acc_std 0.0594680607; samples_aggregated 049936 [Full Training]

with settings:

Use a sampler that operates on a subset:

sampler:
    name: SubsetRandomSampler
    indices: [0, 45000]
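
For reference, a minimal sketch of what this sampler configuration amounts to. The class below is a pure-Python stand-in mimicking torch.utils.data.SubsetRandomSampler, and the interpretation of `indices: [0, 45000]` as the half-open range of the first 45000 training indices is an assumption inferred from the logs (the offline trainer reports exactly 45000 samples per epoch), not confirmed mi-prometheus code:

```python
import random

class SubsetRandomSampler:
    # Minimal stand-in for torch.utils.data.SubsetRandomSampler:
    # yields the given indices in a random order, once per epoch.
    def __init__(self, indices):
        self.indices = list(indices)

    def __iter__(self):
        return iter(random.sample(self.indices, len(self.indices)))

    def __len__(self):
        return len(self.indices)

# Assumption: the config's `indices: [0, 45000]` denotes the range
# [0, 45000), i.e. the first 45000 CIFAR10 training samples.
sampler = SubsetRandomSampler(range(0, 45000))
print(len(sampler))  # 45000, matching samples_aggregated in the offline log
```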
tkornuta-ibm commented 5 years ago

Running the same settings with OfflineTrainer:

[2018-11-05 18:06:16] - INFO - OfflineTrainer >>> episode 000704; episodes_aggregated 000704; loss 2.3036611080; loss_min 2.2822864056; loss_max 2.3270912170; loss_std 0.0059254402; epoch 00; acc 0.0993874297; acc_min 0.0156250000; acc_max 0.2500000000; acc_std 0.0382476710; samples_aggregated 045000 [Epoch 0]
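
The two counts are actually reconcilable under one reading of the logs. This is my reconstruction, not something stated in the thread, and the batch size of 64 is an assumption (acc values quantized to 1/64 suggest it): the offline trainer aggregates statistics over exactly one epoch of the 45000-sample subset, while the online trainer aggregates over a moving window of recent episodes that crosses epoch boundaries and therefore picks up the short final batch of each epoch it spans:

```python
import math

batch_size = 64      # assumption: inferred from acc granularity, not from the config
subset_size = 45000  # first 45000 CIFAR10 training indices

# Offline trainer: statistics cover exactly one epoch.
episodes_per_epoch = math.ceil(subset_size / batch_size)
last_batch = subset_size - (episodes_per_epoch - 1) * batch_size
print(episodes_per_epoch, subset_size)  # 704 45000 -> matches the offline log

# Online trainer: a window of 782 recent episodes. If it spans two
# epoch boundaries, it contains two short final batches of 8 samples
# instead of 64, losing 56 samples at each boundary.
window_episodes = 782
deficit_per_boundary = batch_size - last_batch
samples = window_episodes * batch_size - 2 * deficit_per_boundary
print(samples)  # 49936 -> matches the online log
```

So the discrepancy is not lost data; it reflects the two trainers defining their aggregation window differently (per-epoch vs. sliding over episodes).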

vmarois commented 5 years ago

Addressed in #54