When balanced_classes=True in make_datastream the labeled data is not shuffled

CuriousAI / ladder

Ladder network is a deep learning algorithm that combines supervised and unsupervised learning

MIT License

When balanced_classes=True in make_datastream the labeled data is not shuffled #5

Closed: udibr closed this issue 9 years ago

udibr commented 9 years ago

When balanced_classes=True in line https://github.com/arasmus/ladder/blob/master/run.py#L154, the examples from each class are added one after the other, but there is no additional shuffling of i_labeled. The only shuffling (with dseed) is applied to the entire data set in setup_data; make_datastream is then called, sorts out the labeled examples from each class, and so undoes that shuffling.

This can hurt SGD optimization, because the labeled minibatches would then be drawn in class order.
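
To make the concern concrete, here is a minimal sketch of a per-class selection that ends up class-ordered. It is not the code from run.py: `balanced_labeled_indices` and its parameters are hypothetical names, and plain Python lists stand in for the real data set.

```python
import random

def balanced_labeled_indices(labels, n_labeled, n_classes, seed=1):
    """Hypothetical helper: shuffle once, then take an equal share per class."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # dataset-wide shuffle, analogous to the dseed shuffle
    per_class = n_labeled // n_classes
    i_labeled = []
    for c in range(n_classes):
        # First `per_class` examples of class c in the shuffled order.
        picked = [i for i in order if labels[i] == c][:per_class]
        i_labeled.extend(picked)  # classes are appended one after the other
    return i_labeled

labels = [i % 3 for i in range(30)]  # toy labels: 10 examples per class
i_labeled = balanced_labeled_indices(labels, n_labeled=6, n_classes=3)
classes_in_order = [labels[i] for i in i_labeled]
print(classes_in_order)  # [0, 0, 1, 1, 2, 2]
```

Which examples are picked depends on the seed, but the class order of the result does not, which is the effect described above.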

arasmus commented 9 years ago

By looking at the code, I'd disagree. Have you confirmed this with experiments?

DataStream uses the iteration scheme passed in here: https://github.com/arasmus/ladder/blob/master/run.py#L164, which is set to ShuffledScheme here: https://github.com/arasmus/ladder/blob/master/run.py#L133. That scheme shuffles the labeled samples, so SGD should work in that sense.
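
A rough illustration of why the class-ordered index list does not matter once the iteration scheme shuffles: the pure-Python stand-in below reshuffles the indices before cutting them into batches. It is only a sketch of the idea, not the library's ShuffledScheme implementation; `shuffled_batches` is a hypothetical name.

```python
import random

def shuffled_batches(indices, batch_size, rng):
    """Stand-in for a shuffled iteration scheme: reshuffle, then cut into batches."""
    shuffled = list(indices)  # copy, so the caller's list stays class-ordered
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

# Class-ordered labeled indices, as produced by a per-class selection.
i_labeled = [0, 3, 6, 1, 4, 7, 2, 5, 8]

rng = random.Random(0)
epoch1 = shuffled_batches(i_labeled, 3, rng)  # one pass over the labeled data
epoch2 = shuffled_batches(i_labeled, 3, rng)  # reshuffled again for the next pass
print(epoch1)
print(epoch2)
```

Each call visits every labeled index exactly once, in an order drawn independently of how `i_labeled` was built.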

Do you agree?

udibr commented 9 years ago

agree