djanloo / cmepda

An LSTM + Encoder network for UHECR AirShowers

Implement curriculum learning #9

Open djanloo opened 2 years ago

djanloo commented 2 years ago

https://arxiv.org/pdf/1904.03626.pdf

djanloo commented 2 years ago

Dataset scoring using a pre-trained model appears to work properly. The next step is to use an MCMC to generate samples of increasing difficulty, sampling each difficulty level with a different probability.

[Images: scores_before, scores_after, difficulty_distrib]
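A rough sketch of the "different probabilities for each level" part, assuming every record already has an integer difficulty level. Note that this is plain weighted sampling from the standard library, not an MCMC; `sample_by_level` and `level_probs` are hypothetical names, not something in the repo.

```python
import random
from collections import defaultdict


def sample_by_level(levels, level_probs, n_samples, seed=0):
    """Draw record indices so that each level is picked according to its weight.

    levels:      list of int, difficulty level of each record (assumed input).
    level_probs: dict level -> relative weight (does not need to sum to 1).
    """
    rng = random.Random(seed)

    # Group record indices by difficulty level.
    by_level = defaultdict(list)
    for idx, lvl in enumerate(levels):
        by_level[lvl].append(idx)

    # Keep only levels that contain records and have a positive weight.
    usable = [lvl for lvl in by_level if level_probs.get(lvl, 0) > 0]
    if not usable:
        raise ValueError("no populated level has a positive weight")
    weights = [level_probs[lvl] for lvl in usable]

    # First pick a level, then a record uniformly inside that level.
    chosen = rng.choices(usable, weights=weights, k=n_samples)
    return [rng.choice(by_level[lvl]) for lvl in chosen]
```

Shifting the weights towards the higher levels over time would give samples of increasing difficulty, e.g. `{0: 0.8, 1: 0.2}` early on and `{0: 0.2, 1: 0.8}` later.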

djanloo commented 2 years ago

Each color has the same number of records. I changed strategy from the one proposed in the article, since a scoring function alone does not guarantee that each cluster has enough records.

[Image: data_difficulty]
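One way to get equally populated levels is to split on the rank of the score instead of on the score value. This is only a sketch of that idea, not necessarily what the repo does; `equal_population_levels` is a hypothetical name, and it assumes a lower score means an easier record (flip the sort if the convention is the opposite).

```python
def equal_population_levels(scores, n_levels):
    """Assign each record a level in [0, n_levels) with (almost) equal counts.

    Records are ranked by score; the record with rank r gets
    level r * n_levels // len(scores).
    """
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    n = len(scores)

    # Indices of the records, sorted from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])

    levels = [0] * n
    for rank, idx in enumerate(order):
        levels[idx] = rank * n_levels // n
    return levels
```

When the number of records is not a multiple of `n_levels`, the level sizes differ by at most one record.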

djanloo commented 2 years ago

The plot shows the estimates of a student network trained with a prof network. The student was trained for 25 epochs on the level 5 subset of the dataset.

Colors show the difficulty levels of the whole dataset as assigned by the prof network, while the estimates are the ones predicted by the student. Training on hard data makes easy data difficult (?) @luciapapalini

[Image: level change]

djanloo commented 2 years ago

Keras is not able to increase the dataset length from one epoch to the next: https://github.com/tensorflow/tensorflow/issues/41571

A possible solution is writing a training loop from scratch.
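A bare-bones sketch of what such a loop could look like, kept framework-free: the pool of available records is rebuilt at every epoch, so its length can grow. The schedule (one more difficulty level unlocked per epoch) and the names `run_curriculum` and `train_step` are assumptions for illustration only.

```python
import random


def run_curriculum(levels, n_epochs, batch_size, train_step, seed=0):
    """Run a loop whose training pool grows with the epoch number.

    levels:     list of int, difficulty level of each record.
    train_step: callable taking a list of record indices (one batch);
                this is where the model update would happen.
    Returns the pool size used at each epoch.
    """
    rng = random.Random(seed)
    pool_sizes = []
    for epoch in range(n_epochs):
        # Assumed schedule: at epoch e, records with level <= e are available.
        pool = [i for i, lvl in enumerate(levels) if lvl <= epoch]
        rng.shuffle(pool)

        # Feed the pool to the model in mini-batches of indices.
        for start in range(0, len(pool), batch_size):
            train_step(pool[start:start + batch_size])

        pool_sizes.append(len(pool))
    return pool_sizes
```

With Keras, `train_step` could be something like `lambda batch: model.train_on_batch(x[batch], y[batch])`, assuming `x` and `y` are NumPy arrays and `model` is already compiled.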

luciapapalini commented 2 years ago

@djanloo oh damn it ahah.. Is it really necessary for the curriculum?