Open azhe825 opened 7 years ago
don't get this. what were the similarity scores before tuning?
pause
oh, the IQRs are way down: 0.13 to 0.03 or even 0.06
Q: does that mean you now know how to select which method for reuse?
and how long did tuning take?
> how long did tuning take?
about 30 hours on 10 cores, so roughly 300 hours on a single core.
> does that mean you now know how to select which method for reuse?
no, same reason as before
what does it solve?
we no longer get similarity scores with high variance.
Before: computing a similarity score twice could give two different values if the order of the documents changed between runs.
Now: the similarity score is stable after tuning.
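The instability can be quantified with the interquartile range (IQR) of the similarity scores over repeated runs. A minimal sketch, where `similarity_one_run` is a hypothetical stand-in for the real LDA-based target similarity (the noise level here just fakes the run-to-run jitter caused by document order):

```python
import numpy as np

rng = np.random.default_rng(0)

def iqr(scores):
    """Interquartile range: spread between the 75th and 25th percentiles."""
    q75, q25 = np.percentile(scores, [75, 25])
    return q75 - q25

# Hypothetical stand-in: the real experiment retrains LDA on a shuffled
# document order each run and compares topic distributions; here we mimic
# the run-to-run jitter with Gaussian noise around a fixed similarity.
def similarity_one_run(jitter):
    return 0.45 + rng.normal(0.0, jitter)

unstable = [similarity_one_run(0.10) for _ in range(30)]  # before tuning
stable = [similarity_one_run(0.01) for _ in range(30)]    # after tuning

print(iqr(unstable), iqr(stable))  # the tuned setting has a smaller IQR
```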
Why?
Target similarity of three data sets: Hall, Wahono, Abdellatif.
Need to stabilize the target similarity.
How?
Tune the LDA parameters (Decision = [alpha, eta]). We don't want to change the number of topics.
Objective = [iqrs].
Differential evolution, 10 candidates per generation, 10 generations max.
Running on the NCSU HPC with a single node, 10 threads.
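The tuning loop can be sketched with SciPy's differential evolution. The objective below is a cheap hypothetical surrogate, since the real objective retrains LDA and measures the per-data-set IQRs (which is what took ~30 hours on 10 cores). Note that SciPy's `popsize` is a multiplier on the number of parameters, so `popsize=5` with two parameters gives 10 candidates per generation:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical surrogate objective: the real run would retrain LDA for each
# (alpha, eta) candidate, compute the IQR of similarity scores on each of the
# three data sets, and scalarize (e.g. sum the three IQRs). Here a smooth
# quadratic with a made-up sweet spot keeps the sketch runnable.
def iqrs_surrogate(params):
    alpha, eta = params
    return (alpha - 0.36) ** 2 + (eta - 0.97) ** 2

result = differential_evolution(
    iqrs_surrogate,
    bounds=[(0.0, 1.0), (0.0, 1.0)],  # search ranges for alpha and eta
    popsize=5,    # 5 x 2 parameters = 10 candidates per generation
    maxiter=10,   # 10 generations max, as in the issue
    workers=1,    # the real run parallelized evaluations over 10 threads
    seed=1,
)
print(result.x)  # best [alpha, eta] found by the search
```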
Result
Best decisions: [alpha = 0.3636991597795636, eta = 0.9722983748261428]
Best objectives (iqrs): [0.0064311303948402232, 0.039641889335073899, 0.048358360331471784]
iqrs before tuning: [0.002, 0.129, 0.129]
medians of similarities: [0.98309488776481135, 0.45742986887869136, 0.4108420090949999]
Conclusion
Tuning LDA is essential to get a stable similarity score.