ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Similarity between data and target #33

Open azhe825 opened 7 years ago

azhe825 commented 7 years ago

Supported by https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/src/runner.py

Data Similarity

LDA on 30 topics (the number of topics does not matter much). Topic weighting for the two data sets:

L1 similarity, the default for LDA:

L2 similarity, which makes more sense:

Target Similarity

LDA on 30 topics. Topic weighting for the two relevant sets:

L1 similarity, the default for LDA:

L2 similarity, which makes more sense:
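A minimal sketch of how the L1 and L2 variants might differ, assuming each data set (or relevant set) is summarized by one non-negative topic-weight vector and that "similarity" means the dot product of the two normalized vectors. This is one reading of the comparison, not necessarily what runner.py does; the function names and toy vectors are hypothetical.

```python
import math


def normalize(weights, norm="l2"):
    """Scale a topic-weight vector to unit L1 or unit L2 norm."""
    if norm == "l1":
        scale = sum(abs(w) for w in weights)
    elif norm == "l2":
        scale = math.sqrt(sum(w * w for w in weights))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if scale == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [w / scale for w in weights]


def similarity(a, b, norm="l2"):
    """Dot product of two topic-weight vectors after normalization.

    With norm='l2' this is the cosine similarity.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of topics")
    na, nb = normalize(a, norm), normalize(b, norm)
    return sum(x * y for x, y in zip(na, nb))


if __name__ == "__main__":
    # Hypothetical topic weightings over four topics (not real results).
    uniform = [1.0, 1.0, 1.0, 1.0]
    print("L1:", similarity(uniform, uniform, norm="l1"))
    print("L2:", similarity(uniform, uniform, norm="l2"))
```

Under L1 normalization, two identical vectors can score well below 1 (0.25 for a uniform vector over four topics), so the score depends on how spread out the weights are. Under L2 normalization, identical vectors always score 1, which may be why the L2 variant is easier to interpret.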

Conclusion:

Problem:

timm commented 7 years ago

can u generate some way of building data sets at increasing distance? see how your conclusions fail as you increase distance?

can u use LDA as a faster way to find relevant topics?

azhe825 commented 7 years ago

Will try generating synthetic data. Preparing for midterm this week.
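One hypothetical way to build data at increasing distance, sketched here as an assumption and not as the method used in this repository: blend a source topic distribution with a different one using a mixing weight `alpha`, then track how the cosine similarity to the source falls as `alpha` grows. The names `mix` and `distance_sweep` and the example distributions are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mix(source, other, alpha):
    """Blend two topic distributions: alpha=0 gives source, alpha=1 gives other."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * s + alpha * o for s, o in zip(source, other)]


def distance_sweep(source, other, steps=5):
    """Return (alpha, similarity-to-source) pairs for evenly spaced alphas."""
    results = []
    for i in range(steps + 1):
        alpha = i / steps
        results.append((alpha, cosine(source, mix(source, other, alpha))))
    return results


if __name__ == "__main__":
    # Hypothetical topic distributions over three topics (not real data).
    source = [0.7, 0.2, 0.1]
    other = [0.1, 0.2, 0.7]
    for alpha, sim in distance_sweep(source, other):
        print(f"alpha={alpha:.1f} similarity={sim:.3f}")
```

Each blended distribution could then seed a synthetic data set, and the experiment could be repeated at each `alpha` to see where the conclusions stop holding.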

What do you mean by "use LDA as a faster way to find relevant topics"? Apply LDA+SVM on FASTREAD? I have a preliminary result showing that LDA+SVM with 100 topics performs a bit better than FASTREAD in one run. So it might be promising, as the target is clearly one specific topic.

timm commented 7 years ago

"Preparing for midterm this week."

roger. focus on that