ai-se / Pits_lda

IST journal 2017: Tuning LDA
https://github.com/amritbhanu/LDADE-package

T1 - T2 #3

Closed amritbhanu closed 7 years ago

amritbhanu commented 8 years ago

Experiment

projecta control inertia design specif perform attitud tabl spacecraft note 
interrupt uplink srup fsw verif error follow specif eeprom scr 
tabl initi use fals address ppu obc function event dump 
checksum calcul enabl progress process task idl text oper discuss 
text fault memori error plenum initi second number pressur indic 
mode flight issu execut sequenc current indic point set vml 
switch messag case file mode type code flexelint function use 
wait int variabl vml read dump write verif task verifi 
miss oper set text paramet state check valid indic number 
subaddress address telemetri packet word fsw data buffer request bootload 

For T2 - Project A

obc safe fault projecta power address flight mode state spacecraft 
code data function line valu access variabl use messag record 
srobc rate spacecraft flight memori prd alloc provid link point 
non load int unsign bit eeprom obc comput control data 
control mode point attitud error plenum sroac target main high 
grand word tlm type packet count cmd byte header command 
file defin line tlm data statu macro array ambi len 
command softwar flight trace link srup task uplink time spacecraft 
variabl initi messag line code entri valu use extern mode 
test script verifi mode engcntrl link indic issu procedur data 
amritbhanu commented 8 years ago

@timm Prof., any comments? I am requiring at least 5 matching terms before two topics count as overlapping.
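A minimal sketch of the 5-term overlap check described above, assuming each topic is a whitespace-separated string of stemmed terms as in the lists in this thread. The helper names `shared_terms` and `topics_overlap` are hypothetical, not taken from the repository.

```python
def shared_terms(topic_a, topic_b):
    """Return the set of terms that appear in both topics."""
    return set(topic_a.split()) & set(topic_b.split())


def topics_overlap(topic_a, topic_b, min_shared=5):
    """True if the two topics share at least `min_shared` terms."""
    return len(shared_terms(topic_a, topic_b)) >= min_shared


# Last topic listed under "Experiment" and the sixth topic listed under T2.
t1 = "subaddress address telemetri packet word fsw data buffer request bootload"
t2 = "grand word tlm type packet count cmd byte header command"

print(sorted(shared_terms(t1, t2)))  # ['packet', 'word']
print(topics_overlap(t1, t2))        # False: only 2 shared terms
```

Using sets ignores term order within a topic, which may or may not be what is wanted if term rank matters.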

timm commented 8 years ago

Are these results stable? I.e., do different runs generate different topics?

And is there any discussion in the literature about LDA topic instability?

amritbhanu commented 8 years ago

Project A - T1

file

Project A - T2

file

timm commented 8 years ago

Can't get an executive summary of these.

Only 20% of these topics are stable across multiple runs?

If you run N times and collect the topics from all N runs, are there repeated patterns?

amritbhanu commented 8 years ago

Stable means a topic has occurred more than 5 times in 10 runs. This answers your 3rd question as well. And yes, only 20% of topics are stable. But here I am only finding the top 10 topics.
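One way this stability definition could be computed is sketched below. It assumes that a topic "occurs" in a run when some topic in that run shares at least 5 terms with it. That matching rule, the function names, and the toy data are all assumptions for illustration, not the code used for the results above.

```python
def count_occurrences(topic, runs, min_shared=5):
    """Count the runs that contain at least one topic sharing
    `min_shared` or more terms with `topic`."""
    terms = set(topic)
    return sum(
        any(len(terms & set(other)) >= min_shared for other in run)
        for run in runs
    )


def stable_fraction(reference_topics, runs, min_runs=5, min_shared=5):
    """Fraction of reference topics that occur in more than `min_runs` runs."""
    stable = [
        t for t in reference_topics
        if count_occurrences(t, runs, min_shared) > min_runs
    ]
    return len(stable) / len(reference_topics)


# Toy data (made up): topic_a shows up in 7 of 10 runs, topic_b in 3.
topic_a = ["a", "b", "c", "d", "e", "f"]
topic_b = ["u", "v", "w", "x", "y", "z"]
runs = [[topic_a]] * 7 + [[topic_b]] * 3

print(count_occurrences(topic_a, runs))          # 7
print(stable_fraction([topic_a, topic_b], runs))  # 0.5
```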

timm commented 8 years ago

Hmmm... looks like it's time to check if anyone else has found topics to be unstable.

Is this paper useful to you?

How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms

@inproceedings{panichella2013effectively, title={How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms}, author={Panichella, Annibale and Dit, Bogdan and Oliveto, Rocco and Di Penta, Massimiliano and Poshyvanyk, Denys and De Lucia, Andrea}, booktitle={Proceedings of the 2013 International Conference on Software Engineering}, pages={522--531}, year={2013}, organization={IEEE Press} }

They use a "grid-search" idea to tune 4 hyper-parameters of LDA, each divided into 10 bins, to investigate "What is the impact of the configuration parameters on LDA’s performance in the context of software engineering tasks?"

This research question aims at justifying the need for an automatic approach that calibrates LDA’s settings when LDA is applied to support SE tasks. They analyzed a large number of LDA configurations for three software engineering tasks. The presence of high variability in LDA’s performance indicates that, without proper calibration, such a technique risks being severely under-utilized.
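To give a sense of the size of such a grid, here is a rough sketch that enumerates 4 hyper-parameters with 10 bins each. The parameter names and ranges are assumptions chosen for illustration; the summary above does not state them.

```python
from itertools import product


def bins(low, high, n=10):
    """Return n evenly spaced values from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


# Hypothetical hyper-parameters and ranges, 10 bins each.
grid = {
    "n_topics": [10 * i for i in range(1, 11)],       # 10, 20, ..., 100
    "n_iterations": [100 * i for i in range(1, 11)],  # 100, 200, ..., 1000
    "alpha": bins(0.1, 1.0),
    "beta": bins(0.1, 1.0),
}

# One dict per configuration: the Cartesian product of all bins.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))  # 10000, i.e. 10 ** 4 LDA runs for one full sweep
```

Each configuration would need at least one LDA run per task, which is why an exhaustive sweep gets expensive quickly.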

timm commented 8 years ago

Do you know how to find who has cited a paper?

Step1: look for it in Google Scholar

https://scholar.google.com/scholar?hl=en&q=How+to+effectively+use+topic+models+for+software+engineering+tasks%3F+an+approach+based+on+genetic+algorithms&btnG=&as_sdt=1%2C34&as_sdtp=


Step3: click on the "cited by 73" link:

https://scholar.google.com/scholar?cites=9122112158639969994&as_sdt=5,34&sciodt=0,34&hl=en

enjoy!