T1 - T2 - Githubissues

amritbhanu commented 8 years ago

Experiment

T1 = reports with severity 1 and 2
T2 = reports with severity 3, 4 and 5
Each line below lists the features (top terms) of one of the top 10 topics.

Results

The results show no topic overlap between T1 and T2. Other than a couple of words, the two sets of topic clusters are different.
For T1 - Project A

projecta control inertia design specif perform attitud tabl spacecraft note 
interrupt uplink srup fsw verif error follow specif eeprom scr 
tabl initi use fals address ppu obc function event dump 
checksum calcul enabl progress process task idl text oper discuss 
text fault memori error plenum initi second number pressur indic 
mode flight issu execut sequenc current indic point set vml 
switch messag case file mode type code flexelint function use 
wait int variabl vml read dump write verif task verifi 
miss oper set text paramet state check valid indic number 
subaddress address telemetri packet word fsw data buffer request bootload

For T2 - Project A

obc safe fault projecta power address flight mode state spacecraft 
code data function line valu access variabl use messag record 
srobc rate spacecraft flight memori prd alloc provid link point 
non load int unsign bit eeprom obc comput control data 
control mode point attitud error plenum sroac target main high 
grand word tlm type packet count cmd byte header command 
file defin line tlm data statu macro array ambi len 
command softwar flight trace link srup task uplink time spacecraft 
variabl initi messag line code entri valu use extern mode 
test script verifi mode engcntrl link indic issu procedur data

amritbhanu commented 8 years ago

@timm Prof., any comments? I require at least 5 matching terms before I count two topics as overlapping.
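A minimal sketch of that overlap check, assuming each topic is simply its list of top terms. The function name `overlapping_pairs` is made up for illustration and is not taken from the Pits_lda code; the two example topics are copied from the T1 and T2 lists above.

```python
def overlapping_pairs(topics_1, topics_2, min_terms=5):
    """Return (i, j, shared_terms) for every pair of topics that share
    at least min_terms terms. Each topic is a list of words."""
    pairs = []
    for i, a in enumerate(topics_1):
        for j, b in enumerate(topics_2):
            shared = set(a) & set(b)
            if len(shared) >= min_terms:
                pairs.append((i, j, sorted(shared)))
    return pairs


# One topic from each list above (last T1 topic, sixth T2 topic).
t1 = ["subaddress address telemetri packet word fsw data buffer request bootload".split()]
t2 = ["grand word tlm type packet count cmd byte header command".split()]

print(overlapping_pairs(t1, t2))               # threshold of 5 terms
print(overlapping_pairs(t1, t2, min_terms=2))  # looser threshold
```

These two topics share only "packet" and "word", so they count as overlapping at a threshold of 2 but not at the threshold of 5 used here.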

timm commented 8 years ago

are these results stable? i.e. do different runs generate different topics?

and is there any discussion in the literature about lda topic instability?

amritbhanu commented 8 years ago

These are the graphs on stability. For a given minimum number of matched terms, they show the percentage of runs in which the same topic is generated. Some topics (~20%) are generated in all the runs.
But there has been next to no overlap between T1 and T2.
I will get back to you on the literature review. There have been some studies.

Project A - T1

file

Project A - T2

file

timm commented 8 years ago

can't get an executive summary from these.

only 20% of these topics are stable across multiple runs?

if you run N times and collect the topics from all N runs, are there repeated patterns?

amritbhanu commented 8 years ago

Stable means that a topic has occurred more than 5 times in 10 runs. This answers your 3rd question as well. And yes, only 20% of topics are stable. But here I am only finding the top 10 topics.
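A rough sketch of that stability measure, under two assumptions not stated in the thread: the topics of the first run serve as the reference set, and a topic "occurs" in a run when some topic in that run shares at least `min_terms` terms with it. The name `stable_fraction` and the toy data are hypothetical.

```python
def stable_fraction(runs, min_terms=5, min_runs=6):
    """runs: list of runs; each run is a list of topics; each topic is a
    list of terms. A reference topic (taken from the first run) is stable
    if it occurs in at least min_runs runs. The default min_runs=6 encodes
    "more than 5 times in 10 runs"."""
    reference = runs[0]
    stable = 0
    for topic in reference:
        # Count the runs that contain a sufficiently similar topic.
        hits = sum(
            any(len(set(topic) & set(other)) >= min_terms for other in run)
            for run in runs
        )
        if hits >= min_runs:
            stable += 1
    return stable / len(reference)


# Toy data: topic A shows up in all 10 runs, topic B only in the first 3.
A, B, C = list("abcdef"), list("ghijkl"), list("mnopqr")
toy_runs = [[A, B]] * 3 + [[A, C]] * 7
print(stable_fraction(toy_runs))
```

In the toy data only one of the two reference topics passes the threshold, so half of them are stable.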

timm commented 8 years ago

Hmmm.... looks like it's time to check if anyone else has found topics to be unstable

is this paper useful to you?

How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms ==> paper

@inproceedings{panichella2013effectively,
  title={How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms},
  author={Panichella, Annibale and Dit, Bogdan and Oliveto, Rocco and Di Penta, Massimiliano and Poshyvanyk, Denys and De Lucia, Andrea},
  booktitle={Proceedings of the 2013 International Conference on Software Engineering},
  pages={522--531},
  year={2013},
  organization={IEEE Press}
}

It uses a "grid-search" idea to tune 4 hyper-parameters of LDA, each divided into 10 bins, to investigate "What is the impact of the configuration parameters on LDA’s performance in the context of software engineering tasks".

This research question aims at justifying the need for an automatic approach that calibrates LDA’s settings when LDA is applied to support SE tasks. They analyzed a large number of LDA configurations for three software engineering tasks. The presence of a high variability in LDA’s performances indicates that, without a proper calibration, such a technique risks being severely under-utilized.
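To make the size of that search concrete, here is a small sketch of a grid over 4 hyper-parameters with 10 bins each. The parameter names and ranges are placeholders chosen for illustration (the comment above does not list them), and `toy_score` stands in for whatever quality measure a real study would compute after training LDA.

```python
from itertools import product

# Hypothetical grid: 4 hyper-parameters, 10 bins each (names and ranges assumed).
grid = {
    "n_topics": [10 * i for i in range(1, 11)],
    "n_iterations": [100 * i for i in range(1, 11)],
    "alpha": [round(0.1 * i, 1) for i in range(1, 11)],
    "beta": [round(0.1 * i, 1) for i in range(1, 11)],
}


def grid_search(grid, score):
    """Try every combination in the grid and return (best_score, best_config).
    score is any function mapping a config dict to a number (higher is better)."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        s = score(config)
        if best is None or s > best[0]:
            best = (s, config)
    return best


def toy_score(config):
    # Stand-in objective; a real one would train and evaluate an LDA model.
    return -abs(config["n_topics"] - 50) - abs(config["alpha"] - 0.5)


n_configs = len(list(product(*grid.values())))
print(n_configs)  # 10 bins ** 4 parameters
best_score, best_config = grid_search(grid, toy_score)
print(best_config)
```

Ten bins for each of four parameters gives 10,000 configurations, each needing its own LDA run, which helps explain why the paper argues for an automatic calibration approach.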

timm commented 8 years ago

do you know how to find who has cited a paper?

Step1: look for it in google scholar

https://scholar.google.com/scholar?hl=en&q=How+to+effectively+use+topic+models+for+software+engineering+tasks%3F+an+approach+based+on+genetic+algorithms&btnG=&as_sdt=1%2C34&as_sdtp=

Step2: click on the "cited by 73" link:

https://scholar.google.com/scholar?cites=9122112158639969994&as_sdt=5,34&sciodt=0,34&hl=en

enjoy!

ai-se / Pits_lda

T1 - T2 #3
