amritbhanu opened this issue 7 years ago
[x] In the Fig. 15 caption, can you explain F3CF7P30, and make the legend more comprehensible?
[x] BTW, why are Figures 3, 6, 7, 13, 14, and 15 lower quality than the rest? Low-resolution Excel exports?
[x] For all figures, try to push them onto the page where they are mentioned in the text, e.g., Figs. 16, 17, 18.
[x] And watch out for Table 8; before I fixed it, it was two pages away from where it is discussed.
This passage is wrong: "[ackn]owledge that for now we only validated the improvement of LDADE over LDA in an unsupervised task (see Tables 3 and 8). The gain from the unsupervised task may not be prominent when tested on a supervised task. We would like to test LDADE's advantage in a supervised task"
You did do a supervised task (Figs. 8, 9, and 20), so why say otherwise?
Deadline: Nov 20, 2017

Reviews:
Sorry for the delay in responding to your revision. Both reviewers acknowledge good progress, but have some outstanding issues you need to look into:
-Reviewer 1
Let me clear up one misunderstanding.
From my original comment "LDA has been used to help many realistic software engineering tasks (for example, tasks considered by papers included in Table 2)",
I'm not saying that the unsupervised tasks considered in the prior work listed in Table 2 are not realistic. The point I would like to convey is: the paper does not evaluate LDADE and LDA directly on any of the supervised or unsupervised tasks listed in Table 2. Considering several of those unsupervised or supervised tasks would be good. For example, consider paper [7] listed in Table 2: can using LDADE give additional or differing conclusions about the topics that developers are talking about on Stack Overflow? I leave it to the authors to decide whether to perform this analysis, but at the very least the limitation should be acknowledged in the paper and perhaps flagged as future work.
Also, there is a need to better motivate the supervised task considered in Section 5.3 (i.e., to answer the question: why is categorizing StackExchange data into relevant and non-relevant categories useful?).
There is also a need to use LDA-GA for the classification task in Section 5.3 and demonstrate whether or not LDADE outperforms LDA-GA. Even if LDADE does not outperform LDA-GA, that is still okay: the paper can then advise researchers to use LDADE for unsupervised tasks, tasks requiring short parameter-tuning time, or tasks requiring better stability, and LDA-GA for supervised tasks or tasks where parameter tuning can be done overnight. Without this comparison, however, researchers would not be able to make such a decision in future work.
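For concreteness, such a comparison could treat each tuner as a black box and score both on the same downstream classifier. Below is a minimal sketch, assuming scikit-learn; `tune` is a hypothetical placeholder standing in for either LDADE or LDA-GA, and the pipeline is illustrative rather than the paper's actual code.

```python
# Illustrative harness: score any LDA tuner (LDADE, LDA-GA, ...) on the
# Section 5.3 relevant/non-relevant classification task, so the tuners can
# be compared on the same supervised metric. `tune` is a hypothetical
# placeholder returning an LDA parameter dict, e.g.
# {"n_components": 20, "doc_topic_prior": 0.1, "topic_word_prior": 0.01}.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def downstream_f1(docs, labels, tune):
    X = CountVectorizer(stop_words="english").fit_transform(docs)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                              random_state=1)
    params = tune(X_tr)  # plug in LDADE or LDA-GA here
    lda = LatentDirichletAllocation(**params, random_state=1).fit(X_tr)
    clf = LogisticRegression(max_iter=1000).fit(lda.transform(X_tr), y_tr)
    return f1_score(y_te, clf.predict(lda.transform(X_te)))
```

Running `downstream_f1` once per tuner would put LDADE and LDA-GA on the same footing for the decision described above.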
-Reviewer 2
But I still have a couple of comments.
I was really surprised that with just 30 evaluations the algorithm is able to achieve a stable configuration. This is, in my opinion, a result that in some way belittles the issue raised in the paper. If by exploring only 30 different configurations I am able to obtain a stable solution, then LDA is probably stable in general and I do not need a sophisticated approach to find a more stable solution.
I just need to perform some trials and pick the configuration that provides the best results. Indeed, what are the benefits of DE compared with Random Search? I would like to see in the paper a deeper discussion of the very low number of evaluations the proposed algorithm needs and, if possible, a comparison with Random Search.
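One way to answer the Random Search question is to give it exactly the same 30-evaluation budget that DE gets and compare the best stability score each finds. A minimal sketch, assuming illustrative parameter ranges for k, alpha, and beta, and a hypothetical `stability_score` callable standing in for the paper's Jaccard-based stability metric:

```python
# Hypothetical Random Search baseline under the same evaluation budget as DE.
# `stability_score` is a placeholder: it should fit LDA with the candidate
# parameters and return the stability metric (higher is better).
import random

SPACE = {"k": (10, 100), "alpha": (0.0, 1.0), "beta": (0.0, 1.0)}  # illustrative ranges

def random_search(stability_score, budget=30, seed=1):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(budget):
        cand = {"k": rng.randint(*SPACE["k"]),
                "alpha": rng.uniform(*SPACE["alpha"]),
                "beta": rng.uniform(*SPACE["beta"])}
        score = stability_score(cand)  # one full LDA stability evaluation
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

If Random Search matches DE within 30 evaluations, that would support the reviewer's concern; if DE wins consistently, the paper has a direct answer.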
In addition, I concur with Reviewer 1 that the proposed algorithm should be evaluated in a scenario more specific to the software engineering community, such as traceability link recovery, feature location, or software artefact labelling.