ai-se / Tuning-LDA

IST Journal Tuning LDA - LDADE
https://www.sciencedirect.com/science/article/pii/S0950584917300861

2nd IST review #7

amritbhanu opened this issue 7 years ago

amritbhanu commented 7 years ago

Deadline: Nov 20, 2017

Reviews:

Sorry for the delay in responding to your revision. Both reviewers acknowledge good progress but have some outstanding issues you need to look into:

-Reviewer 1

Let me clear up one misunderstanding.

A.2. The experiments need to be improved in the following ways. First, LDA has been used to help many realistic software engineering tasks (for example, the tasks considered by papers included in Table 2). There is a need to expand the experiments to compare LDA and LDADE on those realistic software engineering tasks. It is unclear if the task considered in the experiments (Section 5.3) is realistic (why is categorizing StackExchange data into relevant and non-relevant categories useful?). More than one task should have been considered (similar to Panichella et al.'s work).

Our goal in this revision was to be as responsive as possible to your suggestions (as you can see below, in A.4, A.5 and A.7, we were able to implement much of your advice). But as to applying this to a "more realistic SE task", we might have a different perspective on what a "valid" SE task is. We say this since it sounds like you are saying that the unsupervised tasks conducted by 23 of the 28 recent highly cited LDA papers are not "realistic", and that we should evaluate this paper only via the supervised tasks seen in 4 of the 28 papers. That is not our view; please see the notes above in A.1 on the need for stability in unsupervised SE tasks.

From my original comment "LDA has been used to help many realistic software engineering tasks (for example, tasks considered by papers included in Table 2)",

I'm not saying that the unsupervised tasks considered in prior work listed in Table 2 are not realistic. The point I would like to convey is: the paper does not evaluate LDADE and LDA directly on any of the supervised or unsupervised tasks listed in Table 2. Considering several of the unsupervised or supervised tasks in Table 2 would be good. For example, considering paper [7] listed in Table 2, can using LDADE give additional or differing conclusions about the topics that developers are talking about on Stack Overflow? I leave it to the authors whether or not to perform this analysis, but at least the limitation should be acknowledged in the paper and perhaps considered as future work.

Also, there is a need to better motivate the supervised task considered in Section 5.3 (to answer the question: why is categorizing StackExchange data into relevant and non-relevant categories useful?).

There is also a need to use LDA-GA for the classification task in Section 5.3 and demonstrate whether or not LDADE outperforms LDA-GA. Even if LDADE does not outperform LDA-GA, that is still okay. The paper can inform researchers to use LDADE for unsupervised tasks, tasks requiring short parameter-tuning time, or tasks requiring better stability, and LDA-GA for supervised tasks or tasks where parameter tuning can be done overnight. However, without the comparison, researchers may not be able to make such a decision in future work.

-Reviewer 2

But I still have a couple of comments.

I was really surprised that with just 30 evaluations the algorithm is able to reach a stable configuration. This is, in my opinion, a result that in some way belittles the issue raised in the paper. If by exploring only 30 different configurations I am able to obtain a stable solution, probably LDA is in general stable and I do not need a sophisticated approach to find a more stable solution.
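For context on how few moving parts such a small budget implies, a generic differential evolution loop (DE/rand/1/bin) over LDA's three tuned parameters can be sketched in a few lines. This is a minimal illustration, not the authors' LDADE implementation: the bounds and the stability objective below are placeholders.

```python
import random

# Placeholder search bounds for LDA's tuned parameters:
# k (number of topics), alpha (doc-topic prior), beta (topic-word prior).
BOUNDS = [(10.0, 100.0), (0.0, 1.0), (0.0, 1.0)]

def de_tune(stability, bounds=BOUNDS, pop_size=10, f=0.7, cr=0.3,
            generations=3, seed=0):
    """Maximize a black-box stability score with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [stability(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than i for the mutant vector.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated gene
            trial = list(pop[i])
            for j in range(dim):
                if j == j_rand or rng.random() < cr:
                    lo, hi = bounds[j]
                    trial[j] = min(max(a[j] + f * (b[j] - c[j]), lo), hi)
            s = stability(trial)
            if s >= scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

With `pop_size=10` and `generations=3`, this spends 40 stability evaluations in total, the same order of magnitude as the 30 evaluations under discussion.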

I just need to perform some trials and pick the configuration that provides the best results. Indeed, what are the benefits of DE compared with a Random Search? I would like to see in the paper a deeper discussion of the very low number of evaluations of the proposed algorithm and, if possible, a comparison with Random Search.
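The requested baseline is easy to state precisely. Assuming the stability objective (e.g., some overlap score between topics from repeated LDA runs) is available as a black-box function, a Random Search with the same 30-evaluation budget could be sketched as follows; the parameter ranges here are illustrative, not taken from the paper.

```python
import random

# Hypothetical search ranges for LDA's parameters: k (number of topics),
# alpha (doc-topic prior), beta (topic-word prior).
SPACE = {"k": (10, 100), "alpha": (0.0, 1.0), "beta": (0.0, 1.0)}

def sample_config(rng):
    """Draw one LDA configuration uniformly at random from SPACE."""
    return {"k": rng.randint(*SPACE["k"]),
            "alpha": rng.uniform(*SPACE["alpha"]),
            "beta": rng.uniform(*SPACE["beta"])}

def random_search(stability, n_evals=30, seed=0):
    """Try n_evals random configurations; keep the most stable one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_evals):
        cfg = sample_config(rng)
        score = stability(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Unlike DE, each sample here is independent of earlier results, which is exactly why a head-to-head comparison at equal budget would isolate whatever benefit DE's mutation and selection steps provide.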

In addition, I concur with Reviewer 1 that the proposed algorithm should be evaluated in a scenario more specific to the software engineering community, such as traceability link recovery, feature location, or software artefact labelling.

timm commented 7 years ago

this is balls: "...acknowledge that for now we only validated the improvement of LDADE over LDA in an unsupervised task (see Tables 3 and 8). The gain from an unsupervised task may not be prominent when tested on a supervised task. We would like to test LDADE's advantage in a supervised task"

you did do a supervised task (figs 8, 9, 20)

so why do you say different?