cnznb / REMS

Data and Code of REMS

How to use REMS #1

Open dpomianbuff opened 10 months ago

dpomianbuff commented 10 months ago

Hello, I am a researcher at the University of Colorado, Boulder, in the USA. I recently read your paper “REMS: Recommending Extract Method Refactoring Opportunities via Multi-view Representation of Code Property Graph” and found it really fascinating.

I was wondering if I could reuse your tool, REMS, to suggest Extract Method opportunities on different Java projects. I have read the instructions in this repo; however, it is not clear to me how I could reuse the tool. Is there a pre-trained model that I could use right out of the box on my own project?

Thank you so much, Dorin

cnznb commented 10 months ago

Hello Dorin, thank you for reaching out and expressing your interest in our REMS tool. Could you please clarify whether your intention is to utilize the REMS tool primarily for refactoring purposes or for conducting experimental comparisons?

dpomianbuff commented 10 months ago

Hello, Thank you for reaching back to me. I would like to utilize REMS for conducting experimental comparisons, primarily with other extract method tools.

cnznb commented 10 months ago

Hello Dorin, we have added an executable tool under the folder "tool" in the package. You can read the file "How to use REMS tool?" in this directory to learn more about the tool details. Wishing you a smooth experiment.

dpomianbuff commented 10 months ago

Thank you so much for providing the tool. I was wondering if you have a version that is built for Mac or Linux?

cnznb commented 10 months ago

Unfortunately, at the moment, we don't have a version specifically built for Mac or Linux. We apologize for any inconvenience this may cause.

dpomianbuff commented 10 months ago

Hi, I was able to get a Windows virtual machine, and I was able to run rems :). I have to say that the instructions were really clear, and it just worked, so thank you for that.

For now I have a clarification question about the result. Sometimes the recommendation contains non-consecutive lines. For example:

Recommending extracting code lines: 210, 217, 218
210:- int newEnd = start()+duration();
217:- layout();
218:- changed();

Is this supposed to be interpreted as two Extract Method recommendations (one containing line 210 and the second containing lines 217 and 218)? Or is it always supposed to be interpreted as one recommendation consisting of non-consecutive lines?

Thank you, Dorin

cnznb commented 10 months ago

Hi Dorin,

Thank you for your feedback and I'm glad to hear that you were able to successfully run rems on the Windows virtual machine!

Your clarification question is excellent. In our current version, the recommendations containing non-consecutive lines are treated as separate recommendations. However, we acknowledge the need to improve this aspect. We are actively working on an algorithm to address this limitation and provide more coherent and integrated recommendations. While we are still in the experimental testing phase for this enhancement, we are committed to promptly updating our package once we have achieved significant results.
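To illustrate this reading, here is a minimal sketch (not part of REMS; the function name `split_into_runs` is hypothetical) that splits a reported line list into runs of consecutive line numbers, with each run treated as one separate recommendation. It assumes that "consecutive" means adjacent line numbers, so a comment or blank line between two reported lines would start a new run.

```python
def split_into_runs(lines):
    """Group line numbers into runs of consecutive lines.

    Each run is interpreted as one separate recommendation.
    """
    runs = []
    for n in sorted(set(lines)):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)   # extends the current run
        else:
            runs.append([n])     # gap found: start a new run
    return runs


# The example from this thread: lines 210, 217, 218
print(split_into_runs([210, 217, 218]))  # [[210], [217, 218]]
```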

Thank you again for your valuable input, and please don't hesitate to reach out if you have further questions or suggestions.

Best regards,

dpomianbuff commented 10 months ago

Hi, Thank you so much for getting back to me. I really appreciate you taking the time to answer these questions. I also have to commend you for the speed of the tool. Yes, it takes a little bit to load the pre-trained model, but once it's loaded, the rest of the analysis and suggestion generation is pretty fast!

I would like to ask another clarification question. I could not find this in the paper (I may have missed it), but I would like to ask about precision and recall. You helpfully included in the paper the formula you used: Precision = # of correct recommended refactorings / # of recommended refactorings. What is considered a correct recommended refactoring?
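For reference, the metrics can be sketched as below. Only the precision formula is quoted from the paper; the recall and F-measure definitions here are the standard ones and are an assumption about what the paper uses. The function and parameter names are hypothetical, and the numbers in the usage line are made up purely for illustration.

```python
def precision_recall_f(num_correct, num_recommended, num_oracle):
    """Return (precision, recall, F-measure) for refactoring recommendations.

    precision = correct / recommended   (formula quoted from the paper)
    recall    = correct / oracle size   (standard definition, assumed)
    F-measure = harmonic mean of both   (standard definition, assumed)
    """
    precision = num_correct / num_recommended if num_recommended else 0.0
    recall = num_correct / num_oracle if num_oracle else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative numbers only: 3 correct out of 10 recommendations, 12 oracle instances
p, r, f = precision_recall_f(3, 10, 12)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

What counts as `num_correct` is exactly the open question here: the result depends entirely on the matching criterion.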

Let’s take the following example:

125. public Image getImage(String filename) {
126.         Image image = basicGetImage(filename);
127.         if (image != null)
128.             return image;
129.         // load registered images and try again
130.         loadRegisteredImages(fComponent);
131.         // try again
132.         if (fMap.containsKey(filename))
133.             return (Image) fMap.get(filename);
134.         return null;
135. }

In this case, the Oracle extracted lines [132, 133, 134].

REMS recommended the following:

-----------------------------------------------------------------------------------------------
Testing data : Iconkit
Testing time : 2023-08-30 11:42:52
gecs recommendation result of method getImage [125, 135]:
Recommending extracting code lines: 127, 128, 132, 133
127:-         if (image != null)
128:-             return image;
132:-         if (fMap.containsKey(filename))
133:-           return (Image) fMap.get(filename);
-----------------------------------------------------------------------------------------------

If I understood correctly, there are two recommendations from REMS: the first one lines [127, 128] and the second one lines [132, 133].

I was wondering which of the two recommendations we should compare with the Oracle. Could you kindly inform me if there's a ranking? Additionally, for a recommended refactoring to be considered 'correct', does it need to align perfectly with the Oracle, or is an intersection with the Oracle sufficient?

With gratitude, Dorin

cnznb commented 10 months ago

Hi Dorin,

You're very welcome, and I'm delighted to hear that you've found our tool to be efficient!

Regarding your clarification question, we have uploaded a file in our package (tool/Detection-model-invocation.pdf). We believe you will find the answers you're looking for there.

If you have any more questions or need further clarification, please feel free to reach out to us here. We're here to assist you.

Best regards,

dpomianbuff commented 10 months ago

Thank you so much for your reply. This information was exactly what I was looking for.

With gratitude, Dorin

dpomianbuff commented 9 months ago

Hello, I would like to ask one more clarification question about the precision/recall/F-measure. I saw that you compared the precision/recall/F-measure with previous tools: JDeodorant, JExtract, GEMS, etc. The JExtract paper mentions that they use very strict criteria for an oracle match: “we only consider as correct a recommendation that matches exactly the one at the oracle; thus, a slight difference of including (or excluding) a statement is enough to be considered a miss”. GEMS uses a slightly different technique: they rank their Extract Method candidates and keep the top 5, then compare with the oracle with a tolerance of 1%, 2%, and 3% of the lines of code.

If we take the previous example:

Testing data : Iconkit
Testing time : 2023-08-30 11:42:52
gecs recommendation result of method getImage [125, 135]:
Recommending extracting code lines: 127, 128, 132, 133
127:-         if (image != null)
128:-             return image;
132:-         if (fMap.containsKey(filename))
133:-           return (Image) fMap.get(filename);

In this case, the Oracle is lines 132, 133, 134. If I take the candidates produced by REMS individually, then the first one, [127, 128], does not match the oracle. The second one, [132, 133], matches the oracle except for one line. According to the tolerance percentages (1%, 2%, and 3%), the number of permitted extra lines is 0 (because the host function is rather small), so under this definition neither of the proposed candidates matches the oracle. If I instead consider the two proposed candidates as one, as a union (lines [127 - 133]), then again this would not be a match against the oracle.
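The arithmetic behind "0 permitted extra lines" can be sketched as follows. This assumes the tolerance is a percentage of the host method's length and is rounded down; the exact rounding used by GEMS is not stated in this thread, and the function name `permitted_extra_lines` is hypothetical.

```python
import math


def permitted_extra_lines(method_start, method_end, tolerance_pct):
    """Number of lines a candidate may differ by under a percentage tolerance.

    Assumption: tolerance is a percentage of the host method length, rounded down.
    """
    method_length = method_end - method_start + 1
    return math.floor(method_length * tolerance_pct / 100)


# getImage spans lines [125, 135], i.e. 11 lines
for pct in (1, 2, 3):
    print(pct, permitted_extra_lines(125, 135, pct))  # 0 for all three levels
```

Even at 3%, an 11-line method allows 0.33 lines, which rounds down to 0, so only an exact match would count.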

Following this rationale, I executed REMS on the Xu et al. corpus of 122 Extract Method refactoring instances. I was able to obtain REMS suggestions. I also tried to compute the precision/recall/F-measure of REMS in two ways. For any given function in the oracle, if there are multiple candidates produced by REMS:

  1. Use candidates individually and keep the one that is closest to the oracle (as in the example above)
  2. Make a union of lines of code from each candidate and compare it to the oracle.

On top of this, I’m also considering the 1%, 2%, and 3% tolerance levels.
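The two strategies can be sketched as follows. This is a minimal illustration, not the actual evaluation script: the helper names are hypothetical, and "closest" is assumed to mean the smallest number of lines that appear in exactly one of the two sets (the symmetric difference).

```python
def line_difference(candidate, oracle):
    """Number of lines present in exactly one of candidate and oracle."""
    return len(set(candidate) ^ set(oracle))


def closest_candidate(candidates, oracle):
    """Strategy 1: keep the single candidate closest to the oracle."""
    return min(candidates, key=lambda c: line_difference(c, oracle))


def union_candidate(candidates):
    """Strategy 2: merge all candidates into one set of lines."""
    return sorted(set().union(*candidates))


oracle = [132, 133, 134]
candidates = [[127, 128], [132, 133]]

best = closest_candidate(candidates, oracle)
merged = union_candidate(candidates)
print(best, line_difference(best, oracle))      # [132, 133] 1
print(merged, line_difference(merged, oracle))  # [127, 128, 132, 133] 3
```

The resulting difference is then compared with the number of lines the chosen tolerance level permits.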

In the first case I get a precision of 1.8% and a recall of 1.6%, and in the second case even less than that. What am I missing?

cnznb commented 9 months ago

Hello Dorin,

Thank you for your feedback!

I apologize for not clarifying this issue in the paper. We followed the work of Tiwari et al. [1] and applied an absolute tolerance rather than a percentage. That is, we compared our results with the oracle using a tolerance of 1 to 3 statements, and the same tolerance was used for all tools. For more detailed information, please refer to Section 3.2 of that paper [1].
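As a rough sketch of how an absolute tolerance changes the outcome for the getImage example: the code below assumes a recommendation matches when the number of statements that differ from the oracle (present in only one of the two) is at most the tolerance, and it uses line numbers as a stand-in for statements. The exact definition is in Section 3.2 of [1]; the function name `matches_within_tolerance` is hypothetical.

```python
def matches_within_tolerance(candidate, oracle, tolerance):
    """True if candidate differs from oracle by at most `tolerance` statements.

    Assumption: the difference is counted as statements present in only one
    of the two sets; see Section 3.2 of [1] for the exact definition.
    """
    return len(set(candidate) ^ set(oracle)) <= tolerance


oracle = [132, 133, 134]
for tolerance in (1, 2, 3):
    print(
        tolerance,
        matches_within_tolerance([127, 128], oracle, tolerance),  # False
        matches_within_tolerance([132, 133], oracle, tolerance),  # True
    )
```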

Best regards,

[1] Tiwari, Omkarendra, and Rushikesh Joshi. "Identifying Extract Method Refactorings." 15th Innovations in Software Engineering Conference. 2022.