How to adopt the GREAT model for a program repair task?

VHellendoorn / ICLR20-Great

Data and Code for Reproducing "Global Relational Models of Source Code"

MIT License


How to adopt the GREAT model for a program repair task? #9

Open nashid opened 2 years ago

nashid commented 2 years ago

I would like to evaluate the GREAT model on a program repair task. To start with, I am thinking of comparing it with Hoppity. Hoppity is mostly compared in the setting where the buggy code and the correct code differ by a single AST node.

I am thinking of using a single pointer to the location of the buggy node, and modifying the code so that it can also output the edit operation (i.e., add/remove/replace) and the value, as a stretch goal.
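As a rough illustration of that output format (a minimal sketch, not code from this repository or from Hoppity), the prediction could be decoded from three sets of scores: one over AST nodes, one over edit operations, and one over candidate values. Every name below (`SingleNodeEdit`, `decode_edit`, the logit arguments, the toy vocabulary) is hypothetical, and the scores are assumed to come from some encoder such as GREAT.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Edit operations named in the post; the ordering here is an assumption.
EDIT_OPS = ("add", "remove", "replace")


@dataclass
class SingleNodeEdit:
    node_index: int        # pointer to the buggy AST node
    op: str                # one of EDIT_OPS
    value: Optional[str]   # new node value; None when op is "remove"


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def decode_edit(
    pointer_logits: Sequence[float],  # one score per AST node (hypothetical model output)
    op_logits: Sequence[float],       # one score per entry in EDIT_OPS
    value_logits: Sequence[float],    # one score per entry in vocab
    vocab: Sequence[str],
) -> SingleNodeEdit:
    """Greedily pick the most likely node, operation, and value."""
    node_index = argmax(softmax(pointer_logits))
    op = EDIT_OPS[argmax(softmax(op_logits))]
    # A removal needs no value, so the value head is ignored in that case.
    value = None if op == "remove" else vocab[argmax(softmax(value_logits))]
    return SingleNodeEdit(node_index, op, value)


# Toy example: three AST nodes, two candidate values.
edit = decode_edit([0.1, 2.3, -0.5], [0.2, 0.1, 1.7], [0.0, 3.0], ["<", "<="])
print(edit)
```

Greedy decoding is used here only to keep the sketch short; a real model would more likely train the three heads jointly and could condition the operation and value on the chosen node.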

Is there a nice way to modify this model for such a task? I presume it would be a non-trivial change!

VHellendoorn commented 2 years ago

Hi! This definitely sounds like something that could be done, but before going down the rabbit hole of adapting the current toolkit for this task, I want to point out that this sounds like a perfect fit for the PLUR toolkit (paper, repo). That work was all about unifying many tasks into a single representation that has this intuitive graph-style encoder and sequence (with edit operation) decoder. We showed that the GREAT model works well for a host of tasks in that work.

One downside: the repo I linked before only includes the task representation part; I have been told that the modeling toolkit will be open-sourced at some point in the not too distant future. Let me know if this is a useful direction; if not, I can definitely share some pointers for expanding the current repo to address other tasks.

nashid commented 2 years ago

Hi Vincent, this was exactly my plan. After reading the PLUR paper, my impression was that one major contribution of that paper is the open-sourced PLUR framework that others can use.

I actually asked for the artefact here but have not heard back.

The GitHub README states:

The models and the training code from the PLUR paper are not yet part of the current release. 
We plan to release it in the near future.

But I am in limbo, as I do not know when the artefact will be released.

VHellendoorn commented 2 years ago

That makes sense. I've been periodically pinging the people on that team about their open-sourcing efforts and am cautiously hopeful that there will be updates in the near future. My advice would definitely be to lean towards waiting on this a bit longer, rather than adapting this code to Hoppity. While we could incorporate a simplified version of the ToCoPo decoder in here, it would probably be quickly made obsolete by the other toolkit.

In fact, I see the PLUR effort as strictly superseding this repository when it is fully released; the modeling toolkit that powers the PLUR toolkit will be much more comprehensive. So maintenance on here will probably stop at that point.