JacquesCarette / Drasil

Generate all the things (focusing on research software)
https://jacquescarette.github.io/Drasil
BSD 2-Clause "Simplified" License
141 stars 26 forks source link

Improve variable name conflict detection in GOOL #3867

Open B-rando1 opened 2 months ago

B-rando1 commented 2 months ago

This conversation came out of work on the fix to #2179.

In GOOL, there are a number of places where GOOL checks to see if a variable name has been used in order to create a new variable. Examples include:

In each of these cases, genVarName or one of its derivatives is used to make sure that the generated variable name has not been used before. For example, if we create GOOL code that declares a variable temp and then does a list slice, the Java code would look something like this:

int temp;

ArrayList<Double> temp0 = new ArrayList<Double>(0);for (int j = 1; j < 4; j += 2) {
    temp0.add(myOtherList.get(j));
}
mySlicedList2 = temp0;

This works well, but the problem is it only looks backwards; it doesn't look over the whole program. So if we did the list slice first and then declared the variable, the Java code would look like this:

ArrayList<Double> temp = new ArrayList<Double>(0);for (int j = 1; j < 4; j += 2) {
    temp.add(myOtherList.get(j));
}
mySlicedList2 = temp;

int temp;

This would throw an error, since we're declaring temp multiple times.

Ideally we would do two passes so that we know ahead of time which variable names are 'reserved by the user' - i.e. not to be generated.

If we want to do something like this, what approach should we use?

B-rando1 commented 2 months ago

We had discussed in our meeting that a more concrete example of this would be helpful, so I modified NameGenTest.hs on a separate branch. The GOOL code can be seen here: https://github.com/JacquesCarette/Drasil/blob/5b8e1cd5d1d23aa3346d588d33d99d552092bd88/code/drasil-code/test/NameGenExample.hs#L18-L25

The generated Java code can be seen here: https://github.com/JacquesCarette/Drasil/blob/5b8e1cd5d1d23aa3346d588d33d99d552092bd88/code/stable/gooltest/java/NameGenExample/NameGenExample/NameGenExample.java#L9-L21

As can be seen, the generated names with the for loop avoid conflicts with the first declared variables by appending a 0 to the end of the name. Right after that, however, more user-defined names create conflicts - one with a generated name, and one with another user-defined name.

This was generated by GOOL without any issues, using functionality that has been in place for a while now (I modified the listSlice implementation, but the way it's used here wasn't affected). It shows that while our generated names attempt to be unique, they aren't perfect; and that when conflicts happen GOOL doesn't do anything about it.

From our meeting, we came up with two solutions:

JacquesCarette commented 2 months ago

Amusingly, this is somewhere we could use Haskell's laziness to our advantage: we could lazily generate names and only force them to be actually generated once the full block of code has been processed, so that it would then have the full context in place in which to do correct unique generation. I've definitely seen some code "out there" that does this (I think I recall code for the generation of labels for a one-pass generation of assembly-level code).

smiths commented 2 months ago

Great work @B-rando1. You took what we discussed in the meeting and showed results very quickly! I like the intermediate step of your first option - showing an error if two variables have the same name. From our discussion yesterday, this doesn't seem to be that difficult a change, and it would make GOOL safer. I'm not sure if it is even that bad from the user's point of view, assuming that the error message is descriptive. The user might want control over what their variable is called, instead of relying on the system making a unique name. If we do automatically make unique names in the future from the user input, there should be a log message that lets the user know, or they might be confused by the generated code.

B-rando1 commented 2 months ago

@JacquesCarette that's a cool idea! I'm not quite sure what we need to do to get it there. My understanding is that Haskell is lazy by default, but some operations will force evaluation at a certain point. Is there something we need to change to allow Haskell to defer name generation?

JacquesCarette commented 2 months ago

In theory, nothing to do. In practice, you need to make sure you're not doing certain pattern matches "too early" as that will force evaluation. Plus someone's work (Noah?) on making things strict will definitely interact with this. https://wiki.haskell.org/Maintaining_laziness is a useful resource. I think http://comonad.com/reader/2014/fast-circular-substitution/ is one example of what I mean.

B-rando1 commented 2 months ago

I tried locating the issue by using ['a' | _ <- [0..]] in a few places regarding genVarName. This allowed me to find the following:

I'll need to do some more digging tomorrow to see what the root causes are, and if there's a good fix for them.