INRIA / spoon

Spoon is a metaprogramming library to analyze and transform Java source code. :spoon: is made with :heart:, :beers: and :sparkles:. It parses source files to build a well-designed AST with powerful analysis and transformation API.
http://spoon.gforge.inria.fr/
Other
1.75k stars 351 forks source link

CtExtendedModifiers are stored in a HashSet - leads to non-deterministic pretty-printing #4033

Open slarse opened 3 years ago

slarse commented 3 years ago

The ElementPrinterHelper prints out modifiers by fetching the extended modifiers. These are in turn stored inside a HashSet, which is not ordered. The printer-helper sections them in [visibility] [abstract/static] [everything else], but there is no internal ordering in the abstract/static and everything else parts.

So, for example, a class with the modifiers abstract static could be written either abstract static or static abstract depending on the machine state. That's not desirable.

Either we store the modifiers in an ordered collection, or the printer helper must order them when printing. Non-deterministic printing is a hassle.

raghav-deepsource commented 3 years ago

replace HashSet with LinkedHashSet? this would also help with sniper printing to ensure modifier ordering is preserved as written.

monperrus commented 3 years ago

@raghav-deepsource thanks for the suggestion, would you be able to make a tentative PR?

raghav-deepsource commented 3 years ago

yup, was planning to do just that. I take it I would need to make changes to anything that inherits from CtModifiable?

raghav-deepsource commented 3 years ago

or rather, make changes to CtModifierHandler?

I-Al-Istannen commented 3 years ago

I would order them when printing (unless that breaks your sniper). Some elements do not offer addModifier or addExtendedModifier or removeModifier methods. This forces users to fetch the elements, store them in their own collection, mutate it and and use setModifiers.

I personally wouldn't expect this to make printing non-deterministic, as all methods just take and return a Set and normal sets do not make any ordering guarantees. This would however happen if we used a Linked set internally. Another option would be using a SortedSet and making setXXXModifiers only clear and append to it rather than keeping the reference.

raghav-deepsource commented 3 years ago

The intent is to avoid worrying about ordering from the time modifiers are parsed to the point where they are printed out again. it would be better to alter the internal representation to account for this.

I-Al-Istannen commented 3 years ago

The problem is that you can't do that by just using a LinkedHashSet as some elements miss remove/add methods and users are forced to temporarily store it in their own collections.

So using a LinkedHashSet would still lead to non-deterministic printing, if any API user ever touches the API and throws it in a HashSet (which most will do, it is the most common Set). If you switch to a LinkedHashSet you also need to adjust every single caller of setXXModifiers to make sure it keeps the order, if there was one. A LinkedHashSet only treats one symptom but not really the root cause.

If you really want to keep the internal representation the same during the whole life I'd suggest a SortedSet (like TreeSet) and make sure you order the modifiers following conventions. You really want a distinct List there and a SortedSet comes pretty close. You will still need to make sure that no call to setModifiers (or mutations of the field itself) ever change the collection to another Set! I don't know whether you need to adjust existing callers or whether that is already enforced.

Adjusting printing is a one-step solution to keep the output consistent no matter what horrors are done internally. I think it's the nicest solution to fix this issue, but maybe your scope is broader and the SortedSet fulfills some other job you have in mind.

raghav-deepsource commented 3 years ago

I believe the initial order these elements are provided in is important, and should be preserved while printing as well, unless the user specifies otherwise. any changes made here will at worst not be user-visible, and at best will enable the user to determine the order of modifiers.

some elements miss remove/add methods

That is suboptimal... It would be a good idea to implement add and remove methods for all such elements. (unless there are reasons not to? are there examples of such elements?)

How about adding overloads for setModifiers and setExtendedModifiers that accept an ordered collection of items (perhaps a list)? We could convert the provided list into an ordered set, or deduplicate the list when storing it. This obviously will increase complexity, but it could provide a path forward.

If the user does not care about how modifiers are ordered, they can continue to use the old API.

I'd suggest a SortedSet (like TreeSet) and make sure you order the modifiers following conventions.

I still think insertion order is a better way to do this. We should only care about conventions when printing out the modifiers. Format changing pretty printers have to worry about this anyway, but things are made difficult when a user needs the order to be preserved, as is the case for sniper printing.

no call to setModifiers (or mutations of the field itself) ever change the collection to another Set!

This would be impossible anyway, given this API only allows the user to specify an unordered collection of modifiers. We could add new methods that take lists and give users of this API a choice between passing in unordered or ordered data, but we can't keep users from ultimately changing ordering unless we are willing to break compatibility and allow only ordered set types or list types as an input parameter. Another question I have: are there cases where modification to an element may also modify the set of modifiers associated with that element in such a non-deterministic manner?

The current solution afaics is the best of both worlds.

raghav-deepsource commented 3 years ago

I notice that all elements that have modifiers delegate to CtModifierHandler when making any changes to modifiers.

raghav-deepsource commented 3 years ago

I just took a look at CtTypeImpl. It does provide ways to add and remove individual ModifierKind values, but it does not support adding/removing single CtExtendedModifier instances, only wholesale replacement. I suppose this is mainly because CtModifierHandler itself does not provide a way to do this.

Shouldn't this be made more consistent as well?

slarse commented 3 years ago

Okay, let's double back around to the original problem here, namely that the printing is non-deterministic. As @I-Al-Istannen has pointed out, the fact that many users of modifiable elements might just be calling setExtendedModifiers(Set), where Set in general has no defined order (which is the problem in the first place) is definitely a concern.

I agree with @raghav-deepsource that insert order is best, because it makes the sniper printer more accurate (it will print in whatever order is given to it). However, it seems to me like we need some infrastructure around that, and so perhaps we should see to those things first. I don't have the time right at this moment to look into exactly what's needed, but let's set the end goal of having the pretty-printing respect the insert-order of the modifiers.

Also, note that we are moving away from the use of the ModifierKind enum, so any new API functions should only be implemented for CtExtendedModifier.

raghav-deepsource commented 3 years ago

note that we are moving away from the use of the ModifierKind enum

Is there documentation on this I could look at?

let's set the end goal of having the pretty-printing respect the insert-order of the modifiers

Agreed. The solution in the PR appears to work (no tests fail), but ultimately is only a part of what we would actually require to do this.

One concern is that the existing api allows the user to accidentally reorder modifiers whenever they call setModifiers or setExtendedModifiers. One solution to this would be to change the signatures of these methods to take an ordered collection, or only take a linked hashset. Both would be breaking changes. Another option is to provide an overload that takes a list. The original functions could then be marked deprecated.

We would also need to ensure that we don't change the order of modifiers while building the model.

I-Al-Istannen commented 3 years ago

Another (bigger) problem

First of all: Spoon currently does the following to extract modifiers from JDT:

    static Set<CtExtendedModifier> getModifiers(int modifier, boolean implicit, boolean isMethod) {
        Set<CtExtendedModifier> modifiers = new LinkedHashSet<>();
        if ((modifier & ClassFileConstants.AccPublic) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.PUBLIC, implicit));
        }
        if ((modifier & ClassFileConstants.AccPrivate) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.PRIVATE, implicit));
        }
        if ((modifier & ClassFileConstants.AccProtected) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.PROTECTED, implicit));
        }
        if ((modifier & ClassFileConstants.AccStatic) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.STATIC, implicit));
        }
        if ((modifier & ClassFileConstants.AccFinal) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.FINAL, implicit));
        }
        if ((modifier & ClassFileConstants.AccSynchronized) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.SYNCHRONIZED, implicit));
        }
        if ((modifier & ClassFileConstants.AccVolatile) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.VOLATILE, implicit));
        }
        // a method can never be transient, but it can have the flag because of varArgs.
        // source: https://stackoverflow.com/questions/16233910/can-transient-keywords-mark-a-method
        if (!isMethod && (modifier & ClassFileConstants.AccTransient) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.TRANSIENT, implicit));
        }
        if ((modifier & ClassFileConstants.AccAbstract) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.ABSTRACT, implicit));
        }
        if ((modifier & ClassFileConstants.AccStrictfp) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.STRICTFP, implicit));
        }
        if ((modifier & ClassFileConstants.AccNative) != 0) {
            modifiers.add(new CtExtendedModifier(ModifierKind.NATIVE, implicit));
        }
        return modifiers;
    }

So, no matter what you do in the rest of the code, the modifier order is already lost when parsing the code. @raghav-deepsource PR therefore likely doesn't achieve anything in its current state -- and it doesn't test the parsing workflow at all.

So you would need to figure out how to get the actual declaration order from JDT first.

And some general comments

A LinkedHashSet is not a good data structure for this either as it only accepts insertion order. If you want to write a transform that makes variables final (for example) you might want to insert final in a special place but keep the other modifiers in the order they appeared in the source. This is quite annoying with a LinkedHashSet as you would need to copy the whole Set to a List, make your change and change it back to a LinkedHashSet. Alternatively you directly build a new LinkedHashSet by iterating over it and figuring out where you want it, but that can easily get a cumbersome as well.

I am not convinced a LinkedHashSet is the best solution here, as it communicates next to nothing to unsuspecting callers while simultaneously being annoying for callers who are aware but need to insert at a given place.

Maybe sketching out an "ideal" API and then looking at how close we can get without breaking every user could be nice? Additionally, finding all places in the code (and maybe projects using spoon) that currently destroy the ordering would be nice as we definitely need to fix those if we want this change to do anything in practice.

raghav-deepsource commented 3 years ago

Oh. Yup. that would mean I've been taking the wrong approach here. One idea occurred to me regarding this, which could fix the unordered modifiers problem, but will be costlier...

Process the line where the affected element starts, and extract out the list of modifiers associated with it. Since this would be more work than just processing jdt's modifiers, it could be optional?

SirYwell commented 3 years ago

Since this would be more work than just processing jdt's modifiers, it could be optional?

That's actually already done already - CtExtendedModifier has a SourcePosition attribute that is calculated by spoon. I don't know how well that works, but theoretically it's possible to sort by source code position.

With this approach, I think it would make sense to have a configurable Comparator in the PrettyPrinter. That way, one could sort modifiers:

This way, you could also implement your very own but deterministic behaviour without changing anything in the model. What do you think about that?

Edit

https://github.com/INRIA/spoon/blob/2fff4c50266f79bc06b961e3b01ca6c6e0fcc1ab/src/main/java/spoon/reflect/visitor/ElementPrinterHelper.java#L88

The ElementPrinterHelper already has a dedicated method to write modifiers, but there is no way to change the used implementation in the DefaultJavaPrettyPrinter. But having such a method that can be overriden would be perfect in my opinion.

slarse commented 3 years ago

That's actually already done already - CtExtendedModifier has a SourcePosition attribute that is calculated by spoon. I don't know how well that works, but theoretically it's possible to sort by source code position.

That rings a bell to something I've already done once: sort by position if available, otherwise sort by name. See #3355, perhaps can be reused?

raghav-deepsource commented 3 years ago

sounds good. I've closed the existing PR, in favor of implementing the functionality as @SirYwell has described.