Open slarse opened 3 years ago
replace HashSet
with LinkedHashSet
? this would also help with sniper printing to ensure modifier ordering is preserved as written.
@raghav-deepsource thanks for the suggestion, would you be able to make a tentative PR?
yup, was planning to do just that. I take it I would need to make changes to anything that inherits from CtModifiable?
or rather, make changes to CtModifierHandler?
I would order them when printing (unless that breaks your sniper). Some elements do not offer addModifier
or addExtendedModifier
or removeModifier
methods. This forces users to fetch the elements, store them in their own collection, mutate it and and use setModifiers
.
I personally wouldn't expect this to make printing non-deterministic, as all methods just take and return a Set and normal sets do not make any ordering guarantees. This would however happen if we used a Linked
set internally.
Another option would be using a SortedSet
and making setXXXModifiers
only clear and append to it rather than keeping the reference.
The intent is to avoid worrying about ordering from the time modifiers are parsed to the point where they are printed out again. it would be better to alter the internal representation to account for this.
The problem is that you can't do that by just using a LinkedHashSet as some elements miss remove/add methods and users are forced to temporarily store it in their own collections.
So using a LinkedHashSet would still lead to non-deterministic printing, if any API user ever touches the API and throws it in a HashSet (which most will do, it is the most common Set).
If you switch to a LinkedHashSet you also need to adjust every single caller of setXXModifiers
to make sure it keeps the order, if there was one. A LinkedHashSet only treats one symptom but not really the root cause.
If you really want to keep the internal representation the same during the whole life I'd suggest a SortedSet (like TreeSet) and make sure you order the modifiers following conventions. You really want a distinct List
there and a SortedSet comes pretty close.
You will still need to make sure that no call to setModifiers
(or mutations of the field itself) ever change the collection to another Set! I don't know whether you need to adjust existing callers or whether that is already enforced.
Adjusting printing is a one-step solution to keep the output consistent no matter what horrors are done internally. I think it's the nicest solution to fix this issue, but maybe your scope is broader and the SortedSet fulfills some other job you have in mind.
I believe the initial order these elements are provided in is important, and should be preserved while printing as well, unless the user specifies otherwise. any changes made here will at worst not be user-visible, and at best will enable the user to determine the order of modifiers.
some elements miss remove/add methods
That is suboptimal... It would be a good idea to implement add and remove methods for all such elements. (unless there are reasons not to? are there examples of such elements?)
How about adding overloads for setModifiers
and setExtendedModifiers
that accept an ordered collection of items (perhaps a list)? We could convert the provided list into an ordered set, or deduplicate the list when storing it. This obviously will increase complexity, but it could provide a path forward.
If the user does not care about how modifiers are ordered, they can continue to use the old API.
I'd suggest a SortedSet (like TreeSet) and make sure you order the modifiers following conventions.
I still think insertion order is a better way to do this. We should only care about conventions when printing out the modifiers. Format changing pretty printers have to worry about this anyway, but things are made difficult when a user needs the order to be preserved, as is the case for sniper printing.
no call to setModifiers (or mutations of the field itself) ever change the collection to another Set!
This would be impossible anyway, given this API only allows the user to specify an unordered collection of modifiers. We could add new methods that take lists and give users of this API a choice between passing in unordered or ordered data, but we can't keep users from ultimately changing ordering unless we are willing to break compatibility and allow only ordered set types or list types as an input parameter. Another question I have: are there cases where modification to an element may also modify the set of modifiers associated with that element in such a non-deterministic manner?
The current solution afaics is the best of both worlds.
I notice that all elements that have modifiers delegate to CtModifierHandler when making any changes to modifiers.
I just took a look at CtTypeImpl. It does provide ways to add and remove individual ModifierKind values, but it does not support adding/removing single CtExtendedModifier instances, only wholesale replacement. I suppose this is mainly because CtModifierHandler itself does not provide a way to do this.
Shouldn't this be made more consistent as well?
Okay, let's double back around to the original problem here, namely that the printing is non-deterministic. As @I-Al-Istannen has pointed out, the fact that many users of modifiable elements might just be calling setExtendedModifiers(Set)
, where Set
in general has no defined order (which is the problem in the first place) is definitely a concern.
I agree with @raghav-deepsource that insert order is best, because it makes the sniper printer more accurate (it will print in whatever order is given to it). However, it seems to me like we need some infrastructure around that, and so perhaps we should see to those things first. I don't have the time right at this moment to look into exactly what's needed, but let's set the end goal of having the pretty-printing respect the insert-order of the modifiers.
Also, note that we are moving away from the use of the ModifierKind
enum, so any new API functions should only be implemented for CtExtendedModifier
.
note that we are moving away from the use of the ModifierKind enum
Is there documentation on this I could look at?
let's set the end goal of having the pretty-printing respect the insert-order of the modifiers
Agreed. The solution in the PR appears to work (no tests fail), but ultimately is only a part of what we would actually require to do this.
One concern is that the existing api allows the user to accidentally reorder modifiers whenever they call setModifiers
or setExtendedModifiers
. One solution to this would be to change the signatures of these methods to take an ordered collection, or only take a linked hashset. Both would be breaking changes. Another option is to provide an overload that takes a list. The original functions could then be marked deprecated.
We would also need to ensure that we don't change the order of modifiers while building the model.
First of all: Spoon currently does the following to extract modifiers from JDT:
static Set<CtExtendedModifier> getModifiers(int modifier, boolean implicit, boolean isMethod) {
Set<CtExtendedModifier> modifiers = new LinkedHashSet<>();
if ((modifier & ClassFileConstants.AccPublic) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.PUBLIC, implicit));
}
if ((modifier & ClassFileConstants.AccPrivate) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.PRIVATE, implicit));
}
if ((modifier & ClassFileConstants.AccProtected) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.PROTECTED, implicit));
}
if ((modifier & ClassFileConstants.AccStatic) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.STATIC, implicit));
}
if ((modifier & ClassFileConstants.AccFinal) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.FINAL, implicit));
}
if ((modifier & ClassFileConstants.AccSynchronized) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.SYNCHRONIZED, implicit));
}
if ((modifier & ClassFileConstants.AccVolatile) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.VOLATILE, implicit));
}
// a method can never be transient, but it can have the flag because of varArgs.
// source: https://stackoverflow.com/questions/16233910/can-transient-keywords-mark-a-method
if (!isMethod && (modifier & ClassFileConstants.AccTransient) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.TRANSIENT, implicit));
}
if ((modifier & ClassFileConstants.AccAbstract) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.ABSTRACT, implicit));
}
if ((modifier & ClassFileConstants.AccStrictfp) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.STRICTFP, implicit));
}
if ((modifier & ClassFileConstants.AccNative) != 0) {
modifiers.add(new CtExtendedModifier(ModifierKind.NATIVE, implicit));
}
return modifiers;
}
So, no matter what you do in the rest of the code, the modifier order is already lost when parsing the code. @raghav-deepsource PR therefore likely doesn't achieve anything in its current state -- and it doesn't test the parsing workflow at all.
So you would need to figure out how to get the actual declaration order from JDT first.
A LinkedHashSet is not a good data structure for this either as it only accepts insertion order. If you want to write a transform that makes variables final (for example) you might want to insert final in a special place but keep the other modifiers in the order they appeared in the source. This is quite annoying with a LinkedHashSet as you would need to copy the whole Set to a List, make your change and change it back to a LinkedHashSet. Alternatively you directly build a new LinkedHashSet by iterating over it and figuring out where you want it, but that can easily get a cumbersome as well.
I am not convinced a LinkedHashSet is the best solution here, as it communicates next to nothing to unsuspecting callers while simultaneously being annoying for callers who are aware but need to insert at a given place.
Maybe sketching out an "ideal" API and then looking at how close we can get without breaking every user could be nice? Additionally, finding all places in the code (and maybe projects using spoon) that currently destroy the ordering would be nice as we definitely need to fix those if we want this change to do anything in practice.
Oh. Yup. that would mean I've been taking the wrong approach here. One idea occurred to me regarding this, which could fix the unordered modifiers problem, but will be costlier...
Process the line where the affected element starts, and extract out the list of modifiers associated with it. Since this would be more work than just processing jdt's modifiers, it could be optional?
Since this would be more work than just processing jdt's modifiers, it could be optional?
That's actually already done already - CtExtendedModifier
has a SourcePosition
attribute that is calculated by spoon. I don't know how well that works, but theoretically it's possible to sort by source code position.
With this approach, I think it would make sense to have a configurable Comparator in the PrettyPrinter. That way, one could sort modifiers:
This way, you could also implement your very own but deterministic behaviour without changing anything in the model. What do you think about that?
The ElementPrinterHelper already has a dedicated method to write modifiers, but there is no way to change the used implementation in the DefaultJavaPrettyPrinter. But having such a method that can be overriden would be perfect in my opinion.
That's actually already done already - CtExtendedModifier has a SourcePosition attribute that is calculated by spoon. I don't know how well that works, but theoretically it's possible to sort by source code position.
That rings a bell to something I've already done once: sort by position if available, otherwise sort by name. See #3355, perhaps can be reused?
sounds good. I've closed the existing PR, in favor of implementing the functionality as @SirYwell has described.
The
ElementPrinterHelper
prints out modifiers by fetching the extended modifiers. These are in turn stored inside aHashSet
, which is not ordered. The printer-helper sections them in[visibility] [abstract/static] [everything else]
, but there is no internal ordering in theabstract/static
andeverything else
parts.So, for example, a class with the modifiers
abstract static
could be written eitherabstract static
orstatic abstract
depending on the machine state. That's not desirable.Either we store the modifiers in an ordered collection, or the printer helper must order them when printing. Non-deterministic printing is a hassle.