eclipse-jdt / eclipse.jdt.core

Eclipse Public License 2.0
149 stars 118 forks source link

Revise usage of `encoding` in more CompilationUnitResolver methods #2641

Open mickaelistria opened 2 weeks ago

mickaelistria commented 2 weeks ago

Quoting @jukzi on https://github.com/eclipse-jdt/eclipse.jdt.core/pull/2560#discussion_r1654179571

Making the encoding part of the API seems wrong as currently the encoding is not used consistently: Only Batch Compiler uses it in org.eclipse.jdt.internal.compiler.batch.CompilationUnit.getContents() while the IDE uses org.eclipse.jdt.internal.core.util.Util.getResourceContentsAsCharArray(IFile) which disrespect the given classpath encoding but uses the encoding configured in workspace and the utf bom.

We should revise usage of encoding here to either actually use it, or get rid of it.

robstryker commented 2 weeks ago

The encodings in https://github.com/eclipse-jdt/eclipse.jdt.core/pull/2560 are used. I'm not sure why they are presumed to be ignored.

A quick trace through some methods shows that the encodings are used.

https://github.com/eclipse-jdt/eclipse.jdt.core/blob/aca0803053b990f66cfb08644a96aeabe56fa255/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/CompilationUnitResolver.java#L1143

And here:

https://github.com/eclipse-jdt/eclipse.jdt.core/blob/aca0803053b990f66cfb08644a96aeabe56fa255/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/CompilationUnitResolver.java#L592

From what I can tell, both locations make use of contents = Util.getFileCharContent(new File(sourceUnits[i]), encoding);

I am not sure why @jukzi thinks the encodings are unused.

jukzi commented 2 weeks ago

My concern is not they are unused but they are used inconsistently. Both references you gave create a batch.CompilationUnit while for example org.eclipse.jdt.internal.core.CompilationUnit does not know about it

robstryker commented 2 weeks ago

I think you might be looking at this the wrong way.

The encodings are required so that we know how to read the source file and turn it into a string of characters that we can parse. It is not terribly important that the resulting org.eclipse.jdt.internal.core.CompilationUnit is or is not aware of what encoding it had when it was read from the filesystem. It is more important that the file be read correctly and turned into the jdt.internal.core.CompilationUnit correctly.

The encodings are 100% used (in this PR at least) to read the source files before parsing them and turning them into jdt dom items.

This new interface provides only 5 methods. Only 2 of them require an array of encodings. And these encodings are 100% used in every occasion to read the file from the filesystem. And I would imagine very strongly that anyone extending or using this non-API code would similarly be using the encoding to read the file from the filesystem before parsing it. I know our alternate javac implementation does. The encodings are used, even if they are not part of the returned object.

jukzi commented 2 weeks ago

ASTParser has a org.eclipse.jdt.core.dom.ASTParser.setEnvironment(String[], String[], String[], boolean) where someone could setup arbitrary encodings distinct from those used to read a File with Platform.

org.eclipse.jdt.internal.core.CompilationUnit on the other hand reads the content using getResourceContentsAsCharArray() - which does not know about the encoding configured in ASTParser. instead it uses the settings from Platform.

I would like to verify it in the IDE but as far as i see setEnvironment is only used durring tests, so that normally sourcepathsEncodings should be always null.

jukzi commented 2 weeks ago

image

robstryker commented 2 weeks ago

As with all things where a large historical codebase is involved, we assume:

1) Things are generally the way they are for a reason, and

2) It's best not to change that unless you: a) understand the reason it is the way it is, b) know 100% that the reason is wrong or faulty, and c) have a fix

In this case, we assume that the logic that was in ASTParser and the logic that was in CompilationUnitResolver and ECJCompilationUnitResolver was generally correct and had a reason to be there.

The previous PR that sparked this discussion attempts to be very careful about merely moving functionality around, but not changing it much at all, as well as formalizing the contract with the caller.

While there's always a push and pull about whether to add something to the interface or not, there are typical concerns that should be weighed against each other.

1) Which implementation existed before? Try to stay faithful to that one. 2) Is something about the existing implementation dangerous or wrong? A high bar should be present here. 3) Should something NOT be added to the interface because it is 100% unnecessary? Variables that no implementer want or desire to use can easily be removed from the interface, but variables where the pre-existing implementation CLEARLY use those variables, and where there is nothing obviously wrong or dangerous about it, should likely be maintained, both to stay faithful to the pre-existing implementation and test cases, as well as to minimize disruption.

If there was something clearly wrong about including the encodings, or if the CompilationUnitResolver clearly wasn't using them, we would have a good argument to remove them from the interface. Since the variables are clearly being used and respected in their implementation at least, if not by callers, we have to fall back to some type of harm calculation.

Removing encodings from the interface has the potential to break any tests that make use of that ability to override the encodings while providing no significant benefit other than the smaller incremental benefit of pruning a codebase of unused features. I would argue the disruption of potentially adding chaos to the test suite and being forced to either disable a number of tests or find some other way to work around the issue points to a greater harm in removing the parameter than in allowing it to remain.

robstryker commented 1 day ago

@mickaelistria Is this issue still relevant? Or can it be closed?