Closed howlger closed 2 years ago
I think this is to be expected:
I think this is to be expected:
But we should have a fix for that, so that we properly determine native charset, and the not explicitly set encoding stays the system one, not UTF8.
The whole point of changes we did via https://bugs.eclipse.org/bugs/show_bug.cgi?id=516583 and linked bugs was to make sure workspaces and projects have explicit encoding specified, to avoid unattended encoding changes resulting in data corruption. We (me) just missed the point that native charset detection has to change too because of breaking JEP-400 changes.
As @merks already pointed out, this is caused by JEP 400, so the following can be used as workaround.
Workaround when running Eclipse 2022-06 (4.24) with Java 18:
In eclipse.ini
add the following line after the line -vmargs
:
-Dfile.encoding=COMPAT
To get the native encoding, the system property file.encoding
can be used in Java 17 or lower:
String nativeEncoding = System.getProperty("file.encoding");
To get the native encoding in Java 18 as well as in older Java versions, something like the following has to be used instead:
String nativeEncodingSinceJava17 = System.getProperty("native.encoding");
String nativeEncoding = nativeEncodingSinceJava17 == null ? System.getProperty("file.encoding") : nativeEncodingSinceJava17;
I could add this -Dfile.encoding=COMPAT
to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?
I could add this
-Dfile.encoding=COMPAT
to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?
Yes, that would work around the issue for all Eclipses we shipped up to now, and yes, that would needed for Java 18+ only.
I could add this
-Dfile.encoding=COMPAT
to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?
But wouldn't that become a problem on its own? I understand your suggestion as "Oomph packaged Eclipse installations remain with the Cp1252 encoding on Windows for eternity", which would make a default installation quite surprising for new developers sometime in the future (since it would not follow the then established UTF-8 everywhere principle).
And if it is not for eternity, then we only defer the issue and struggle with the same problems again at some point in the future. That's why I would rather be bold and go with the new encoding. If people didn't care about setting an encoding until now, some of them will be confused by either choice that we take. That's enough reason for me to take the more simple route. :)
Or we leave installer with java 17 till we have a fix here (I'm on it).
I've pushed API proposal that we need to fix this issue, see https://github.com/eclipse-platform/eclipse.platform.runtime/pull/63, and two other PR's in platform UI / resources to accommodate this change.
The one is the one that actually determines the workspace default encoding (!), not the https://github.com/eclipse-platform/eclipse.platform.resources/pull/156, which is crazy enough, because IDE preference page that defines workspace defaults uses workbench API that has no dependencies to resources bundle itself. OMG. But now they both will use same API.
Would be good if interested committers could review that.
I just wanted to note that I know from some users using -Dfile.encoding=...
in their eclipse ini, so even if that would work for the 'default user' other will still be broken.
I just wanted to note that I know from some users using
-Dfile.encoding=...
in their eclipse ini, so even if that would work for the 'default user' other will still be broken.
This all is a nightmare between compatibility to old non-standards, new non-standards, existing workspaces with no explicit encoding set and weird workbench/resources interdependencies.
@tjwatson : WDYT about a new eclipse command line argument -defaultWorkspaceEncoding
?
That would replace -Dfile.encoding=...
for users who want specify default value for the workspace encoding for new workspaces and for workspaces without explicit encoding set.
We would then ignore -Dfile.encoding=...
completely.
-Dfile.encoding=...
property.-Dfile.encoding=...
set to some value would get native encoding instead, silent data corruption possible.Alternative: we have to "pimp" both ResourcesPlugin and WorkbenchEncoding classes to "understand" this -Dfile.encoding=...
by reading JVM command line arguments (and not the system property) like here: https://github.com/eclipse-platform/eclipse.platform.resources/blob/d914d915aec0deae0d6d8fd504c0f2aa541992e0/bundles/org.eclipse.core.resources/src/org/eclipse/core/internal/resources/Workspace.java#L2290
Pros: no "silent" data corruption for those using -Dfile.encoding=...
property.
Cons: ugly code written to support non standard property obsoleted in Java 18...
Changes for the alternative (compatible) proposal would look like:
Just another alternative: We switch to always use UTF-8 as java does, and add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?
add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?
Interesting idea. Instead a warning dialog after the fact, one can also ask to confirm / change the decision with the help of similar dialog:
Something like: Your workspace haven't explicitly specified file encoding, so "XYZ native one" was used. Do you want to continue using "XYZ native one"?
The problem with "we switch to UTF-8" is that users may not realize that they data will be corrupted from that moment on. So we have to propose native encoding OR the one specified via -Dfile.encoding=...
command line anyway, because that was the encoding with which the data was saved.
I see a problem here with all the "custom" products based on platform, where "workspace" doesn't mean anything or mean something completely different and they will now get a completely unrelated / strange popup saying something about "workspace", "encoding", bla bla.
So if we go this way, we probably should add this in the IDEApplication, but in that case those products that are not using IDEApplication and care about workspace encoding, will miss the warning. But I think that's OK considered that every product owner is supposed to read N&N before product update to the new platform version. At least responsible products owners do that.
Note: I will be mostly offline for a week + 1 day from now on, so don't expect any feedback from me in that time.
So if we go this way, we probably should add this in the IDEApplication
I think the "Workspace selection dialog" would be a good place for this, one could even think about a system property to disable this (e.g. if a -DeclipseEncoding=...
is set then always use that), ein think it should the has three choices where the first is the default selected:
One might even detect if a file is stored in another encoding (I once wrote a charset detector class for that) and ask the user to convert that file if it is different.
When running Eclipse 4.24 with Java 18 (tested with JustJ 18.0.1 and Temurin-18.0.1+10 on Windows 10), the encoding of a workspace that was created with Eclipse 4.23 (with the default encoding Cp1252) becomes UTF-8. The same does not happen when running Eclipse 4.24 with Java 17.
Steps to reproduce:
→ preferences General > Workspace: Text file encoding: Default (Cp1252)
→ Project > Properties: Resource: Text file encoding: Inherited from container (Cp1252)
Äpp
→ right-click
Äpp.java
: Properties: Resource: Text file encoding: Default (inherited from container: Cp1252)→ preferences General > Workspace: Text file encoding: Default (UTF-8)
→ Project > Properties: Resource: Text file encoding: Inherited from container (UTF-8)
→ right-click
Äpp.java
: Properties: Resource: Text file encoding: Default (inherited from container: UTF-8)→ "Project 'Hello Wörld' has no explicit encoding set" warning
→ "Syntax error on token "Invalid Character", delete this token" error in
Äpp.java
for the first character of the class nameSee also https://github.com/eclipse-platform/eclipse.platform/issues/86