eclipse-platform / eclipse.platform.resources

Eclipse Public License 2.0
3 stars 18 forks source link

Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

Closed howlger closed 2 years ago

howlger commented 2 years ago

When running Eclipse 4.24 with Java 18 (tested with JustJ 18.0.1 and Temurin-18.0.1+10 on Windows 10), the encoding of a workspace that was created with Eclipse 4.23 (with the default encoding Cp1252) becomes UTF-8. The same does not happen when running Eclipse 4.24 with Java 17.

Steps to reproduce:

  1. With Eclipse IDE for Java Developers 2022-03:
    1. Create a new workspace
      → preferences General > Workspace: Text file encoding: Default (Cp1252)
    2. Inside the new workspace create a new project Hello Wörld
      Project > Properties: Resource: Text file encoding: Inherited from container (Cp1252)
    3. Inside the new project create a new class Äpp
      → right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: Cp1252)
  2. Open the workspace with Eclipse IDE for Java Developers 2022-06 with an installed JustJ Adoptium OpenJDK Hotspot JRE Complete 18.0.1:
    → preferences General > Workspace: Text file encoding: Default (UTF-8)
    Project > Properties: Resource: Text file encoding: Inherited from container (UTF-8)
    → right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: UTF-8)
    "Project 'Hello Wörld' has no explicit encoding set" warning
    "Syntax error on token "Invalid Character", delete this token" error in Äpp.java for the first character of the class name

See also https://github.com/eclipse-platform/eclipse.platform/issues/86

merks commented 2 years ago

I think this is to be expected:

https://openjdk.java.net/jeps/400

iloveeclipse commented 2 years ago

I think this is to be expected:

https://openjdk.java.net/jeps/400

But we should have a fix for that, so that we properly determine native charset, and the not explicitly set encoding stays the system one, not UTF8.

The whole point of changes we did via https://bugs.eclipse.org/bugs/show_bug.cgi?id=516583 and linked bugs was to make sure workspaces and projects have explicit encoding specified, to avoid unattended encoding changes resulting in data corruption. We (me) just missed the point that native charset detection has to change too because of breaking JEP-400 changes.

howlger commented 2 years ago

As @merks already pointed out, this is caused by JEP 400, so the following can be used as workaround.

Workaround when running Eclipse 2022-06 (4.24) with Java 18: In eclipse.ini add the following line after the line -vmargs:

-Dfile.encoding=COMPAT
howlger commented 2 years ago

To get the native encoding, the system property file.encoding can be used in Java 17 or lower:

String nativeEncoding = System.getProperty("file.encoding");

To get the native encoding in Java 18 as well as in older Java versions, something like the following has to be used instead:

String nativeEncodingSinceJava17 = System.getProperty("native.encoding");
String nativeEncoding = nativeEncodingSinceJava17 == null ? System.getProperty("file.encoding") : nativeEncodingSinceJava17;
merks commented 2 years ago

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

iloveeclipse commented 2 years ago

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

Yes, that would work around the issue for all Eclipses we shipped up to now, and yes, that would needed for Java 18+ only.

Bananeweizen commented 2 years ago

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

But wouldn't that become a problem on its own? I understand your suggestion as "Oomph packaged Eclipse installations remain with the Cp1252 encoding on Windows for eternity", which would make a default installation quite surprising for new developers sometime in the future (since it would not follow the then established UTF-8 everywhere principle).

And if it is not for eternity, then we only defer the issue and struggle with the same problems again at some point in the future. That's why I would rather be bold and go with the new encoding. If people didn't care about setting an encoding until now, some of them will be confused by either choice that we take. That's enough reason for me to take the more simple route. :)

iloveeclipse commented 2 years ago

Or we leave installer with java 17 till we have a fix here (I'm on it).

iloveeclipse commented 2 years ago

I've pushed API proposal that we need to fix this issue, see https://github.com/eclipse-platform/eclipse.platform.runtime/pull/63, and two other PR's in platform UI / resources to accommodate this change.

The one is the one that actually determines the workspace default encoding (!), not the https://github.com/eclipse-platform/eclipse.platform.resources/pull/156, which is crazy enough, because IDE preference page that defines workspace defaults uses workbench API that has no dependencies to resources bundle itself. OMG. But now they both will use same API.

Would be good if interested committers could review that.

laeubi commented 2 years ago

I just wanted to note that I know from some users using -Dfile.encoding=... in their eclipse ini, so even if that would work for the 'default user' other will still be broken.

iloveeclipse commented 2 years ago

I just wanted to note that I know from some users using -Dfile.encoding=... in their eclipse ini, so even if that would work for the 'default user' other will still be broken.

This all is a nightmare between compatibility to old non-standards, new non-standards, existing workspaces with no explicit encoding set and weird workbench/resources interdependencies.

@tjwatson : WDYT about a new eclipse command line argument -defaultWorkspaceEncoding ? That would replace -Dfile.encoding=... for users who want specify default value for the workspace encoding for new workspaces and for workspaces without explicit encoding set.

We would then ignore -Dfile.encoding=... completely.

Alternative: we have to "pimp" both ResourcesPlugin and WorkbenchEncoding classes to "understand" this -Dfile.encoding=... by reading JVM command line arguments (and not the system property) like here: https://github.com/eclipse-platform/eclipse.platform.resources/blob/d914d915aec0deae0d6d8fd504c0f2aa541992e0/bundles/org.eclipse.core.resources/src/org/eclipse/core/internal/resources/Workspace.java#L2290

Pros: no "silent" data corruption for those using -Dfile.encoding=... property. Cons: ugly code written to support non standard property obsoleted in Java 18...

Changes for the alternative (compatible) proposal would look like:

laeubi commented 2 years ago

Just another alternative: We switch to always use UTF-8 as java does, and add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?

iloveeclipse commented 2 years ago

add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?

Interesting idea. Instead a warning dialog after the fact, one can also ask to confirm / change the decision with the help of similar dialog: image

Something like: Your workspace haven't explicitly specified file encoding, so "XYZ native one" was used. Do you want to continue using "XYZ native one"?

The problem with "we switch to UTF-8" is that users may not realize that they data will be corrupted from that moment on. So we have to propose native encoding OR the one specified via -Dfile.encoding=... command line anyway, because that was the encoding with which the data was saved.

I see a problem here with all the "custom" products based on platform, where "workspace" doesn't mean anything or mean something completely different and they will now get a completely unrelated / strange popup saying something about "workspace", "encoding", bla bla.

So if we go this way, we probably should add this in the IDEApplication, but in that case those products that are not using IDEApplication and care about workspace encoding, will miss the warning. But I think that's OK considered that every product owner is supposed to read N&N before product update to the new platform version. At least responsible products owners do that.

iloveeclipse commented 2 years ago

Note: I will be mostly offline for a week + 1 day from now on, so don't expect any feedback from me in that time.

laeubi commented 2 years ago

So if we go this way, we probably should add this in the IDEApplication

I think the "Workspace selection dialog" would be a good place for this, one could even think about a system property to disable this (e.g. if a -DeclipseEncoding=... is set then always use that), ein think it should the has three choices where the first is the default selected:

  1. Detected System Encoding
  2. UTF-8 (recommended for interoperability)
  3. Other

One might even detect if a file is stored in another encoding (I once wrote a charset detector class for that) and ask the user to convert that file if it is different.