eclipse-platform / eclipse.platform

https://eclipse.dev/eclipse/
Eclipse Public License 2.0
83 stars 113 forks source link

Migrate the Windows native file-system refresh monitor to JNA #1394

Closed HannesWell closed 5 months ago

HannesWell commented 6 months ago

This avoids the native code necessary for JNI, while being only slightly slower. At the same time it allows to use the 'Win32RefreshProvider' on all CPU architectures on Windows. This is an alternative to https://github.com/eclipse-platform/eclipse.platform/pull/1350, avoiding the need to recompile the native binaries for Windows on aarch64. Part of https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/issues/577

In order to benchmark this I created the following JMH benchmark that measures the runtime of a simple sequence of method calls that creates a monitor handle, asks for changes and then disposes the handle again. Testing the methods in isolation would be more difficult, but if one has suggestions I can benchmark other scenarios as well.

`Win32MonitorBenchmark` source code ``` @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.MICROSECONDS) @Measurement(iterations = 4) @Warmup(iterations = 3, time = 5) public class Win32MonitorBenchmark { public static final String PATH = "C:\\some\\path\\to\\not-empty-folder"; public static final int FILTER = WinNT.FILE_NOTIFY_CHANGE_FILE_NAME | WinNT.FILE_NOTIFY_CHANGE_DIR_NAME | WinNT.FILE_NOTIFY_CHANGE_LAST_WRITE | WinNT.FILE_NOTIFY_CHANGE_SIZE; @Benchmark public void testNativeResourceAccess() { long handle = Win32Natives.FindFirstChangeNotificationW(PATH, true, FILTER); Win32Natives.FindNextChangeNotification(handle); Win32Natives.FindCloseChangeNotification(handle); } @Benchmark public void testJNAResourceAccess() { long handle = Win32NativesNew.FindFirstChangeNotificationW(new WString(PATH), 1, FILTER); Win32NativesNew.FindNextChangeNotification(handle); Win32NativesNew.FindCloseChangeNotification(handle); } @Benchmark public int testNativeLastError() { return Win32Natives.GetLastError(); } @Benchmark public int testJNALastError() { return Win32NativesNew.GetLastError(); } } ```

The Win32Natives replicates the existing JNI based implementation of the Win32Natives and uses the existing dll, the class Win32NativesNew replicates the new JNA based implementation.

The results are

Benchmark                                       Mode  Cnt   Score    Error  Units
Win32MonitorBenchmark.testJNALastError          avgt   20   0,007 ±  0,001  us/op
Win32MonitorBenchmark.testJNAResourceAccess     avgt   20  17,465 ±  0,067  us/op
Win32MonitorBenchmark.testNativeLastError       avgt   20   0,005 ±  0,001  us/op
Win32MonitorBenchmark.testNativeResourceAccess  avgt   20  16,026 ±  0,149  us/op

In these scenarios the JNA based implementation is overall less then 10% slower.

From the four remaining relevant native methods

the method FindNextChangeNotification is the only one that could be performance sensitive. FindFirstChangeNotificationW and FindCloseChangeNotification are only called once if a project is opened respectively closed. And one cannot wait faster for events user initiates WaitForMultipleObjects is also not very sensitive. But looking at the absolute numbers, I think in practice other factors will have a much larger impact and a 10% slow down at such low level will hardly be measured in real-life applications or will be noticed. With that I think the performance changes due to this change are acceptable.

@jukzi since you are often interested in improving the performance, what do you think? @merks since you are also working a lot on Windows, maybe this is also interesting for you. Of course everyone else is also invited to discuss this.

github-actions[bot] commented 6 months ago

Test Results

 1 734 files  ±0   1 734 suites  ±0   1h 22m 54s :stopwatch: - 9m 17s  3 967 tests ±0   3 945 :white_check_mark: ±0   22 :zzz: ±0  0 :x: ±0  12 498 runs  ±0  12 337 :white_check_mark: ±0  161 :zzz: ±0  0 :x: ±0 

Results for commit 9daf7333. ± Comparison against base commit 25b48e39.

:recycle: This comment has been updated with latest results.

jukzi commented 6 months ago

@jukzi since you are often interested in improving the performance, what do you think?

Thanks for asking. We have disabled refresh monitor in all our workspaces by default, so that i do not need to care about this particular feature. In general i think that replacing native code with java code would eclipse easier to maintain. But com.sun.jna is not standard java. However jna is already used in eclipse already anyway.

less then 10% slower.

i.e. it is slower? 10% may be some seconds on large workspaces.

I don not get the benefit of this change - it's not standard java, it is slower. So it has cons. Where are the pros that outweigh them? aarch64 seems to be a insufficient argument because almost nobody uses it.

jukzi commented 6 months ago

Wouldn't it be better to wait for java 22 Foreign Function support? https://openjdk.org/jeps/454 That claims to have no performance drawback.

merks commented 6 months ago

FYI, Java 25 LTS is scheduled for September 2025, so following the current practice of upgrading to a new Java LTS release with about 3-6 months after GA, that has us waiting until into 2026...

jukzi commented 6 months ago

2026 doesn't sound bad - or is there any need to rush optimizations for the 0% (rounded) user of the IDE that use aarch64?

HannesWell commented 6 months ago

In general i think that replacing native code with java code would eclipse easier to maintain. But com.sun.jna is not standard java. However jna is already used in eclipse already anyway.

I would consider JNA as a ordinary third-party dependency and I don't that it's usage would in general be discouraged? Of course there are alternatives with other pro's and cons (JNI - faster but more cumbersome, FFM - not yet available).

less then 10% slower.

i.e. it is slower? 10% may be some seconds on large workspaces.

Relatively the composite benchmark case is 10% with JNA compared to JNI, but if you look at the absolute numbers you can see that the difference is about 1.5µs. This means the three actions, adding a watcher, obtaining a change notification and closing the watcher have to run more than 600.000 times (roughly rounded to smaller numbers) to sum up to one second delay. And since AFICT there is only one watcher added per project and the current implementation refreshes the entire project if any single or bulk change is detected, this means you have to create more than 600.000 projects, have a change in them performed externally and have to close them again and all of that combined to get a slow down of a second.

I assume nobody will ever face that scenario, but if it would be possible I such operations would take so much longer that the one second would hardly be measurable in the other noise and would not be noticed by a human at all. With that I think the slowdown is practically neglect-able.

I don not get the benefit of this change - it's not standard java, it is slower. So it has cons. Where are the pros that outweigh them? aarch64 seems to be a insufficient argument because almost nobody uses it.

2026 doesn't sound bad - or is there any need to rush optimizations for the 0% (rounded) user of the IDE that use aarch64?

This is not an optimization for aarch64 on Windows, it implements a feature that is currently not supported with that architecture in a way that also tries to reduce the maintenance effort in general and the effort to support the new architectures (probably there are non soon). Two birds with one stone. And of course there are currently almost zero users of Eclipse on Windows on Arm because Eclipse did not support it yet (not sure if and how well the x86_64 emulator works). I can't predict the future, but looking at the Mac eco-system where at least AFAIK ARM seems to take over in the mobile sector, I assume there is at least a chance that ARM will also become more important for Windows.

HannesWell commented 5 months ago

How should we proceed here? Since we are in the very beginning of a new development cycle my suggestion is to submit this as it is (after rebasing and resolving conflicts). If it turns out that this introduces noticeable performance regressions I can still provide a more differentiated approach (with code using JNI and code using JNI).

HannesWell commented 5 months ago

Please do not submit anything that is known to be slower without a need. Either proof with a reproducible measurement that the regression is insignificant, i.e. < 0.1%

Can you please explain where this number is coming from? Was there some kind of agreement about it or is this number general practice? Furthermore was the reference this relation is applied to? If there is such a specific number I would expect a specific scenario to test. Furthermore I wonder how strict this threshold is and how other factory can out-weight it, since there are many other things that can influence the overall quality of software.

in a real world use case of a workspace with 10_000 folders and a million files that are updated or do not touch the windows specific code but add another codepath for arm. Just submitting something that is known to be slower and leaving it to others to find the regression is inappropriate. Relying on guesses is not enough.

Not sure if a million files are really a real world scenario. I would say the entire Eclipse SDK with all its four sub-projects and even more git sub-modules is pretty large and it only has 115.435 files and 30.264 folders (if the Windows Explorer didn't count it wrong). A project almost nine time the size of it will probably have some challenges. Furthermore I didn't just guess. As written in https://github.com/eclipse-platform/eclipse.platform/pull/1394#issuecomment-2136256007 I measured the runtime of a specific part of the code and then analyzed how it is called and from that derived that the increased runtime I measured has a high chance to be not relevant in a real world scenario. Of course I, like everyone else can be wrong. That's why I wrote: If it turns out that this introduces noticeable performance regressions [contrary to my expectations] I can still provide a more differentiated approach (with code using JNI and code using JNI).

If I understood your suggestion for a benchmark correctly, you suggest to have one project with 10_000 folders where a millions files are added? Therefore I'm not sure that you have read my analysis in https://github.com/eclipse-platform/eclipse.platform/pull/1394#issuecomment-2136256007 in detail? Just looking at the numbers is not sufficient. Without context and analysis every benchmark is worthless and it can be wrong in any direction.

Monitors are just added to project root folders and linked resources: https://github.com/eclipse-platform/eclipse.platform/blob/master/resources/bundles/org.eclipse.core.resources/src/org/eclipse/core/internal/refresh/MonitorManager.java#L101-L120

Therefore to really challenge the new implementation one would have to create many projects that are empty and that all get one file added. Adding more files to the same project would just increase the overall runtime while there is still only one call to the native methods and thus its ratio would be even smaller. For that scenario I have no written the following basic test-case (executable as JUnit Plug-in test): It creates a defined number of projects, opens them, adds a file to each one and waits the resource notification to be dispatched and eventually closes all projects again.

RefreshPerformanceTest code ``` package org.eclipse.core.tests.resources.refresh; import java.nio.file.Files; import java.nio.file.Path; import java.util.ArrayList; import java.util.List; import java.util.Set; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.CountDownLatch; import java.util.concurrent.atomic.AtomicInteger; import org.eclipse.core.resources.IProject; import org.eclipse.core.resources.IResourceDelta; import org.eclipse.core.resources.IWorkspaceRoot; import org.eclipse.core.resources.ResourcesPlugin; import org.eclipse.core.runtime.CoreException; import org.eclipse.core.runtime.IPath; import org.eclipse.core.runtime.preferences.IEclipsePreferences; import org.eclipse.core.runtime.preferences.InstanceScope; import org.junit.jupiter.api.BeforeAll; import org.junit.jupiter.api.Test; public class RefreshPerformanceTest { @BeforeAll static void setup() throws CoreException { IEclipsePreferences node = InstanceScope.INSTANCE.getNode(ResourcesPlugin.PI_RESOURCES); node.putBoolean(ResourcesPlugin.PREF_AUTO_REFRESH, true); for (IProject project : ResourcesPlugin.getWorkspace().getRoot().getProjects()) { project.delete(true, true, null); } } @Test void testListenerInstallationAndRemoval() throws Exception { int runs = 10; IWorkspaceRoot root = ResourcesPlugin.getWorkspace().getRoot(); List projects = new ArrayList<>(runs); long start = System.currentTimeMillis(); for (int i = 0; i < runs; i++) { IProject project = root.getProject("p" + i); project.create(null); projects.add(project); } System.out.println("Runtime create : " + (System.currentTimeMillis() - start) + "ms"); start = System.currentTimeMillis(); for (IProject project : projects) { project.open(null); } System.out.println("Runtime open : " + (System.currentTimeMillis() - start) + "ms"); AtomicInteger updatedProjects = new AtomicInteger(); Set changedProjects = ConcurrentHashMap.newKeySet(); CountDownLatch allChangesProcessed = new CountDownLatch(runs); ResourcesPlugin.getWorkspace().addResourceChangeListener(e -> { IResourceDelta delta = e.getDelta(); if (delta != null) { for (IResourceDelta child : delta.getAffectedChildren()) { if (changedProjects.add(child.getFullPath())) { allChangesProcessed.countDown(); } updatedProjects.incrementAndGet(); } } }); List files = projects.stream().map(p -> p.getLocation().toPath().resolve("text.txt")).toList(); start = System.currentTimeMillis(); for (Path file : files) { Files.createFile(file); } allChangesProcessed.await(); System.out.println("Runtime file creation and processing : " + (System.currentTimeMillis() - start) + "ms"); System.out.println("Received updates:" + updatedProjects.get()); start = System.currentTimeMillis(); for (IProject project : projects) { project.delete(true, true, null); } System.out.println("Runtime delete : " + (System.currentTimeMillis() - start) + "ms"); } } ```

For run=10 and consequently 10 projects being created changed and closed these are the runtime measured for one invocation of the case with the old JNI based implementation:

10 projects
Runtime create : 60ms
Runtime open : 62ms
Runtime file creation and processing : 30455ms
Received updates:10
Runtime delete : 72ms

and the runtime with the new one based on JNA:

10 projects
Runtime create : 63ms
Runtime open : 145ms
Runtime file creation and processing : 30572ms
Received updates:10
Runtime delete : 70ms

One immediately notices that creating and awaiting the changes takes much longer than all other parts and much longer than expected only from the operations. When looking at https://github.com/eclipse-platform/eclipse.platform/blob/309a7f0d29fa5e5cd5efb88ad020fea721262798/resources/bundles/org.eclipse.core.resources/src/org/eclipse/core/internal/resources/refresh/win32/Win32Monitor.java#L557 it is clear that the processing of changes found by native hooks is throttled after the current changes in all monitor directories are processed. Since RESCHEDULE_DELAY has a value of 3000 the ten projects match the runtime of that part of 30,000s quite well and I think one can assume that the throttling happens after each change in a project in this case.

Consequently there is no point in aiming for the last µs in the implementation that awaits the change notifications.

But even with the throttling of changes active we can still benchmark the impact of the slower JNA implementation on installing and removing the listener on project opening and closing. For that I simply commented out the middle part that creates the files in the project and waits a change notification. Since the case now runs much faster I changed the number of process projects to 1000. When executing the case once for the JNI- and once for the JNA-based implementation I measured the following runtimes:

JNI:
1000 projects (without changes)
Runtime create : 2570ms
Runtime open : 6562ms
Runtime delete : 56039ms

JNA:
1000 projects (without changes)
Runtime create : 2554ms
Runtime open : 6598ms
Runtime delete : 48468ms

Of course the runtime of such benchmark cases varies from run to run and one has to run them multiple times to get smaller Confidence intervals (e.g. I don't expect this change to generally speed up deleting projects by ~15%). Nevertheless this helps to put the numbers of the JMH-benchmark I presented initially into perspective.

Bear in mind that the native method FindFirstChangeNotificationW is called once when a project is opened, the method FindNextChangeNotification() is called after changes have been awaited through WaitForMultipleObjects() and the latter method has signaled a change. On project close FindCloseChangeNotification() is called once. In the JMH-benchmark I measured that calling FindFirstChangeNotificationW, FindNextChangeNotification() and FindCloseChangeNotification() subsequently together takes 1.5µs longer with the JNA based implementation compared to the JNI based one. I cannot say how much longer each operation runs longer in isolation. I assume the portion of FindFirstChangeNotificationW is slightly higher than a third because it has three parameters unlike the other methods that have two, but it probably does not take the majority of the extra time.

But to put this into perspective: Even if FindFirstChangeNotificationW alone would run 1.5µs longer it means that 1000 calls would take 1.5ms longer. Since we saw that opening 1000 projects takes about ~6500s the assumed extra 1.5ms were only 0.023% of the time it takes to open a project. For the delete-case the ratio is even smaller.

With that I think it is clear that, as expected, that this change is less than an insignificant in terms of performance.

jukzi commented 5 months ago

Can you please explain where this number is coming from?

Just a damn big number that reflects the workspace size that i am working for. The concrete number doesn't matter, just to show you how big our workspace is and that people can realy suffer from small changes like this. Since you promised it is not measurably slower, I gave this PR a chance and verified it myself: Setup: 0. settings: image

  1. create a plain project (no java) and used an existing big file structure as root for that project. 2. open that project in the package explorer 3. on the bash execute find . -type f -exec touch {} + for a big subfolder 4. sampling the Total Time spend in org.eclipse.core.internal.resources.refresh.win32.Win32Monitor$Handle.postRefreshRequest () with VisualVm

With this PR i reproducible get almost double time compared to master! It's even visible in the Progress Bar that it somehow does the refresh Job twice: image which it did not before: image

There is no throttling happening, the CPU is fully used in that thread.

HannesWell commented 5 months ago

First of all please read the second paragraph in https://github.com/eclipse-platform/eclipse.platform/pull/1394#issuecomment-2136256007. In your benchmark you basically just measure the runtime of refreshing a project that contains large amount of files and folders. The change detection on OS level that involves the code changed with this PR is only called once, maybe a few times more depending on how long the script touching files runs.

With this PR i reproducible get almost double time compared to master! It's even visible in the Progress Bar that it somehow does the refresh Job twice:

That's because you still have the org.eclipse.core.resources.win32.x86_64 from the TP in your runtime and that registers the Win32RefreshProvider a second time. You have to make sure it isn't included in that launch.

There is no throttling happening, the CPU is fully used in that thread.

Throttling happens only between the calls that ask the OS if there are new changes within the entire project directory structure. Because you create only one big change in one project the OS is only queried once for changes within the project directory (as written above maybe a few times more depending on how long it takes to touch all files). Everything else you see is the runtime of refreshing the entire project. Just like if you had trigger a refresh manually.

jukzi commented 5 months ago

ok, removing org.eclipse.core.resources.win32.x86_64 from launch did help. But i am curious, why its get automatically added again when pressing "Select Required". Is it one of this left-over references? image

merks commented 5 months ago

Maybe something is being smart and figures out to require every available (and applicable via the filters) fragment for a given host.

HannesWell commented 5 months ago

ok, removing org.eclipse.core.resources.win32.x86_64 from launch did help.

Great. I made a few final clean-ups and checks and will submit it as soon as the build is green.

But i am curious, why its get automatically added again when pressing "Select Required".

Select Required adds all fragments of a host-bundle to a launch config (as Ed assumed), the reason is that it used to be like that. No clue why the original author did that. But in general instead of clicking Select Required I recommend to check Include required (Features and) Plug-ins automatically while launching. As the name indicates it adds all requirements automatically when launching. This way your config is always up-to date and can be much smaller since you only need to add the dependency-roots, i.e. the Features/Plug-ins you really want. And this does not add fragments of host bundles. If you want to have a fragment in your launch, just add it explicitly (its host can be added automatically then).

Is it one of this left-over references? image

Yes these references are removed via https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/pull/2127. The old release notes don't have to be adjusted. Are the JDT launch configs generally up to date? From my experience at least the ones in platform are often not and if its the same for JDT I would just leave them as they are until they are either cleaned-up or deleted.