Discards events on Mac - Githubissues

scheglov commented 3 years ago

See https://github.com/Dart-Code/Dart-Code/issues/3438#issuecomment-876807496 It looks that the watcher simply ignores events that happen before _listDir().

jakemac53 commented 3 years ago

@scheglov can you reproduce https://github.com/dart-lang/sdk/issues/14373 on your mac? If not we could possibly remove this bogus 200ms delay that is there on Mac. The issue does claim to be resolved.

scheglov commented 3 years ago

It looks that the dart:io issue is not solved, I reopened it.

devoncarew commented 3 years ago

I wonder if we should remove the initial 200ms of discarding events even w/o the dart:io issue fixed? It seems like occasionally receiving events that happened before you subscribed is a much less severe issue than suppressing the first n events.

The 200ms delay is likely the cause of hard to repro bugs we've had reported against the analysis server where our analysis reports are not correct after doing something like switching git branches - which might make us re-calculate our watcher subscriptions. People then resolve the issue by restarting the analysis server.

aam commented 3 years ago

Does the analysis server make use of isReady property of the Watcher https://pub.dev/documentation/watcher/latest/watcher/Watcher/isReady.html? Would it be possible to wait for the watcher to get ready(finish sleeping for 200ms) before directory update process is started?

DanTup commented 3 years ago

@aam for the original issue in Dart-Code above, the actual file modifications are external (by Flutter) eg.:

user runs pub get which modifies package_config.json that triggers the analysis server to create a new context (which includes a new watcher)
Flutter injects flutter_gen into that file during those 200ms which doesn't produce an event so the server ends up using the package_config.json without the flutter_gen entry

So I don't think it could be delayed, however perhaps something like this could work:

server creates the new watcher
server waits for isReady
server builds the context (eg. it reads nothing from disk for the new context prior to the 200ms completing)

However this would require the code be async (I don't think it currently is, nor am I sure how trivial a change that is - @scheglov may be best placed to answer that).

I wonder if there's also value in a way to opt-out of this delay though? In some cases it might be preferrable to have additional events that happens in the past and not lose future ones than the current behaviour (for example if you're just maintaining the latest versions of the files in memory, re-reading a file that hadn't actually changed seems better than missing a file that did).

aam commented 3 years ago

thanks @DanTup . If two events (invocation of pub get and injection of flutter_gen) are asynchronized then this setup seems to be intrinsically racy as there are no guarantees from the os regarding how long the watcher installation will take. 200ms delay perhaps makes it more obvious though.

DanTup commented 3 years ago

@aam if I understand correctly, as long as the watcher fired the event that occurred it was created, we should be good if we read the contents of the file also after creating the watcher:

(pub): modifies package_config
(analysis server): watcher fires, causes analysis server to recreate the contexts
(analysis server): set up new watcher
(analysis server): read the package_config from disk
(flutter) injects flutter_gen
(analysis server): watcher fires event which causes us to go back to step 1 (which will re-read the file)

It's possible here we'd get a false-trigger (eg. we're rebuild the context an additional time), but that's preferable to missing the event as it would still result in a consistent/correct view. Missing the event leaves things in an inconsistent state (what's in memory does not match what's on disk).

aam commented 3 years ago

Is step 5 (flutter) injects flutter_gen synchronized with steps 2-4? If yes, then it should not be a problem if step 3 takes additional 200ms, if no, then that looks like a race as step 5 might finish before step 4. Also, do you know why new watcher is created when package_config is modified? Basically setting up of a new watcher has some latency, is potential source of dropped events.

DanTup commented 3 years ago

@aam the Flutter part is out of process so can happen at any point, but I believe as long as step 4 starts after step 3 completes (eg. the watcher would raise any events starting before step 4 reads the file), then:

if Flutter modifies the file before step 4, step 4 would read the new contents
if Flutter modifies the file after step 3, the watcher would fire an event that causes us to restart the process (and the next time around we'd read the new contents)

do you know why new watcher is created when package_config is modified? Basically setting up of a new watcher has some latency, is potential source of dropped events.

I guess in the case above reusing the watcher might be an option, however in some cases this could actually be the first time the watcher is being created (eg. at startup, or if a new package is being created/checked out) so that mightnot solve all the possible races.

aam commented 3 years ago

aha, makes sense @DanTup . But then if step 4 only starts after step 3 completes it should not matter that step 3 took additional 200ms to complete, should it?

I guess in the case above reusing the watcher might be an option, however in some cases t

I'm curious what was the motivation to actually keep dropping/recreating watchers, which sounds like more work.

scheglov commented 3 years ago

The problem is that step 3 "(analysis server): set up new watcher" is "complete" from the analysis server perspective when it created the watcher. But actually the watcher is ready only after some time - listing files + 200 ms?

We recreate watchers when we recreate analysis contexts because it allows us to handle uniformly two situations - when configuration is changed but analysis roots (directories for which we create analysis contexts) are the same, and also when configuration roots are changed.

jakemac53 commented 3 years ago

I wonder if we should remove the initial 200ms of discarding events even w/o the dart:io issue fixed? It seems like occasionally receiving events that happened before you subscribed is a much less severe issue than suppressing the first n events.

The problem I believe is that the events you receive can be in any arbitrary order, so you don't actually know the correct state of the world (the events could be delete -> create when they should be create -> delete, for instance).

aam commented 3 years ago

The problem is that step 3 "(analysis server): set up new watcher" is "complete" from the analysis server perspective when it created the watcher. But actually the watcher is ready only after some time - listing files + 200 ms?

watcher has ready Future property that is completed when watcher's initialization(including 200ms delay) is competed. Would it make sense for the analysis server to wait for that before continuing?

DanTup commented 3 years ago

Would it make sense for the analysis server to wait for that before continuing?

That was the idea I had in https://github.com/dart-lang/watcher/issues/115#issuecomment-881272295, but it would require making this code async and I'm not sure how simple that would be (not particularly changing the code, but the implications of other events or requests from the client being picked up during that period). @scheglov may have a better idea of how feasible that is.

devoncarew commented 3 years ago

I suspect that we can't make the code async, but it might be possible to add a listener for ready, and to fire a change event / validate our world view when that occurs.

DanTup commented 3 years ago

I suspect that we can't make the code async, but it might be possible to add a listener for ready, and to fire a change event / validate our world view when that occurs.

There's some existing code to try and handle this for pubspec (https://github.com/dart-lang/sdk/blob/ccb3943e48524a993b6ca6e91cc39dabd360797f/pkg/analyzer/lib/src/workspace/pub.dart#L44). If we extend that to also include the package_config.json and trigger it from isReady (it would need to be context-specific), I think that could resolve https://github.com/Dart-Code/Dart-Code/issues/3438 - although to eliminate all the issues (eg. missed updates when changing Git branches) we'd have to check all files (or at least their timestamps) which could be a bigger perf hit.

I'll have a go at the package_config fix at least.

DanTup commented 2 years ago

While updating the analysis server to use watcher.ready to try and resolve these races, I found an odd issue where the VM seems to terminate silently if the directory does not exist and you try to await it's ready future. I filed details at https://github.com/dart-lang/sdk/issues/47967 (silently terminating/crashing feels like a VM issue).

DanTup commented 2 years ago

Turns out my issue above was invalid - I had assumed that Future-returning main functions would always be waited for before the VM terminates and that's not the case.

So I think the issue here may be a package:watcher issue with ready never firing (which is causing us to stall in the server when trying to wait for our watchers to be set up if any of the folders do not exist). Here's a repro:

import 'package:watcher/watcher.dart';

main() async {
  // Ensure app doesn't quit for at least 30s
  Future.delayed(const Duration(seconds: 10));

  // Try to watch a folder that does not exist
  final watcher = DirectoryWatcher('/tmp/foo');

  // Print any events/errors from the stream.
  final sub = watcher.events.listen(
    (event) => print(event.type),
    onError: (e) => print('ERROR ON STREAM: $e'),
  );

  // Wait for the watcher to become ready.
  print('waiting for ready...');
  try {
    await watcher.ready;
    print('watcher is ready!');
  } catch (e) {
    print('ERROR WAITING FOR READY: $e');
  }

  print('DONE!');
  sub.cancel();
}

waiting for ready...
ERROR ON STREAM: FileSystemException: Directory listing failed, path = '/tmp/foo/' (OS Error: No such file or directory, errno = 2)
Exited (this happens after 30s)

This script creates a DirectoryWatcher on a folder that does not exist and then awaits its ready event. Running the app shows that is stops at the await and never progresses (until after 30 seconds when the app just terminates without progressing further, because the timer created by the Future.delayed expires).

I can file a new issue if that makes sense, although it seems fairly related to the discussion above and if we can resolve it, can probably close this.

DanTup commented 2 years ago

@aam any thoughts on the comment above? From earlier comments my understanding was that we should use the ready Future on the watcher, but in the case where a directory does not exist (which in theory can happen even if we had just done an exists check), the Future never completes. Is that a bug?

jakemac53 commented 2 years ago

That does sound like a bug to me, we should complete ready with the error if there is one when setting up the watcher.

aam commented 2 years ago

That does sound like a bug to me, we should complete ready with the error if there is one when setting up the watcher.

+1

DanTup commented 2 years ago

Completing with an error would help, although is it not valid to create a watcher on a directory that doesn't exist yet and get events when it's created?

aam commented 2 years ago

Completing with an error would help, although is it not valid to create a watcher on a directory that doesn't exist yet and get events when it's created?

I would imagine you are expected to watch parent directory for children directories/files being created

jakemac53 commented 2 years ago

Ya I don't think this package should automatically create watchers for parent directories as that could come with a lot of hidden costs. It should be explicitly done by the user of the package.

DanTup commented 2 years ago

Oops, I worded that badly - I meant events for the children. I had incorrectly thought this already worked for the stream, and it was just ready that wasn't handling it, but I just tested and it's not - so I don't think that's an issue (changing the server to wait on ready isn't going to change the behaviour of that).

If nobody is likely to have time to fix the ready issue soon and it's likely not too complicated, I can take a look if someone can give me some pointers (I see there are quite a few watcher implementations, and I'm not sure if it'll need doing in each, or just in one place).

jakemac53 commented 2 years ago

If you have some time to look into the ready issue that would be great.

DanTup commented 2 years ago

@jakemac53 would a reasonable fix be to complete the _readyCompleted in emitError if it hasn't already been completed?

Alternatively, since the watcher is always closed after the error, maybe it could just be completed in close, although that means it would complete without the error?

I presume it may need doing in each watcher - are there PR bots that will run the tests on all the platforms? (save me setting up a Linux instance)

jakemac53 commented 2 years ago

@jakemac53 would a reasonable fix be to complete the _readyCompleted in emitError if it hasn't already been completed?

I think so, the error might come from a child directory and we could try to be smarter around those? But I don't think we have to.

We should also consider calling close() in this case.

DanTup commented 2 years ago

As far as I can see, emitError already calls close, so I think errors from a child directory would still close it anyway. I'll work on a PR, thanks!

dart-lang / watcher

Discards events on Mac #115