hexa00 commented 6 years ago

So that:

We don't read the file right after a save
We can deal with many changes resulting from an operation.

Current design

Currently the FileSystemWatcherServer provides this api:

   export interface FileSystemWatcherServer extends JsonRpcServer<FileSystemWatcherClient> {
    /**
     * Start file watching for the given param.
     * Resolve when watching is started.
     * Return a watcher id.
     */
    watchFileChanges(uri: string, options?: WatchOptions): Promise<number>;

    /**
     * Stop file watching for the given id.
     * Resolve when watching is stopped.
     */
    unwatchFileChanges(watcher: number): Promise<void>;
}

And has a client to receive notification with this api:

export interface FileSystemWatcherClient {
    /**
     * Notify when files under watched uris are changed.
     */
    onDidFilesChanged(event: DidFilesChangedParams): void;
}

Implementations details:

Number Of Processes:

And it is bound like so: bind(FileSystemWatcherServer).to(ReconnectingFileSystemWatcherServer);

This makes it so that anyone injecting a FileSystemWatcherServer creates a new Nsfw watcher process with its own set of watches and its own client.

Aggregation of events:

To avoid too many events to be generated events are aggregated over a 50ms period.

Watched files notifications:

These are shared across the same server, meaning that if a client on the same server does 2 different calls to watchFileChanges with a different set of files the client will be notified of all the changes. Forcing the client to check if events are relevant.

Ignore rules:

The watcher id is only used to apply ignore rules so that files are ignored by watcher. So if 2 watchers watch the same directory if one ignores .git and the other doesn't the watcher client will still get notified since the notifications are shared.

Problems with this approach:

1221 This creates multiples nsfw processes
1188 Showed that the timeout sometimes doesn't aggregate the events

as expected and can cause performance issues.
942 As discussed watched files notifications should not be shared per

server and be bound to a watchFileChanges call.

New design proposal:

The Watcher interface:

 interface Watcher implements Disposable {
   /* Notify of file changes */
   onChanged: Event<FileChange>
   /* Pause event generation, events will be aggregated meanwhile.  */
   pause();
   /* Resume event generation.  */
   resume();
   /* Stop watches the files */
   dispose();
}

pause(): Pause will stop the watcher from firing events, and instead aggregate the events.

Pause is meant to be used as semaphore such that multiple pause calls can be nested.

resume(): Resumes the events if the semaphore reaches 0 count.

Example usage:

Simple case:

const watcher = new Watcher(uri);
watcher.onChanged(changes => {});
watcher.dispose();

Pausing case:

function saveFile() {
this.watcher.pause()
await fs.save();
this.watcher.resume();
}

Nested pausing case:

function saveMultiFiles() {
this.watcher.pause()
saveFile();
saveFile();
saveFile();
this.watcher.resume();
}

The WatcherImpl:

The implementation of a watcher controls calls to the FileSystemWatcherServer.

class WatcherImpl implements Watcher, Disposable() {

protected readonly id;
@inject(FileSystemWatcherServer) protected readonly server;

constructor(uri: string, options? WatchOptions) {
  this.id = this.server.watchFileChanges(uri,options);
}

dispose() {
  this.server.unwatchFileChanges(this.id);
}
}

The FileSystemWatcherServer:

This is to be refactored such that there is only 1 FileSystemWatcherServer.

And that the interface returns a watcher id to be used by a Watcher interface implementation.

So while the interfaces stays the same:

    watchFileChanges(uri: string, options?: WatchOptions): Promise<number>;
    unwatchFileChanges(watcher: number): Promise<void>;

The implementation would change such that events are sent by watcher so one way would be like:


protected fireDidFilesChanged(watcherId): void {
  if (this.watchers[id].pauseCount === 0) {
   this.watchers[id].onChanged(event);
 } else
 {
  this.watchers[id].events.push(...this.changes);
  }
}

protected pause(watcherId) {
  this.watchers[id].pauseCount++
}

protected resume(watcherId) {
  this.watchers[id].pauseCount--
  if (this.watchers[id].pauseCount === 0 && this.watchers[id].events.length > 0) {
   this.fireDidFileChanges(watcherId);
}   
}

Refactoring considerations

At the moment assumptions are made in the code that the FileSystemWatcher watches the workspace, these need to be refactored such that the WorkspaceService exposes a watcher and that clients register their callbacks on that.

Conclusion

I think with a design like this we will be able to have 1 nsfw process, manage large event generations while doing a file system operation efficiently and provide a clear API.

Comments are welcome.

akosyakov commented 6 years ago

Is there guarantee that calling save will cause fs events immediately? I don't think so. The actual case is that events come later after save already was completed and they won't be buffered between pause and resume.

marcdumais-work commented 6 years ago

@akosyakov Do you mean that even when node seems to be done with fs.writeFile() it's possible that the physical write has not occurred yet (e.g. because of filesystem buffering)?

akosyakov commented 6 years ago

Node is single threaded and there is a single task queue processing requests from different clients as well as notifications from nfsw watching process. So processing of nfsw notifications will be delayed by processing other requests.

You can try to make a change and then look at ws frames when setContent was finished in filesystem socket and when onDidChangeContent arrived for the same file in fs-watcher. For me results are following:

13:36:08.020 -> 13:36:08.025 (setContent)
13:36:08.135 (onDidChangeContent)

The first event arrives 100ms after save was finished.

Probably there could be other sources of delays depending on the implementation of nfsw and fs module. I will need to read up on it.

akosyakov commented 6 years ago

btw there are so much spam in fs-watcher socket because of .git/index.lock

hexa00 commented 6 years ago

Node is single threaded and there is a single task queue processing requests from different clients as well as notifications from nfsw watching process. So processing of nfsw notifications will be delayed by processing other requests.

In this case it's a bit different nice nsfw runs in an external process, thus there are 2 event loops.

Here's how I see it:

BackendEvent Loop

save()
- await pause()
  - sends pause to nsfw via websocket
- await is done
- await node.save() ... some events...
- await is done
- await resume()
  - sends resome to nsfw via websocket
- resume is done()

Nsfw Event Loop

pause executed
events are generated from node.save() in backend event loop
events are processed Assuming events are emitted sync by inotify all events should have been emitted and are inside the event loop at this point.
resume event is received
resume is processed.

We do incur a penalty since we need to do await pause() and await resume() however. But events should be aggregated properly.

We would need to dig a bit deeper and test to be sure however.

I think the rest of the design stands on it's own and thus by implementing it it should be easy to test the pause/resume feature. Or keep a deboucer... or maybe use both.

eclipse-theia / theia

Optimize file event handling #1269

Current design

Implementations details:

Number Of Processes:

Aggregation of events:

Watched files notifications:

Ignore rules:

Problems with this approach:

1221 This creates multiples nsfw processes

1188 Showed that the timeout sometimes doesn't aggregate the events

942 As discussed watched files notifications should not be shared per

New design proposal:

The Watcher interface:

Example usage:

Simple case:

Pausing case:

Nested pausing case:

The WatcherImpl:

The FileSystemWatcherServer:

Refactoring considerations

Conclusion