Return value of wtr_watcher_close in watcher-c

AlliBalliBaba commented 3 days ago

When I'm running the following code with watcher-c;

void * watcher = wtr_watcher_open(path, handle_event, (void *)data);
bool success = wtr_watcher_close(watcher);

success always is false when closing the watcher, even if the watcher started successfully. From the the testfile I was under the impression that it should return true if stopping is successful.

Is thig a bug or is the behaviour expected?

e-dant commented 3 days ago

   74 bool wtr_watcher_close(void* watcher)
   75 {
   76   if (! watcher) return false;
   77   if (! ((wtr::watcher::watch*)watcher)->close()) return false;
   78   delete (wtr::watcher::watch*)watcher;
   79   return true;
   80 }

Either the watcher is null or the underlying implementation couldn't close the OS's resources, which is problematic.

Do we have a minimally reproducible example?

e-dant commented 3 days ago

I think https://github.com/e-dant/watcher/commit/343e6b5975036d43a3cadc2eaf813584c2b58f3c might fix this, but I'm not sure yet.

There might be some double-close logic (for any reason at all -- implicit destructors if the api is used with "smart pointers" and the user also calls close(), unintentional API use, anything at all -- which that pending check was meant to guard against), but there might have also been a case where the semaphore skips right through the pending state into the released state, which that commit should also address.

LMK if this fixes the issue for you.

AlliBalliBaba commented 2 days ago

This didn't really seem to fix the issue. I'll try to create a minimal reproducer when I have time. It's not that critical for us since we usually stop the whole process after stopping the watcher.

e-dant commented 1 day ago

I think this is worth investigation.

It's a problem if we're failing to close resources.

I think a minimal reproducer would be enlightening -- Particularly, I'm curious if there's a bad interaction in the Go<->cgo<->C language chain, or if this is a library bug.

AlliBalliBaba commented 11 hours ago

I created a minimal reproducer here: https://github.com/AlliBalliBaba/watcher-debug

With docker you can just clone the repo and do something like:

docker build -t debug-watcher .
docker run debug-watcher

The code that opens and closes the watcher is in watcher-test.c

e-dant commented 10 hours ago

I created a minimal reproducer here: https://github.com/AlliBalliBaba/watcher-debug

The watcher here is being closed before it has a chance to finish opening all the resources it needs.

The thread running the watcher doesn't usually start in this example until control has reached the close() call.

We could probably handle this better, but we're not leaking anything here.

e-dant commented 9 hours ago

With some extra logging, we can trace what's happening:


close/1/1727620570393 closing watcher
close/1/1727620570393 releasing semaphore
close/1/1727620570393 waiting for watcher
open/1/1727620570393 watching /go/src/app
open/1/1727620570393 e/self/live@/go/src/app
event:
  path name: e/self/live@/go/src/app
  path type: 4
open/1/1727620570393 done watching /go/src/app
open/1/1727620570393 e/self/die@/go/src/app
event:
  path name: e/self/die@/go/src/app
  path type: 4
open/1/1727620570393 watcher closed, ok? no
failure

Format is routine/nth call to it/time of message (etc)

Note that the event handler here receives a "special" path and path type:


event:
  path name: e/self/live@/go/src/app
  path type: 4

The e/ prefix here means "error". (s/ would mean "success".)

This communication mechanism is detailed here in the The Library/readme

The watcher type is special. Events with this type will include messages from the watcher. You may recieve error messages or important status updates, such as when it first becomes alive and when it dies. The last event will always be a destroy event from the watcher.

e-dant commented 9 hours ago

However, I'm not sure this fully addresses the issue.

While, in this example, the issue could be solved by waiting for that self/live message (with the special "watcher" path type), I believe you were also seeing errors on close in a more long-running context.

I think we should explore that more.

Thoughts?

AlliBalliBaba commented 7 hours ago

Ah yeah that makes sense. I was seeing the error sometimes in tests, probably because the tests would finish before the watcher could start. I tested it again and with a bigger grace period before shutdown and it works properly 👍.

I guess I'm fine with the current behavior then. It could maybe be documented somewhere that wtr_watcher_close will return false (but still stop the watcher) if the watcher hasn't sent the e/self/live@ event yet

e-dant / watcher

Return value of wtr_watcher_close in watcher-c #52