bytadaniel / clickcache

Clickhouse data collector for delayed batch insert
MIT License

How to manually load all unresolved chunks? #3

Open timbowhite opened 1 month ago

timbowhite commented 1 month ago

Thanks for this module.

If the process ends before all chunks are resolved, data will remain unwritten to clickhouse.

How can I explicitly check if there are unresolved chunks and trigger resolver.onResolved to load all unresolved chunks?

bytadaniel commented 3 weeks ago

@timbowhite Hi! Thanks for reporting the issue. I’m ready to address it as soon as possible. Let’s go over the details.

If you're using process memory to store cached data, it's not possible to save the data upon receiving a termination event, even if I manage to pass control outside the chunk processing loop.

However, when saving to disk, there's already a mechanism in place that saves the data on unexpected process termination via the SIGINT and SIGTERM signals.

I've noticed that this isn't always sufficient, because not every termination scenario goes through these signals. To be clear, if the OS kills the process in a way that cannot be caught (e.g., SIGKILL), the data won't be saved either.

Additionally, I observed that during cache execution, data that hasn't yet found a place in chunks could be lost in certain cases.

What kind of control handoff interface would you prefer for handling exit scenarios?

In earlier versions, I left this responsibility to the module user, but in the latest major version, I decided to handle data flushing to disk myself, since signal handling requires specific considerations to ensure data is reliably saved. For example, the save operation must be synchronous.
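To illustrate the point about synchronous saving, here is a minimal sketch of a signal handler that dumps buffered rows to disk with a blocking write before exiting. It is not clickcache's actual implementation: `pendingRows`, `DUMP_PATH`, `saveSync`, `loadSync`, and `installSignalHandlers` are hypothetical names, and only standard Node.js calls (`fs.writeFileSync`, `process.once`) are used.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Row = Record<string, unknown>;

// Hypothetical in-memory buffer and dump location (assumptions for this sketch).
const pendingRows: Row[] = [];
const DUMP_PATH = join(tmpdir(), "clickcache-sketch-dump.json");

// Blocking write: the call returns only after the data has been handed to the OS.
function saveSync(path: string, rows: Row[]): number {
  writeFileSync(path, JSON.stringify(rows));
  return rows.length;
}

// Read a previous dump back, e.g. on the next start of the process.
function loadSync(path: string): Row[] {
  return JSON.parse(readFileSync(path, "utf8")) as Row[];
}

// Save once on SIGINT or SIGTERM, then exit explicitly.
// SIGKILL cannot be caught, so rows buffered at that moment are still lost.
function installSignalHandlers(path: string): void {
  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.once(signal, () => {
      saveSync(path, pendingRows);
      process.exit(0);
    });
  }
}
```

Calling `installSignalHandlers(DUMP_PATH)` once at startup would register the handlers; `loadSync(DUMP_PATH)` could then restore the rows on the next run.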

Let me know if you'd like any further adjustments!

timbowhite commented 3 weeks ago

I just need a method or way to manually trigger the resolver.onResolved event.

And perhaps a method to check if there are unresolved chunks in the cache.

It's needed since the resolver is interval based, so the scenario exists where there may be unresolved chunks in the cache, but resolver.onResolved never gets called, because the process ends before chunkLifeMs is reached and/or chunkSize is never exceeded.

It's not really a process termination or interrupt event, more like a "the script is done event", so let's make sure all remaining cached rows get inserted into Clickhouse.
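The request above amounts to two things: a way to see how many rows are still buffered, and a way to push them out on demand. A rough sketch of that shape follows. This is not clickcache's API; `BufferedInserter`, `pendingCount`, `flush`, and the `InsertFn` callback are hypothetical names, and the interval-based resolver is left out to keep the example short.

```typescript
type CachedRow = Record<string, unknown>;
type InsertFn = (rows: CachedRow[]) => Promise<void>;

// Hypothetical buffer with an explicit flush; not the library's real class.
class BufferedInserter {
  private rows: CachedRow[] = [];
  private readonly insert: InsertFn;
  private readonly chunkSize: number;

  constructor(insert: InsertFn, chunkSize: number) {
    this.insert = insert;
    this.chunkSize = chunkSize;
  }

  // Number of rows that have not been handed to the insert callback yet.
  get pendingCount(): number {
    return this.rows.length;
  }

  // Buffer a row and flush automatically once the chunk is full.
  async add(row: CachedRow): Promise<void> {
    this.rows.push(row);
    if (this.rows.length >= this.chunkSize) {
      await this.flush();
    }
  }

  // Hand every buffered row to the insert callback, regardless of chunk size.
  // Returns the number of rows flushed.
  async flush(): Promise<number> {
    if (this.rows.length === 0) return 0;
    const batch = this.rows;
    this.rows = [];
    await this.insert(batch);
    return batch.length;
  }
}
```

In the "script is done" case, the caller would end with something like `if (inserter.pendingCount > 0) await inserter.flush();` before letting the process finish.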

bytadaniel commented 3 weeks ago

This scenario exists only because there is no common error handler (see https://github.com/bytadaniel/clickcache/blob/master/src/chunk-resolver.ts#L277). Normally the process saves the whole runtime cache to disk, but currently it does so only on SIGINT and SIGTERM, not on unhandled errors or exit events. That's why your cache disappears. That is a bug, and I'm working on a fix now. I also have to add some interfaces to return the cache state and to resolve everything from outside.

bytadaniel commented 3 weeks ago

If you try to handle process termination yourself in order to save the cached data to ClickHouse, it will not work: the handler runs in a synchronous context, so the promise for your request never completes and the request never reaches the host. That's why I want to manage it myself.
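The limitation described here matches how Node.js documents its `exit` event: listeners may only perform synchronous work, and anything scheduled on the event loop is dropped. The sketch below demonstrates this with a timer standing in for a network request. It assumes the code runs under Node.js, and `runExitDemo` and `childScript` are hypothetical names used only for this illustration.

```typescript
import { spawnSync } from "node:child_process";

// Script executed in a child process so that its exit does not end our own process.
const childScript = `
process.on("exit", () => {
  // Scheduled work (a timer here, an HTTP request in practice) is abandoned at exit.
  setTimeout(() => console.log("async work ran"), 0);
  // Synchronous work still completes before the process ends.
  console.log("sync work ran");
});
`;

// Run the script with the current Node.js binary and return what it printed.
function runExitDemo(): string {
  const result = spawnSync(process.execPath, ["-e", childScript], {
    encoding: "utf8",
  });
  return result.stdout;
}
```

Printing `runExitDemo()` should show only the synchronous message, which is why a flush that depends on an in-flight request cannot be completed from such a handler.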