Charcoal-SE / SmokeDetector

Headless chatbot that detects spam and posts links to it in chatrooms for quick deletion.
https://metasmoke.erwaysoftware.com
Apache License 2.0

Cache caught posts locally during MS downtime and send them to MS after it's alive again #2461

Open iBug opened 6 years ago

iBug commented 6 years ago

MS has been going out of service frequently lately. It'd be a nice feature if Smokey could cache posts locally while MS is down and register the cached posts on MS once it's back up, so we don't lose records.

Potential difficulty: the deletion watcher. Posts may have been deleted before or after MS reanimates, and this needs separate handling (caching the deletion time, or sending it as normal).

stale[bot] commented 4 years ago

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

teward commented 3 years ago

On the SD side, if we have a status check that tells us whether MS is up or down, we can simply implement a Queue-type object (extended to what we need) that keeps a FIFO queue of items to process for MS. We still need a solution that writes the queue to disk in a timely manner and empties it once MS is back up.
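A minimal sketch of that idea, assuming a status check and a send function are available. `ms_is_up` and `send_to_ms` are hypothetical callables passed in for illustration; they are not existing SD functions.

```python
import queue

# FIFO queue of requests that could not be delivered to MS.
pending = queue.SimpleQueue()


def send_or_queue(request, ms_is_up, send_to_ms):
    """Send `request` to MS if it is reachable; otherwise queue it.

    `ms_is_up` and `send_to_ms` are hypothetical callables standing in
    for whatever status check and HTTP call SD actually uses.
    Returns True if the request was sent, False if it was queued.
    """
    if ms_is_up():
        try:
            send_to_ms(request)
            return True
        except Exception:
            # The status check can be stale; treat a failed send like downtime.
            pass
    pending.put(request)
    return False
```

Falling through to the queue on a failed send means a request is kept even when the status check has not yet noticed that MS went down.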

The queue could even be a Queue of NamedTuples with named keyword arguments - timestamp and request. Still needs to be written to disk in a way that can be restored somehow.

teward commented 3 years ago

Basic prototype for the queue itself on SD could be something like this:

# Superseded in later comment, please refer to that for code.

MSRequestsQueue is a Queue object, while MSRequest is a namedtuple and defaults to 'now' in epoch time for the timestamp part of the named tuple.

If MS is up and there are items in the queue, we can do this to empty the queue:


for item in MSRequestsQueue.queue:
    # do something with `item` object, which has two named keywords in it:
    # `.timestamp` for the Epoch, and `.request` for whatever the request is.
    ...

We just need to implement two other bits here:

(1) actually caching the requests and storing them in a retrievable way on disk if we're offline for some reason
(2) implementing this with a Metasmoke status check.
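For the first point, one possible approach is an append-only file with one JSON object per line. This is only a sketch: it assumes each request is JSON-serializable, and the function names and the `ms_queue.jsonl` file name are made up for illustration.

```python
import json
import os
import tempfile
from collections import namedtuple

MSRequest = namedtuple("MSRequest", ["request", "timestamp"])


def append_to_disk(path, item):
    """Append one queued request to the cache file as a JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(item._asdict()) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to write it out before we return


def load_from_disk(path):
    """Restore queued requests, oldest first; empty list if no cache exists."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [MSRequest(**json.loads(line)) for line in f if line.strip()]


def clear_disk(path):
    """Drop the cache file once everything has been delivered to MS."""
    if os.path.exists(path):
        os.remove(path)


# Usage: round-trip two requests through a temporary cache file.
cache_path = os.path.join(tempfile.mkdtemp(), "ms_queue.jsonl")
append_to_disk(cache_path, MSRequest({"post": "example"}, 1631897567))
append_to_disk(cache_path, MSRequest({"post": "example 2"}, 1631897568))
restored = load_from_disk(cache_path)
clear_disk(cache_path)
```

Appending a line per request keeps each write small, and reading the file top to bottom restores the FIFO order.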

I haven't written any code into SD for this; I'm just prototyping ideas here.

teward commented 3 years ago

I've redone my prototype to use SimpleQueue, which removes a few extra tasks: we don't need task tracking. I also altered the MSRequest object so that we can instantiate the request NamedTuple with just the object and default to the now() timestamp without specifying the keyword arg.

The queue remains a FIFO queue, though, so first item in is first item out when processed / retrieved.

import queue
import collections
import datetime

# Instantiate MSRequestsQueue as a SimpleQueue
MSRequestsQueue = queue.SimpleQueue()

# Instantiate MSRequest NamedTuple object type, for keyworded arguments. No type checking though...
# Objects later can just be instantiated with MSRequest(REQUEST_OBJECT) and request gets populated with the object.
# Note: defaults are evaluated once, when this line runs, so the autofilled timestamp is the
# definition time; pass timestamp=... explicitly if each request needs its own time.
MSRequest = collections.namedtuple("MSRequest", field_names=['request', 'timestamp'],
                                   defaults=[None, int(datetime.datetime.now().timestamp())])

# Examples
MSRequestsQueue.put(MSRequest("Object"))
MSRequestsQueue.put(MSRequest("Object 2"))
MSRequestsQueue.put(MSRequest("Object 3"))
MSRequestsQueue.put(MSRequest("Object 4"))

# Loop over items in the queue, and output them so we can see them.
while not MSRequestsQueue.empty():
    item = MSRequestsQueue.get()
    print(item)

This produces output like this, but with different timestamps - these requests were all added in testing within a single second:

MSRequest(request='Object', timestamp=1631897567)
MSRequest(request='Object 2', timestamp=1631897567)
MSRequest(request='Object 3', timestamp=1631897567)
MSRequest(request='Object 4', timestamp=1631897567)

This is the representation of the named tuple object. Once an item has been retrieved from the queue, item.request gives the request and item.timestamp gives the epoch timestamp. Note that once you call MSRequestsQueue.get(), the retrieved item is gone from the queue: there is no way to put it back in at the same position, so you have to put it in at the end of the queue if you want it 'requeued'.
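Since a retrieved item can only go back in at the end, one way to handle failed sends is to try each queued item once per pass and requeue the failures. A rough sketch, where `send_to_ms` is a hypothetical callable that raises on failure:

```python
import queue


def drain(q, send_to_ms):
    """Try each queued item once; requeue the ones that fail.

    `send_to_ms` is a hypothetical callable that raises on failure.
    Returns the number of items delivered.
    """
    delivered = 0
    # Snapshot the size so requeued failures are not retried in this pass.
    for _ in range(q.qsize()):
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        try:
            send_to_ms(item)
            delivered += 1
        except Exception:
            q.put(item)  # goes to the back of the queue
    return delivered
```

Because every item is taken exactly once per pass, the failed items end up at the back in their original relative order.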

Replace print(item) with whatever you want to do with the request object.
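One caveat with `defaults=` on a namedtuple: the default values are evaluated once, when the `namedtuple(...)` call runs, so every request created without an explicit timestamp shares that one value. A sketch of one way to get a fresh timestamp per request, using a small subclass (this approach is an assumption for illustration, not part of the prototype above):

```python
import collections
import time


class MSRequest(collections.namedtuple("MSRequest", ["request", "timestamp"])):
    """Named tuple whose timestamp defaults to the creation time."""

    __slots__ = ()

    def __new__(cls, request=None, timestamp=None):
        if timestamp is None:
            # Evaluated on every call, unlike a namedtuple `defaults=` value.
            timestamp = int(time.time())
        return super().__new__(cls, request, timestamp)
```

`MSRequest("Object")` still works as before, and an explicit `timestamp=` can be passed when restoring requests from disk.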

makyen commented 2 years ago

During the long MS downtime around 2021-11-25, I implemented a queue to store data which was intended to be sent to MS, but wasn't actually sent, either due to MS being down or the request failing, in this commit. At this point, there's nothing, yet, which tries to clear the queue and resend the queued requests, nor anything on MS which handles receiving the time offsets (which should be changed to using the epoch, rather than the current time.time() value).

I did save the data from the portion of the MS downtime which was after the above commit was added, so it's available for when the rest of this is built out.