Otann / morse

📡 Clojure interface for Telegram Bot API
Eclipse Public License 1.0
257 stars 48 forks source link

`polling/start` cannot signal when the updates channel is closed #28

Closed jonathanj closed 6 years ago

jonathanj commented 6 years ago

I have a problem where my bot no longer receive messages over Telegram, until I restart my application. This usually happens over long-ish period (about 24 hours) and I don't have the best Internet connection.

Reading the code for morse.polling it appears that morse.polling/start has no way to signal that it is no longer polling, and so a user will never know to call start again. create-consumer returns a channel that will contain exactly one result when the loop exits (as a result of updates being closed.) If start returned the result of create-consumer instead of the running channel, that will never be closed, it would be possible for user code to restart in the event of the polling loop dying:

(go-loop [ch  nil
          res nil]
  (if res
    (recur ch (<! ch))
    (recur (polling/start token api) true)))

I'm happy to submit a PR if you don't have time or the inclination to do this yourself, @Otann.

chiliec commented 6 years ago

Are u sure to it resolve a problem the best way? I know clojure a little bit, but I also know that @Otann said you need to catches exceptions in the handler (#17)

My solution:

(defonce telegram-poller (atom nil))

(defn -main
  [& args]
  (defn start
    []
    (go
      (try
        (reset! telegram-poller (p/start token bot-api))
        (catch Exception e
          (println (.getMessage e))
          (start)))))
  (start)
  (Thread/sleep Long/MAX_VALUE))

It works well, but it's definitely not a best practice too :)

jonathanj commented 6 years ago

@Chiliec I haven't been able to track down what causes the polling to stop, looking through my logs I don't see any exceptions that might cause this (although I might have missed it.)

I think @Otann's advice in #17 is good but from reading the code it looks like it's possible for create-producer to close the updates channel (async/thread will return nil if its body throws an exception, for example) and thus stop polling, without informing the caller of start. Exceptions raised in an async/thread body are not catchable with try/catch outside of the thread body.

Otann commented 6 years ago

@jonathanj hello Jonathan! thank you for the request.

  1. would be especially interesting to see why polling has stopped. I think after #17 I've wrapped handling messages with a catch-all expression.

  2. I supposed go-loop does return you a channel that is closed after loop is finished, doesn't it?

Otann commented 6 years ago

Ah! I see your point now, are you asking for something like this?

(defn start
  "Starts long-polling process.
  Handler is supposed to process immediately, as it will
  be called in a blocking manner."
  ([token handler] (start token handler {}))
  ([token handler opts]
   (let [running (chan)
         updates (create-producer running token opts)
         consumer (create-consumer updates handler)]
     [running consumer])))

As this should not happen at all, I feel hesitant doing it. @Chiliec is right, restarting things when they break is not the best practice ;) I would very much like to fix the root cause and add another try-catch in the proper place.

Could you try the piece of code above and share your logs/ideas on what causes the stop?

jonathanj commented 6 years ago

What's wrong with restarting things when they break? If the Telegram transport fails (because I have no Internet for 12 hours, for example) either Morse needs to implement retrying (which is at the wrong level, in my opinion, because how and when to retry should be my decision) or my application needs to implement retrying, which it can't do because there is no way for it to know whether the Morse polling loop is alive or not.

It would be nice to know what the root cause is, definitely, unfortunately I haven't managed to figure that out yet. The issue doesn't happen that often for me, so the debugging iteration loop is quite long.

Otann commented 6 years ago

@jonathanj nothing is wrong with restarting, what is wrong is when things break. Things should not break ;)

I'm thinking on a proper solution, which will give you that. In the meantime, what do you thing about the code above?

dixel commented 6 years ago

Recently got a similar issue (stopped long polling) and saw this exception:

Exception in thread "async-thread-macro-52" java.lang.Error: java.net.ConnectException: Connection timed out (Connection ti
med out)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
...
        at clj_http.client$request_STAR_.invokeStatic(client.clj:1181)
        at clj_http.client$request_STAR_.invoke(client.clj:1174)
        at clj_http.client$get.invokeStatic(client.clj:1187)
        at clj_http.client$get.doInvoke(client.clj:1183)
        at clojure.lang.RestFn.invoke(RestFn.java:423)
        at morse.api$get_updates.invokeStatic(api.clj:18)
        at morse.api$get_updates.invoke(api.clj:11)
        at morse.polling$create_producer$fn__12285$state_machine__7754__auto____12296$fn__12302$inst_12204__12306.invoke(po
lling.clj:26)
        at clojure.core.async$thread_call$fn__7963.invoke(async.clj:442)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

I found that the updates channel gets closed in case of the exceptions like this:

    (go-loop [offset 0]
      (let [resopnse (a/thread (api/get-updates token (merge opts {:offset offset})))
            [data _] (a/alts! [running resopnse])]
        (case data
          nil
          (close! updates) ; <--

To me it seems quite clean to wrap get-updates into try-catch since it's interacting with real world and shouldn't break the application logic in case of any exception.

jonathanj commented 6 years ago
Dec 21, 2017 5:28:41 PM org.apache.http.impl.client.DefaultHttpClient tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request to {s}->https://api.telegram.org:443: Operation timed out

This particular thing just happened to me now, I'm sure it's not the only thing causing this but in this case morse isn't restarting properly but I also can't do anything about it from an application perspective, like a bit later retry or use a fallback delivery mechanism or inform someone.

Otann commented 6 years ago

@jonathanj @dixel @Chiliec Hey guys, I think I found a way to notify you when polling failed. I agree that library should not handle reconnects because if there is a misconfiguration, these endless reconnects could piss off the telegram.

I added a small piece that closes running channel. It means that you can block on reading from it and then when it happens, reconnect.

Released this as 0.3.1 will this fix this problem for you?

jonathanj commented 6 years ago

@Otann This sounds ideal, I'll upgrade my dependencies. I'm going to close the ticket since the code looks like it solves the issue. Thank you!