Rongronggg9 / RSS-to-Telegram-Bot

A Telegram RSS bot that cares about your reading experience
https://t.me/RSStT_Bot
GNU Affero General Public License v3.0

[Feature Request] Autoreply to previous messages with filters #381

Open andrigamerita opened 11 months ago

andrigamerita commented 11 months ago

Hello, I've found your bot ideal for many uses. My latest one is keeping a microblog on my own website and automatically copying everything I post into my Telegram channel, which looks seamless to my subscribers. It all works great as long as I stay within WordPress's intended methods of use: the bot interprets the RSS feed correctly and sends messages as expected.

However, once I step beyond this standard use case and try to implement features that are simply not standardized, I need to invent my own hacks. My current practical issue is transposing my own "post quote" feature so that it works on Telegram. On my microblog, every time I want the current post to be considered "in response to" (or quoting) an old one, I place some HTML like the following at the start of my post (related note: I don't copy-paste this every time, I use WP shortcodes, which are basically macros) ...

<p>[
    ⤴️ <a href="https://mysite/?&p=ID">https://mysite/?&p=ID</a>
]</p>
<iframe style="width: 100%;" src="https://mysite/?&p=ID"></iframe>

... where ID is the ID of the previous post I'm mentioning. This HTML appears as-is in the RSS feed, and when interpreted by your bot it shows that link as-is, which is correct by the standards of an RSS reader, but not my ideal solution to the problem. (On a side note: is there some way to disable the replacement of iframe tags with links to the iframe's src, so that the iframe is stripped entirely from the message the bot sends to Telegram? Or is the only way to do this right now to edit the relevant code? As things are currently, I end up with duplicate links in the message, which causes much confusion.)

On real microblog platforms, including Telegram channels, we have something that doesn't exist on pure blogging platforms like WordPress: a reply or quote feature. My idea is that the RSS bot should, with an optional setting, be able to check items in an RSS feed for substrings in a user-specified format. So, for example, the bot should be able to detect whether a post it receives begins with the content [ ⤴️ {ANY_URL} ] (or the same thing formatted in HTML instead of plaintext/Markdown, whichever option is usually preferable, if not both), and if so, check the Telegram messages previously sent by the bot for the substring {ANY_URL}. If, for example, a match is found at the end or the start of a previous message (where links referencing the current post are usually placed), then the bot understands that it should send the message for the new post as a reply to that old message. If multiple messages match the query, the bot should probably reply to the most recent one.

For example, let's say the bot sent this message in my channel yesterday, after it received a post from my RSS feed:

<!-- RSS -->
...
<content:encoded><![CDATA[
<p>This test message is so boring!</p>
]]></content:encoded>
...
<!-- => Telegram -->
This test message is so boring!
[Backlink](https://mysite/?p=123)

Let's say that today the bot has received a new post (iframe omitted from the HTML for brevity):

<!-- RSS -->
...
<content:encoded><![CDATA[
<p>[ ⤴️ <a href="https://mysite/?p=123">https://mysite/?p=123</a> ]</p>
<p>This message instead is fire!!! 🔥️</p>
]]></content:encoded>
...

What the bot would then send to Telegram, as it is now, would be the following:

<!-- => Telegram -->
[ ⤴️ <https://mysite/?p=123> ]
This message instead is fire!!! 🔥️
[Backlink](https://mysite/?p=456)

What I would like is that, after a proper configuration like I described above, the bot would instead send the message like this:

in reply to the message sent by the bot yesterday...

<!-- => Telegram -->
This message instead is fire!!! 🔥️
[Backlink](https://mysite/?p=456)
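
The behaviour requested above can be sketched roughly as follows. This is only an illustration under stated assumptions, not RSStT code: the names `REPLY_MARKER`, `split_reply_marker` and `find_reply_target` are hypothetical, the marker format is hard-coded instead of user-configured, and `sent` stands in for a log of previously sent messages that the bot would have to keep somewhere.

```python
import re
from typing import Optional

# Hypothetical, hard-coded version of the marker from the request:
#   <p>[ ⤴️ <a href="URL">URL</a> ]</p>
# \u2934 is the arrow; the emoji variation selector \ufe0f is optional.
REPLY_MARKER = re.compile(
    r'^\s*<p>\s*\[\s*\u2934\ufe0f?\s*'
    r'<a\s+href="(?P<url>[^"]+)"[^>]*>.*?</a>'
    r'\s*\]\s*</p>\s*',
    re.DOTALL,
)


def split_reply_marker(html: str) -> tuple[Optional[str], str]:
    """Return (referenced URL or None, content with a leading marker removed)."""
    match = REPLY_MARKER.match(html)
    if match is None:
        return None, html
    return match.group("url"), html[match.end():]


def find_reply_target(url: str, sent: list[tuple[int, str]]) -> Optional[int]:
    """Return the ID of the most recent sent message whose text contains url.

    `sent` is an assumed list of (message_id, text) pairs, oldest first.
    Plain substring matching is naive: ".../?p=12" also matches ".../?p=123".
    """
    for message_id, text in reversed(sent):
        if url in text:
            return message_id
    return None
```

With the two example posts above, `split_reply_marker` would yield the URL of yesterday's post plus the remaining content, and `find_reply_target` would yield the message to reply to, or `None` if no earlier message mentions that URL.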

I hope that what I'm trying to achieve here is clear to you, and I would be grateful if this feature request interests you. If it's something that you would want in the bot but is low-priority for you, assuming I can find some time, I could try implementing this feature myself and sending a PR. 👋

Rongronggg9 commented 11 months ago

The fact that an RSS entry is a reply to another is so-called "metadata", whereas the content, summary, or description is so-called "data". Extracting metadata from data is quite weird and is a "dirty hack". Obviously, a piece of metadata should occupy a dedicated XML tag in RSS/Atom in order not to mess things up. A nice example is https://github.com/DIYgod/RSSHub/blob/d93c99ea393f0c36876b86d944536bc5dab5ed2b/lib/views/atom.art#L54.

In fact, there are already some proposed RSS namespaces to mark replies, e.g. https://web.resource.org/rss/1.0/modules/threading/ and https://web.resource.org/rss/1.0/modules/annotation/

Personally, I would indeed like to see such a feature implemented, but some prerequisites must be met first:

  1. Out of concern for database size and efficiency, RSStT does not store the message IDs of sent RSS entries. A careful redesign of the database schema, which should be migratable from the current one, is needed.
  2. RSStT does not send RSS entries in order because all entries are sent asynchronously. See also #157. However, the feature needs the thread parent to be sent before its children.
  3. There is no widely used specification or de facto standard. To construct an RSS feed that is compatible with various RSS readers, it would be better to use the above namespaces and also keep the reference in content, summary, or description. RSStT would then need to trim the reference from content, summary, or description, which is also a dirty hack I disapprove of. Of course, a webmaster can instead provide two versions of the feed: one with the above namespaces and without the reference in content, summary, or description, and another with the reference in the content but without the namespaces.
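
To make prerequisites 1 and 2 more concrete, here is a rough sketch, assuming nothing about RSStT's actual schema or sending pipeline. The `sent_entry` table and every function name below are hypothetical, and the standard library's `sqlite3` is used purely for illustration. The first part stores which Telegram message an entry became; the second orders a batch of entries so that each thread parent comes before its children.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store with a hypothetical entry-link -> message-ID table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_entry ("
        " chat_id INTEGER NOT NULL,"
        " entry_link TEXT NOT NULL,"
        " message_id INTEGER NOT NULL,"
        " PRIMARY KEY (chat_id, entry_link))"
    )
    return conn


def remember(conn: sqlite3.Connection, chat_id: int, entry_link: str, message_id: int) -> None:
    """Record which message an entry was sent as (overwrites an older record)."""
    conn.execute(
        "INSERT OR REPLACE INTO sent_entry VALUES (?, ?, ?)",
        (chat_id, entry_link, message_id),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, chat_id: int, entry_link: str) -> Optional[int]:
    """Return the stored message ID for an entry, or None if it is unknown."""
    row = conn.execute(
        "SELECT message_id FROM sent_entry WHERE chat_id = ? AND entry_link = ?",
        (chat_id, entry_link),
    ).fetchone()
    return row[0] if row else None


def order_parents_first(entries: dict[str, Optional[str]]) -> list[str]:
    """Order entry links so that every parent precedes its children.

    `entries` maps an entry link to its parent link, or None. Parents outside
    the batch (e.g. sent earlier) add no constraint. A reply cycle raises
    graphlib.CycleError.
    """
    graph = {
        link: ({parent} if parent in entries else set())
        for link, parent in entries.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

An ordering alone is not sufficient when entries are sent asynchronously: the sender would still have to wait for a parent's message ID to be recorded before sending its children.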

Anyway, I lack the spare time to implement this myself, so a PR is welcome. If you decide to send one, you can open a WIP draft so that I can see your progress and give suggestions.🫣