UkraineNow-Intel / autoSA-backend

Django backend for autoSA

Implement a class / method to retrieve posts from Telegram #29

Closed j-bennet closed 2 years ago

j-bennet commented 2 years ago

Need to be able to:

Sample code using the sync Telethon API:

import datetime as dt
from backports import zoneinfo
from telethon.sync import TelegramClient
from telethon import functions, types

TZ_UTC = zoneinfo.ZoneInfo("UTC")

# Placeholder credentials; replace with your own API id and hash.
api_id = 123
api_hash = "blah"

def main():
    with TelegramClient("uanow", api_id, api_hash) as client:
        # To inspect the channel entity first (sync API, so no await):
        # channel = client.get_input_entity("t.me/ukrainearmyforce")
        # print(channel.stringify())

        # Search the channel for the query within a one-day UTC window.
        result = client(
            functions.messages.SearchRequest(
                peer="t.me/ukrainearmyforce",
                q="Харків",
                filter=types.InputMessagesFilterEmpty(),
                min_date=dt.datetime(2022, 4, 11, 0, 0, 0, tzinfo=TZ_UTC),
                max_date=dt.datetime(2022, 4, 12, 0, 0, 0, tzinfo=TZ_UTC),
                offset_id=0,
                add_offset=0,
                limit=100,
                max_id=0,
                min_id=0,
                hash=0,
                from_id=None,
            )
        )
        print(result.stringify())


if __name__ == "__main__":
    main()

See API docs:

https://tl.telethon.dev/methods/messages/search.html
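The same kind of date-bounded search could probably also be done with Telethon's higher-level `iter_messages`, which accepts `search` and `offset_date`. A minimal sketch, assuming the client is a connected `telethon.sync` client (or anything with a compatible `iter_messages`); the helper name `search_channel` is hypothetical:

```python
import datetime as dt
from typing import Any, List


def search_channel(client: Any, channel: str, query: str,
                   min_date: dt.datetime, max_date: dt.datetime,
                   limit: int = 100) -> List[Any]:
    """Collect messages matching `query` posted between min_date and max_date.

    `client` is assumed to be a connected telethon.sync TelegramClient,
    or any object exposing a compatible `iter_messages` method.
    """
    found = []
    # offset_date asks for messages older than max_date. Results are
    # expected newest first, so we stop at the first message that is
    # older than min_date.
    for message in client.iter_messages(channel, search=query,
                                        offset_date=max_date, limit=limit):
        if message.date < min_date:
            break
        found.append(message)
    return found
```

Passing the client in, rather than creating it inside the helper, keeps the function usable with a stub in tests.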

j-bennet commented 2 years ago

cc @robintibor @opowell28

j-bennet commented 2 years ago

The function should return a list of dicts.

For the fields we need, see the Source model; each dict should contain the same fields as Source:

https://github.com/UkraineNow-Intel/autoSA-backend/blob/master/api/models.py

Some of the fields are allowed to be blank (Telegram messages probably don't have a headline, but articles scraped from websites will).
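A rough sketch of what converting one message into such a dict might look like. The dict keys here are assumptions for illustration only and should be replaced with the actual Source field names; the helper name `message_to_dict` is hypothetical, and the URL format assumes a public channel username:

```python
from typing import Any, Dict


def message_to_dict(message: Any, channel: str) -> Dict[str, Any]:
    """Map a Telethon-style message to a Source-like dict.

    The keys below are assumed, not taken from api/models.py; they
    should mirror the fields of the real Source model.
    """
    return {
        "external_id": str(message.id),
        # Public post link; assumes `channel` is a public username.
        "url": f"https://t.me/{channel}/{message.id}",
        # Left blank: Telegram posts have no headline.
        "headline": "",
        "text": message.message or "",
        "timestamp": message.date,
    }
```

The list the function returns would then be something like `[message_to_dict(m, channel) for m in messages]`.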

Make sure you pull master and rebase on master regularly.

robintibor commented 2 years ago

So I have made some small commits, see https://github.com/UkraineNow-Intel/autoSA-backend/blob/449cc366cc42980a88f6f81ac732c3538d628b8e/infotools/telegram/telegram_search.py and https://github.com/UkraineNow-Intel/autoSA-backend/blob/449cc366cc42980a88f6f81ac732c3538d628b8e/api/models.py#L10-L24

Hope this is going in the right direction?

Some questions:

1) Do we want this as a single function or as a class? I also don't know whether creating the client inside the function causes any substantial overhead that could be avoided.

2) Did I do this correctly, adding https://github.com/UkraineNow-Intel/autoSA-backend/blob/449cc366cc42980a88f6f81ac732c3538d628b8e/api/models.py#L24 and using it here: https://github.com/UkraineNow-Intel/autoSA-backend/blob/449cc366cc42980a88f6f81ac732c3538d628b8e/infotools/telegram/telegram_search.py#L49-L57

3) Do we want any tests here? If so, I assume they should not call the real API but mock something?

4) I have added a field pinned, but maybe this is not intended? Or is it? https://github.com/UkraineNow-Intel/autoSA-backend/blob/449cc366cc42980a88f6f81ac732c3538d628b8e/infotools/telegram/telegram_search.py#L54

5) It's not clear to me why telegram_search.py is in its own folder; it seems unnecessary. Should we move it up to infotools?

6) Should the source field be populated, e.g., with sender, sender_id, post_author, or something?

7) Should the text language be extracted by some tool?

8) I don't really understand which chats can be "seen" by the Telegram API. For example, when I created a client with a new session_id, I had to call client.get_dialogs() before I could search messages in a specific chat/group by name.
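For questions 1 and 3, one possible shape (just a sketch, not what is in the commits above) is a small class that receives an already-created client, so the client can be reused across searches and replaced with a mock in tests. The names `TelegramSearcher` and `search` are hypothetical, and the returned keys are only illustrative:

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock


class TelegramSearcher:
    """Hypothetical wrapper that reuses one client across searches."""

    def __init__(self, client: Any) -> None:
        # The client is created once by the caller and injected here.
        self.client = client

    def search(self, channel: str, query: str, limit: int = 100) -> List[Dict[str, Any]]:
        messages = self.client.iter_messages(channel, search=query, limit=limit)
        # Illustrative keys only; the real dicts should follow the Source model.
        return [{"id": m.id, "text": m.message, "pinned": m.pinned} for m in messages]


def test_search_uses_injected_client() -> None:
    # The mock stands in for a Telethon client, so no real API call is made.
    fake_client = MagicMock()
    fake_client.iter_messages.return_value = [
        MagicMock(id=1, message="Харків", pinned=False),
    ]
    results = TelegramSearcher(fake_client).search("some_channel", "Харків")
    assert results == [{"id": 1, "text": "Харків", "pinned": False}]
    fake_client.iter_messages.assert_called_once_with(
        "some_channel", search="Харків", limit=100
    )
```

In production code the caller would pass in a real `TelegramClient`; the test only checks how the wrapper calls the client and shapes the results.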

robintibor commented 2 years ago

Let's continue discussion in PR https://github.com/UkraineNow-Intel/autoSA-backend/pull/30