Closed: evgenydmitriev closed 3 years ago
In GitLab by @durm on Sep 13, 2018, 13:31
moved from IncaSec/nterminal/cdc/cdc-grabber#77
In GitLab by @durm on Sep 13, 2018, 13:33
changed the description
In GitLab by @durm on Sep 13, 2018, 13:49
@evgenydmitriev just as an experiment for the bounty case, I created a project on the fl.ru platform. Got 2 strange, irrelevant responses from guys with PHP and MySQL as main skills (whyyy?), and 2 guys provided me with time, price and code snippets.
In GitLab by @durm on Sep 20, 2018, 18:34
changed title from {-telegram-} to {+Develop telegramstream source for Spring Cloud Dataflow streams - 500$+}
In GitLab by @durm on Sep 20, 2018, 18:34
changed the description
In GitLab by @durm on Sep 20, 2018, 18:37
changed the description
In GitLab by @durm on Sep 20, 2018, 18:37
changed the description
In GitLab by @durm on Sep 20, 2018, 18:43
changed the description
In GitLab by @durm on Sep 20, 2018, 20:03
changed title from Develop {-telegram-}stream source for Spring Cloud Dataflow streams - 500$ to Develop {+Telegram +}stream source for Spring Cloud Dataflow streams - 500$
In GitLab by @durm on Sep 21, 2018, 15:33
@kemi here is a clickable schema for Telegram API objects. Please help me formulate the expected schema of the output data for the Telegram source.
Just like in the case of Twitter, describe what you need in the form of a parameter mapping: {sender, text, timestamp, etc.}
You can put it in a comment or update my issue directly.
In GitLab by @durm on Sep 21, 2018, 15:38
changed the description
In GitLab by @durm on Sep 26, 2018, 12:39
changed the description
In GitLab by @durm on Sep 26, 2018, 13:17
changed the description
In GitLab by @durm on Sep 26, 2018, 13:19
changed the description
In GitLab by @myrmecophagous on Sep 27, 2018, 19:43
@durm here is the schema for Telegram:
{
  "type": "object",
  "properties": {
    "id": {
      "type": "string",
      "description": "Message ID"
    },
    "date": {
      "type": "string",
      "description": "Message timestamp, ISO date in UTC"
    },
    "source": {
      "type": "string",
      "description": "Predefined string, the same for all messages",
      "default": "Telegram"
    },
    "category": {
      "type": "string",
      "description": "Channel title; we should be able to pass it as parameter from configs"
    },
    "channel_id": {
      "type": "string",
      "description": "Channel id"
    },
    "author": {
      "type": "string",
      "description": "Message sender id"
    },
    "reciever": {
      "type": "string",
      "description": "Message receiver id"
    },
    "content": {
      "type": "string",
      "description": "Message text"
    },
    "related_documents": {
      "type": "array",
      "description": "messageMediaDocument",
      "items": {
        "type": "object",
        "properties": {
          "content": {
            "type": "string",
            "description": "Document text"
          },
          "date": {
            "type": "string",
            "description": "Document _creation_ date, if available; ISO date in UTC"
          },
          "size": {
            "type": "integer",
            "description": "File size in bytes"
          },
          "file_name": {
            "type": "string"
          }
        }
      }
    },
    "media": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "content": {
            "type": "string",
            "description": "messageMediaPhoto binary content or messageMediaVideo.thumb"
          },
          "description": {
            "type": "string",
            "description": "Media caption"
          },
          "date": {
            "type": "string",
            "description": "Media _creation_ date, if available; ISO date in UTC"
          },
          "type": {
            "type": "string",
            "description": "Media type: [image|video]"
          },
          "size": {
            "type": "integer",
            "description": "File size in bytes"
          }
        }
      }
    }
  },
  "required": [
    "id",
    "date",
    "source",
    "category",
    "author",
    "language",
    "content"
  ]
}
I'm not sure if reciever is a relevant piece of information, if we're gonna listen to groups.
As for media and document date, it should be the date when the media was created / updated, not the timestamp when it was posted to the chat, since we already have this information; something similar to the timestamps we extract from PDF meta.
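A minimal sketch of how the flat part of such a message could be assembled, assuming the Telegram client hands over the message date as Unix epoch seconds. The class name, method names and sample values are hypothetical and only illustrate the field mapping; nested related_documents and media are left out.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: names are illustrative, not part of any existing code.
class TelegramMessageSketch {

    // Converts Unix epoch seconds to an ISO-8601 string in UTC,
    // matching the "ISO date in UTC" wording of the schema.
    static String toIsoUtc(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    // Builds the flat fields of the output message using the schema's key names.
    static Map<String, Object> build(String id, long epochSeconds, String category,
                                     String channelId, String author, String content) {
        Map<String, Object> msg = new LinkedHashMap<>();
        msg.put("id", id);
        msg.put("date", toIsoUtc(epochSeconds));
        msg.put("source", "Telegram"); // schema default, the same for all messages
        msg.put("category", category); // channel title, passed in from configs
        msg.put("channel_id", channelId);
        msg.put("author", author);
        msg.put("content", content);
        return msg;
    }

    public static void main(String[] args) {
        // Sample values are made up for the demonstration.
        System.out.println(build("42", 1538077380L, "Example channel", "1001", "7", "hello"));
    }
}
```

Here `toIsoUtc(1538077380L)` yields `2018-09-27T19:43:00Z`. The media and document dates discussed above would need a separate source (file metadata), since the message timestamp only says when something was posted.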
In GitLab by @durm on Sep 28, 2018, 11:01
changed the description
In GitLab by @aturok on Oct 12, 2018, 19:22
As discussed with @durm , I'd like to claim this task.
In regards to Telegram integration there are certain concerns:
I am currently investigating what can be done to implement the requested scenario properly - will keep you posted.
In the meantime, I've got a couple of questions regarding the desired output message structure:
source field - should it be hardcoded, or do we want a configuration parameter for it?
receiver field will have to be empty in most cases and doesn't seem relevant in the Telegram context (especially if we speak of channels, which are a one-way means of communication)
language parameter - do you expect it to be configured on a per-channel basis or to be deduced from the message content? If we speak of the second option, do we want to inject the language-detection mechanisms into the tg source app, or maybe it's more reasonable to craft a separate app for this purpose?
In GitLab by @durm on Oct 12, 2018, 20:05
@myrmecophagous please, assist
In GitLab by @myrmecophagous on Oct 12, 2018, 20:17
@aturok
source field could perfectly be hardcoded.
receiver is okay, let's keep it in the schema though.
In GitLab by @durm on Oct 12, 2018, 22:37
@myrmecophagous there are two possible ways:
for me first one looks good.
In GitLab by @durm on Oct 12, 2018, 23:42
@myrmecophagous otherwise, the source logic does not need language at all, so it shouldn't be provided as an input param.
Why would we return a message with a language that we just provided as an input argument? Let's provide it directly to the component that will consume this data.
In GitLab by @myrmecophagous on Oct 13, 2018, 01:37
@durm In this case, instead of the language code, we'd need the channel id (along with its name) in the output.
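A rough sketch of what this could look like on the consuming side, assuming the consumer holds a per-channel language mapping in its own config. The class name, the mapping and the sample channel ids are hypothetical; only the channel_id key comes from the schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical consumer-side helper: resolves the language from the channel id
// carried in the message, so the source does not have to know about languages.
class ChannelLanguageLookup {

    private final Map<String, String> languageByChannelId;

    ChannelLanguageLookup(Map<String, String> languageByChannelId) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.languageByChannelId = new HashMap<>(languageByChannelId);
    }

    // Returns the configured language for the message's channel, if any.
    Optional<String> languageFor(Map<String, ?> message) {
        Object channelId = message.get("channel_id");
        if (channelId == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(languageByChannelId.get(channelId.toString()));
    }

    public static void main(String[] args) {
        // Made-up channel id and language code for the demonstration.
        ChannelLanguageLookup lookup = new ChannelLanguageLookup(Map.of("1001", "ru"));
        System.out.println(lookup.languageFor(Map.of("channel_id", "1001")).orElse("unknown"));
    }
}
```

With this split the source only emits channel_id and the channel name, and language stays a concern of whichever component actually needs it.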
In GitLab by @durm on Oct 13, 2018, 08:57
assigned to @aturok
In GitLab by @myrmecophagous on Oct 16, 2018, 11:00
Schema updated.
In GitLab by @aturok on Oct 25, 2018, 01:35
@durm @myrmecophagous please find a brief status report below.
I was able to get TDLib, the official client library for the Telegram API, working locally. It allows receiving messages (unlike the Telegram Bot API) and should work for our task. Three problems with it though:
In GitLab by @anshlykov on Oct 25, 2018, 06:49
@aturok Actually, I do not fully understand why we use TDLib instead of the Bot API. Isn't it excessive, and aren't we making it too complicated?
@evgenydmitriev If we still choose TDLib, then I see no unsolvable problems in points 1 and 2.
In GitLab by @aturok on Oct 25, 2018, 13:58
@evgenydmitriev not sure about Google Voice, but I believe they should work - will check with the number that you have provided.
@anshlykov you're right it gets too complicated, but the issue with Bot API is that bots can be subscribed to telegram channels only by channel administrators, meaning you would have to ask the owner of every channel that you're interested in to add your bot to the channel. And if they refuse, you can't do anything. Should this be acceptable for our solution, I would definitely go with Bot API - it's way-way simpler.
In GitLab by @aturok on Nov 6, 2018, 12:02
Current experiments are in this repo: https://github.com/aturok/tgsourcecheck/commits/tgsource (took the TDLib repo as a starting point). Next steps: separate from the TDLib source, wrap into an SCDF harness.
This week I will also share a repo with the IRC source app (in the appropriate issue).
In GitLab by @evgenydmitriev on Apr 29, 2019, 22:46
closed
In GitLab by @penpyt on Jul 19, 2018, 15:25
Develop a telegramstream source component, which can be easily integrated into Spring Cloud Dataflow streams.
Toolset: Java, Spring, Spring Cloud Dataflow, Docker
Component should
following commands should be provided:
Definition of done