RnDAO / tc-botComm

GNU General Public License v3.0
2 stars 4 forks source link

Refactor botComm (fix buggy code) #97

Closed cyri113 closed 1 year ago

cyri113 commented 1 year ago

Description

The bugs that are stopping us from moving the charts into production (mvp) come from incomplete extraction and processing of data from Discord. Overall, this repository doesn't follow best practices. The idea is to refactor this component, including tests and a set of requirements.

Tasks

Next

Documentation

amindadgar commented 1 year ago

Looking at the list of intents, I think these intents should be at least selected (https://discord.com/developers/docs/topics/gateway#list-of-intents).

GUILDS (1 << 0)
  - CHANNEL_CREATE
  - CHANNEL_UPDATE
  - CHANNEL_PINS_UPDATE
  - THREAD_CREATE
  - THREAD_UPDATE
  - CHANNEL_PINS_UPDATE
  - THREAD_MEMBER_UPDATE
  - THREAD_MEMBERS_UPDATE *
  - STAGE_INSTANCE_CREATE

GUILD_MEMBERS (1 << 1) **
  - GUILD_MEMBER_ADD
  - GUILD_MEMBER_UPDATE
  - THREAD_MEMBERS_UPDATE *

GUILD_MODERATION (1 << 2)
  - GUILD_AUDIT_LOG_ENTRY_CREATE
  - GUILD_BAN_ADD

GUILD_EMOJIS_AND_STICKERS (1 << 3)
  - GUILD_EMOJIS_UPDATE
  - GUILD_STICKERS_UPDATE

GUILD_INTEGRATIONS (1 << 4)
  - GUILD_INTEGRATIONS_UPDATE
  - INTEGRATION_CREATE
  - INTEGRATION_UPDATE

GUILD_INVITES (1 << 6)
  - INVITE_CREATE

GUILD_MESSAGES (1 << 9)
  - MESSAGE_CREATE
  - MESSAGE_UPDATE

GUILD_MESSAGE_REACTIONS (1 << 10)
  - MESSAGE_REACTION_ADD

DIRECT_MESSAGES (1 << 12) **Not sure of this**
  - MESSAGE_CREATE
  - MESSAGE_UPDATE
  - CHANNEL_PINS_UPDATE

DIRECT_MESSAGE_REACTIONS (1 << 13) **Not sure of this**
  - MESSAGE_REACTION_ADD

GUILD_SCHEDULED_EVENTS (1 << 16)
  - GUILD_SCHEDULED_EVENT_CREATE
  - GUILD_SCHEDULED_EVENT_UPDATE
  - GUILD_SCHEDULED_EVENT_USER_ADD

AUTO_MODERATION_CONFIGURATION (1 << 20)
  - AUTO_MODERATION_RULE_CREATE
  - AUTO_MODERATION_RULE_UPDATE

AUTO_MODERATION_EXECUTION (1 << 21) **Not sure of this**
  - AUTO_MODERATION_ACTION_EXECUTION

I didn't add the delete part of each intent because I wansn't sure that we would need them. Ene could help us more for the intents @TjitsevdM .

TjitsevdM commented 1 year ago

Here are a few different comments:

Additional tasks: Should extract joined date per account Should extract profile picture per account (if we want to show this in the member tables)

Should we specify that the bot should get all messages from every thread in a channel?

Getting a list of all channels is to show in the dashboard which channels can be selected for analysis. Based on the channel permissions, channels where the bot doesn’t have access yet are marked so that the user can give the bot access there if they want these channels to be included in the analysis.

For the next features, extracting voice would be great if possible!

I'm not sure how the bot gets info about the different channels in a server. I don't know if we need to know when a channel gets removed or whether we can just read the channels that are present. Similarly for threads. That's the only thing I can think of for now for which we could need the delete part of the intents that are not listed by @amindadgar

amindadgar commented 1 year ago

I just looked more into the data that we saved in rawinfos table before and I can see we have these type of data

DEFAULT
REPLY
THREAD_CREATED
THREAD_STARTER_MESSAGE
RECIPIENT_ADD
CHANNEL_NAME_CHANGE
APPLICATION_COMMAND
CONTEXT_MENU_COMMAND 

Also thinking about the permissions something that crossed my mind was that, we cannot assume the activities of deletion of messages, channels, etc a kind of activity that we have in analytics, but for modifications of channels, messages, etc we can assume an activity (please others give feedback for this).

based on my opinion in the paragraph above, I'll update my previous message.

katerinabc commented 1 year ago

In Notion we have this document about API fields. Then there is this requirement document from autumn 2022

Other things to consider:

  1. Users can leave a server. Can we get data about that?
  2. I'm assuming we are getting channelids and are using this in the analysis script, but displaying channel names in the front end
  3. Channels can be deleted --> Delete the data created in that channel
amindadgar commented 1 year ago

Answering @katerinabc

  1. Yes we can, there's a permission for that to be added.
  2. Yes, the channelIds are being fetched (I suppose) but we also retrieve the channel names and store it in another db collection named channels.
  3. Yes, this is also possible but I'm not sure it was implemented before or not. for delete message, channel, ... operations a db delete operation is needed.

Anyway here's additional permissions that the bot should have (in addition to my message above)

GUILD_MEMBERS (1 << 1) **
  - GUILD_MEMBER_REMOVE

GUILDS (1 << 0)
  - CHANNEL_DELETE

I'm starting to think that we have to write down the permissions that aren't needed rather than writing down those ones that are needed (Since they're becoming huge).

Behzad-rabiei commented 1 year ago

each day the bot code runs and it does the following steps: 1- get a list of guilds based on the Guild collection in the database

2- for each guild it will check whether the bot has access to it or not. if not the isDisconnected will be set to true for that guild.

3- for each guild the isInProgress field will be set to true

4- for each guild, it gets a list of channels (not synced with selected channels) then it checks each channel and updates that channel if the channel exists in the database, if not it will create a document for that channel

5- for each guild, it gets a list of members then it checks each member and updates that member if the member exists in the database, if not it will create a document for that member

6- this is the extraction msgs step(the most important part): the code here is not clear at all but it supposed to extract messages from the selected channels and their threads within the selected period and then inserting them to the rawinfo collection. beside that it contains error too. (using function that does not exist - the error will catch but the console log inside the catch block is commented)

7- for each guild the isInProgress field will be set to false

note: there is a optional way that the bot will run for a single guild too.

daniel-ospina commented 1 year ago

An additional step we'll soon need to add in #5:

We haven't designed the opt-in system yet so for now is just about making sure the architecture enables us to easily add this

Behzad-rabiei commented 1 year ago

when creating or updating documents in the database, the bot is doing it one by one, which can be slow and inefficient. For example, when using the create method to create multiple documents, each document is created and saved to the database one at a time, resulting in a significant amount of overhead.

This issue occurs multiple times in the bot, as creating and updating rawInfos, accounts, channels, and guilds is done one by one. Refactoring these sections of the code to use insertMany can significantly improve the bot's performance.

Overall, by refactoring the code to use insertMany instead of create, the bot's performance can be improved, reducing the overhead of multiple database operations and potentially reducing execution time.

Behzad-rabiei commented 1 year ago

We've encountered an issue!

At present, when users connect to a server, they choose channels and a specific period in the settings.

Our objective is to retrieve channel messages following the selected channels and period.

However, the problem lies in the fact that the Discord API does not offer a method to obtain messages based on a date. Instead, it allows for fetching messages before, after, or around a particular message, using the message ID.

While it's possible to filter messages based on the period locally, it might not be efficient to loop through a large number of messages (e.g., 10,000). Currently, our bot filters each message to determine if it falls within the desired period.

One suggested workaround is to disregard the user-selected period, retrieve all channel messages, and store them in a database. The analyzer app can then perform its task based on the period.

It's worth noting that, initially, we need to obtain all messages. This will provide us with the ID of the most recent message, enabling us to fetch messages after that ID for subsequent extractions.


Nima solution :

- Retrieve the initial 100 messages.
- Inspect the last message in the 100-element array and compare it to the period.
- If the message falls within the period, proceed to fetch an additional 100 messages sent before the previous last message.
- If the message is not within the period, iterate through the array and filter out messages that don't belong to the specified period (the fetch messages process will stop).

This method requires looping over an array with only 100 elements at most, just once.


solutions:

A) extract all messages and filter them based on the period then saved them into DB(bot currently does this) looping over an large array is not efficient considering Node.js architecture

B)extract all messages and saved them into DB saving a large amount of data in the DB.

C) the approach which @nimatorabiv suggested :extract needed messages (in the worst case it will extract extra 100 messages which will filter with a single loop over 100 elements) and saved them into DB. less API call and more efficient

cyri113 commented 1 year ago

The specific requirements will be move to the https://github.com/RnDAO/tc-discordBot repository.