Together-Java / TJ-Bot

TJ-Bot is a Discord Bot used on the Together Java server. It is maintained by the community, anyone can contribute.
https://togetherjava.org
GNU General Public License v3.0

Collecting analytics for #active_questions #660

Open surajkumar opened 1 year ago

surajkumar commented 1 year ago

Recently, a question was asked by a member of the server, along the lines of "How long does it take for a question to get answered?". This information is unknown.

We could gather analytics based on the data in #active_questions.

Data such as:

.. and any more that you can think of ..

We can then calculate averages based on this information to answer questions like "on average, how long does it take for a question to be answered".

The information could be displayed in a dedicated analytics channel and, for marketing purposes, on a website (or as a dynamic image to throw onto advert boards, should you wish).
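
As a rough illustration of the averaging step, here is a minimal sketch in plain Java. The `QuestionRecord` type and its field names are assumptions made up for this example, not the bot's actual data model; it only shows how an average time-to-answer could be derived once creation and first-answer timestamps are available.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.OptionalDouble;

final class AnswerTimeStats {
    /** Hypothetical record of one help thread; firstAnswerAt is null while unanswered. */
    record QuestionRecord(long threadId, Instant createdAt, Instant firstAnswerAt) {}

    /** Average time from thread creation to first answer; unanswered threads are skipped. */
    static Duration averageTimeToAnswer(List<QuestionRecord> records) {
        OptionalDouble averageSeconds = records.stream()
            .filter(r -> r.firstAnswerAt() != null)
            .mapToLong(r -> Duration.between(r.createdAt(), r.firstAnswerAt()).getSeconds())
            .average();
        // Returns a zero duration when there is no answered thread at all.
        return Duration.ofSeconds(Math.round(averageSeconds.orElse(0)));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2023-01-01T12:00:00Z");
        List<QuestionRecord> records = List.of(
            new QuestionRecord(1L, start, start.plusSeconds(300)),
            new QuestionRecord(2L, start, start.plusSeconds(900)),
            new QuestionRecord(3L, start, null)); // still unanswered
        System.out.println(averageTimeToAnswer(records)); // prints PT10M
    }
}
```

Skipping unanswered threads is a design choice of this sketch; counting them some other way would change the average.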

To get the initial dataset, a quick scrub should take no time at all. Going forward, it might require a lot more effort to keep the data up-to-date after the initial scrub.

Opinions, please.

Tais993 commented 1 year ago

Something to take a look at: Prometheus. Is it possible to have this run in the bot itself, maybe exporting to a database? Gotta figure this out.

Alternatively, we might need install scripts that install Prometheus, but I hope that's not required.

Or Docker, or we disable stats if there's no active Prometheus instance.

Or a completely different approach: handling stats ourselves, so we wouldn't use a library but would create our own "mini-tool". This has to be thought through.

Nxllpointer commented 1 year ago

I really like the idea of this feature!

surajkumar commented 1 year ago

Something to take a look at: Prometheus. Is it possible to have this run in the bot itself, maybe exporting to a database? Gotta figure this out.

Alternatively, we might need install scripts that install Prometheus, but I hope that's not required.

I was planning on having the bot do all the work, starting with an initial scrub that saves everything to a database. The scrub would use JDA to get all the message history from the #active_questions channel and rip out everything we need.

In JDA this would be something like:

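// Rough sketch: `channel` is the #active_questions TextChannel, `database` is a placeholder.
channel.getThreadChannels().forEach(thread -> {
  long id = thread.getIdLong();
  String name = thread.getName();
  OffsetDateTime createdAt = thread.getTimeCreated();
  // getRetrievedHistory() stays empty until messages are fetched, so retrieve them first (up to 100 per call)
  List<Message> messages = thread.getHistory().retrievePast(100).complete();
  database.save(...);
});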

Then moving forward, whenever a new thread is created (or /ask is invoked), add a new entry to the database. When /help-thread-close is invoked, update a status column.

We can calculate what we want to see from there.
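
The event-driven flow described above could look roughly like this. It is a minimal sketch under stated assumptions: `HelpThreadTracker`, its method names, and the in-memory map standing in for the database table are all hypothetical and not the bot's actual classes.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

final class HelpThreadTracker {
    enum Status { ACTIVE, CLOSED }

    /** Hypothetical row: one entry per help thread, closedAt is null while active. */
    record Entry(long threadId, String title, Instant createdAt, Status status, Instant closedAt) {}

    // In-memory stand-in for a database table, keyed by thread ID.
    private final Map<Long, Entry> entries = new LinkedHashMap<>();

    /** Would be called when a thread is created or /ask is invoked. */
    void onThreadCreated(long threadId, String title, Instant createdAt) {
        entries.put(threadId, new Entry(threadId, title, createdAt, Status.ACTIVE, null));
    }

    /** Would be called when /help-thread-close is invoked; unknown thread IDs are ignored. */
    void onThreadClosed(long threadId, Instant closedAt) {
        entries.computeIfPresent(threadId, (id, e) ->
            new Entry(id, e.title(), e.createdAt(), Status.CLOSED, closedAt));
    }

    /** Number of tracked threads currently in the given status. */
    long count(Status status) {
        return entries.values().stream().filter(e -> e.status() == status).count();
    }

    public static void main(String[] args) {
        HelpThreadTracker tracker = new HelpThreadTracker();
        Instant now = Instant.parse("2023-01-01T00:00:00Z");
        tracker.onThreadCreated(1L, "How to sort a list?", now);
        tracker.onThreadCreated(2L, "NullPointerException in main", now);
        tracker.onThreadClosed(1L, now.plusSeconds(600));
        System.out.println(tracker.count(Status.ACTIVE) + " active, "
            + tracker.count(Status.CLOSED) + " closed");
    }
}
```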

Tais993 commented 1 year ago

Well yes, and Prometheus would be the database in my case. If you'd install Grafana you'd have amazing graphical diagrams and whatever else you can think of related to data.

(Prometheus doesn't guess that you want to know how many threads are open, etc. You still have to tell Prometheus what to track, and it would then show you how many threads get opened over time, and more.)

surajkumar commented 1 year ago

Thought about this some more. I'm not sure there is any point in having a database. We could either scrub the #active_questions channel periodically (e.g. once a day) and still get the result we want, or invoke the scrub manually with a command. This would make the solution self-sufficient and portable.

surajkumar commented 1 year ago

I don't think it's possible to get archived thread channels using JDA. Tried literally everything. The only alternative would be to start collecting data moving forward but we won't have any information over the previous years and it could take ages for future data to be of any use.

Nxllpointer commented 1 year ago

I mean a month's worth of info also says a lot

surajkumar commented 1 year ago

I mean a month's worth of info also says a lot

Complexity still rises, as we would have to hook onto all the relevant slash commands (e.g. /ask and /help-thread-close), monitor onMessageReceived events, introduce a new table (and set up a DB if one isn't present), and handle anything else needed for the data collection part. That comes on top of the existing task of figuring out how we are going to display the information and do the calculations we need.

Lots of testing will then be needed to make sure the changes do not break existing functionality.

Unless somebody is volunteering to do the work, I would suggest closing this with the reason of it being too big of a job.

Nxllpointer commented 1 year ago

Hooking the slash commands should not be an issue since the BotCore calls the events somewhere. I would not close this yet.

surajkumar commented 1 year ago

Feel free to look into data collection.

Zabuzard commented 1 year ago

I would suggest closing this with the reason of it being too big of a job

then just reduce the scope. you guys are planning this task to be way too complex. step back and do something simple.

we already have a database collection meant for stats, the help-threads table. just add a few columns to it and start collecting. first, figure out which metrics are actually interesting and helpful to know. then figure out how to collect them and then plan accordingly.

for example, let's say we want to know how many channels are closed with a RED-activity indicator, then we can simply add two columns to the database table:

and based on that we can already retrieve the metric. for starters, we could simply have a slash command /help-metrics which just outputs a simple list:

dont overcomplicate. prometheus, grafana and all that shit is really cool. but way too complex for the first steps. first figure out what metrics are needed and then build a small PoC as explained. we can always improve afterwards and make graphs out of it.
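
A small PoC along these lines could be sketched as follows. Everything here is an assumption made up for illustration: the `HelpThreadRow` shape, the `Activity` values other than RED, and the output wording are hypothetical and are not the actual columns of the help-threads table or the real command output.

```java
import java.util.List;

final class HelpMetrics {
    /** Hypothetical activity levels; only RED is mentioned in the discussion. */
    enum Activity { GREEN, YELLOW, RED }

    /** Hypothetical row shape; a null activityAtClose means the thread is still open. */
    record HelpThreadRow(long channelId, Activity activityAtClose) {}

    /** Builds the plain-text list a /help-metrics style command might reply with. */
    static String render(List<HelpThreadRow> rows) {
        long closed = rows.stream().filter(r -> r.activityAtClose() != null).count();
        long closedRed = rows.stream().filter(r -> r.activityAtClose() == Activity.RED).count();
        return String.join("\n",
            "Total threads: " + rows.size(),
            "Closed threads: " + closed,
            "Closed with RED activity: " + closedRed);
    }

    public static void main(String[] args) {
        List<HelpThreadRow> rows = List.of(
            new HelpThreadRow(1L, Activity.RED),
            new HelpThreadRow(2L, Activity.GREEN),
            new HelpThreadRow(3L, null)); // still open
        System.out.println(render(rows));
    }
}
```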

Tais993 commented 1 year ago

I don't think it's possible to get archived thread channels using JDA. Tried literally everything. The only alternative would be to start collecting data moving forward but we won't have any information over the previous years and it could take ages for future data to be of any use.

Go to the javadoc and look up "retrievearchive"

You can retrieve archived channels

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label, comment or add the valid label or this will be closed in 5 days.

derrykid commented 1 year ago

I agree with @Zabuzard

just reduce the scope. you guys are planning this task to be way too complex. step back and do something simple.

I think the following columns might already be available in the database:

These 3 columns can help us create a simple model and answer the question: which type of question gets the quickest response on average?

We can build a simple text-processing model, such as text extraction, i.e. "count" the occurrences of each word in the thread title.

You can expect titles like "how to start learning java" or "how to add new item to Arraylist?" to get the lowest response time.

Based on this simple model, if the question is very common, we can reply with something like:

The question regarding arraylist will usually be answered in 5 mins

For an unseen or difficult question, one the model has never encountered, we can respond with something like:

This type of question is rare, so it might take more than 30 mins to receive a response. Please wait patiently
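
The word-counting idea above could be sketched roughly as follows. This is only an illustration under assumptions: `AnsweredThread` and its fields are hypothetical, and averaging the response time per title word is one simple reading of the proposed model, not an agreed design.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TitleKeywordStats {
    /** Hypothetical input: a thread title and the minutes until its first answer. */
    record AnsweredThread(String title, long minutesToFirstAnswer) {}

    /** Average minutes to first answer for each lower-cased word found in thread titles. */
    static Map<String, Double> averageMinutesByWord(List<AnsweredThread> threads) {
        Map<String, long[]> sumAndCount = new HashMap<>(); // word -> {sum of minutes, thread count}
        for (AnsweredThread thread : threads) {
            // Each word counts once per title, even if it is repeated in that title.
            Set<String> words = new HashSet<>(
                Arrays.asList(thread.title().toLowerCase(Locale.ROOT).split("\\W+")));
            words.remove("");
            for (String word : words) {
                long[] acc = sumAndCount.computeIfAbsent(word, k -> new long[2]);
                acc[0] += thread.minutesToFirstAnswer();
                acc[1]++;
            }
        }
        Map<String, Double> averages = new HashMap<>();
        sumAndCount.forEach((word, acc) -> averages.put(word, (double) acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        List<AnsweredThread> threads = List.of(
            new AnsweredThread("How to add new item to Arraylist?", 4),
            new AnsweredThread("ArrayList sorting", 6),
            new AnsweredThread("Weird JNI crash", 40));
        System.out.println(averageMinutesByWord(threads).get("arraylist")); // prints 5.0
    }
}
```

A word that is missing from the resulting map would correspond to the "unseen question" case, where only a generic waiting message can be given.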

ankitsmt211 commented 11 months ago

I'd like to work on this one if we can reduce the scope. How about just "Tickets open" and "Tickets close" for a start, and probably more stats on which category each belongs to? I'm sure we can extend it to include more stats later.

Zabuzard commented 11 months ago

Sure, go ahead :)

Bryce72 commented 3 months ago

I would like to work on this. Have there been any updates, such as "Tickets open" and "Tickets close"?

ankitsmt211 commented 3 months ago

@Bryce72 PR #990 should set up the base with metadata. I'll try to get it merged today.