Bot-detector / Bot-Detector-Core-Files

The server and processing files for the Bot Detector Plugin
GNU General Public License v3.0

missing data #2

Closed extreme4all closed 3 years ago

extreme4all commented 3 years ago

hey,

I fail to understand the data flow. I see you have many pickle files; how are they generated?

Ferrariic commented 3 years ago

Hey there - sorry about the data flow mess. I'll work on cleaning that up so it's much more readable.

I will focus on making a much more readable format shortly - which should help answer some of your questions.

extreme4all commented 3 years ago

the data you have are only player names?

extreme4all commented 3 years ago

FYI we are doing something very similar to: https://www.youtube.com/watch?v=Dk4Yahv2lek&list=PLX9loFun2zNkqwEk3abeMzZnVlT0YPxkp but in a cleaner way.

Ferrariic commented 3 years ago

> the data you have are only player names?

No, the data are the stats from the hiscores, located in "PIfile" :)

So if loaded in,

ykmfile = generated labels
PIfile.reshape(-1,78) = hiscore data
Pnames = names

So that:

ykm[4] is the label for player pname[4], with features PIfile[4]
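A minimal sketch of that index alignment, assuming each of the three pickles holds a flat sequence and sits in one folder under the names used above (the real file names, extensions, and container types are not shown in the thread, so treat them as hypothetical). `to_rows` is a plain-Python stand-in for `reshape(-1,78)`.

```python
import pickle
from pathlib import Path

N_FEATURES = 78  # row width taken from the reshape(-1,78) call above


def load_pickle(path):
    """Load one pickled object. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def to_rows(flat, width=N_FEATURES):
    """Split a flat sequence into rows of `width` values."""
    if len(flat) % width != 0:
        raise ValueError(f"{len(flat)} values do not divide into rows of {width}")
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]


def load_dataset(folder):
    """Return (names, features, labels), aligned by index."""
    folder = Path(folder)
    names = load_pickle(folder / "Pnames")              # hypothetical file name
    features = to_rows(load_pickle(folder / "PIfile"))  # hypothetical file name
    labels = load_pickle(folder / "ykmfile")            # hypothetical file name
    if not (len(names) == len(features) == len(labels)):
        raise ValueError("names, features and labels must have the same length")
    return names, features, labels
```

With this, `labels[4]` is the label for `names[4]`, whose features are `features[4]`.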

Ferrariic commented 3 years ago

> FYI we are doing something very similar to: https://www.youtube.com/watch?v=Dk4Yahv2lek&list=PLX9loFun2zNkqwEk3abeMzZnVlT0YPxkp but in a cleaner way.

Really cool project!

extreme4all commented 3 years ago

I don't know how effective the raw player stats are for detecting bots. With some data engineering, we could scrape the highscores every hour, every 6 hours, or every day to get the XP gains over time.

A hypothesis is that bots gain XP at a similar rate in one specific skill, whereas normal players gain XP across many skills.
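One way to turn that hypothesis into features, as a rough sketch: take two snapshots of the same player, compute XP gained per hour for each skill, and measure how much of the total gain came from a single skill. The snapshot layout (a dict of skill name to total XP) and both function names are assumptions made for illustration.

```python
def xp_rates(earlier, later, hours):
    """XP gained per hour for each skill between two snapshots of one player."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return {skill: (later[skill] - earlier.get(skill, 0)) / hours for skill in later}


def top_skill_share(rates):
    """Fraction of all XP gained that came from the single most-trained skill."""
    total = sum(r for r in rates.values() if r > 0)
    if total == 0:
        return 0.0  # no XP gained in the window
    return max(rates.values()) / total


# Hypothetical snapshots taken 6 hours apart.
before = {"woodcutting": 1000, "fishing": 500}
after = {"woodcutting": 7000, "fishing": 500}
rates = xp_rates(before, after, hours=6)
print(rates, top_skill_share(rates))
```

A share close to 1.0 means nearly all XP went into one skill during the window. Whether that actually separates bots from normal players is exactly what the hypothesis would need to test.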

extreme4all commented 3 years ago

also gathering labeled data will make it way easier :D

Ferrariic commented 3 years ago

> I don't know how effective the raw player stats are for detecting bots. With some data engineering, we could scrape the highscores every hour, every 6 hours, or every day to get the XP gains over time.
>
> A hypothesis is that bots gain XP at a similar rate in one specific skill, whereas normal players gain XP across many skills.

Yeah, I would love to scrape the hiscores every 6 hrs. Unfortunately there is a rate limit of 2-3 seconds per name, so 100K names could take about 69 hrs to scrape (at 2.5 seconds per name).
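The arithmetic behind that estimate, as a small sketch. It assumes one sequential request per name with no parallelism, and the function names are made up for illustration.

```python
def scrape_hours(n_names, seconds_per_name):
    """Hours needed to look up n_names one after another."""
    return n_names * seconds_per_name / 3600


def names_per_window(window_hours, seconds_per_name):
    """How many names fit into one scrape window at the given rate."""
    return int(window_hours * 3600 // seconds_per_name)


for delay in (2, 2.5, 3):
    print(f"{delay} s/name: {scrape_hours(100_000, delay):.1f} h for 100K names, "
          f"{names_per_window(6, delay)} names per 6 h window")
```

At 2.5 seconds per name, 100K names take about 69.4 hours, and a 6-hour window only covers 8,640 names.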

Ferrariic commented 3 years ago

> also gathering labeled data will make it way easier :D

It would be, but unfortunately we don't know the labels, since we can't easily trust the accuracy of sent-in labels for individual players. So kmeans can group players on their stats and output labels for us. Those labels then go into the KNN classifier, which seems to work well so far at least.
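A plain-Python sketch of that two-step idea: k-means (Lloyd's algorithm) produces a cluster label per player, then a k-nearest-neighbours vote assigns a new player to one of those clusters. The thread does not show the project's actual code or which library it uses, so everything below is illustrative. The toy points stand in for the 78-value hiscore rows.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points with Lloyd's algorithm; return one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_predict(train_points, train_labels, query, n_neighbors=3):
    """Majority vote over the n_neighbors training points closest to query."""
    order = sorted(range(len(train_points)), key=lambda i: distance(query, train_points[i]))
    votes = [train_labels[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)


# Toy stand-in for hiscore rows: two well-separated groups of players.
stats = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
cluster_labels = kmeans(stats, k=2)
print(cluster_labels, knn_predict(stats, cluster_labels, [9, 9]))
```

Note that the labels here are cluster ids, not "bot" or "human". Deciding which cluster corresponds to bots still needs some outside knowledge.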

Ferrariic commented 3 years ago

It's a very tough situation due to the API ratelimit.

extreme4all commented 3 years ago

I have some experience with the API limits :), I used to scrape the entire OSRS GE. But first things first: some refactoring and a database would be beneficial. What data are we getting from the plugin? It would be nice if we had the following information from the plugin:

I suggest 2 endpoints: report_player and report_players. Both endpoints do inserts in the database, table:

The difference between report_player and report_players is that we would set a column in player_reports, Nearby_players, to True (1).

In the players table we keep track of when a player is created, whether they are banned, and the banned_date. A ban is detected if a player is removed from the highscores.
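A rough sqlite3 sketch of those two tables and the ban rule. Only created, banned, banned_date and Nearby_players come from the description above. The id, name and player_id columns, the choice of which endpoint writes 1, and the `mark_banned` helper are assumptions added to make the example runnable.

```python
import sqlite3

SCHEMA = """
CREATE TABLE players (
    id          INTEGER PRIMARY KEY,         -- assumed surrogate key
    name        TEXT NOT NULL UNIQUE,        -- assumed column
    created     TEXT NOT NULL,               -- when the player was first seen
    banned      INTEGER NOT NULL DEFAULT 0,  -- 0 = not banned, 1 = banned
    banned_date TEXT                         -- NULL until a ban is detected
);
CREATE TABLE player_reports (
    id             INTEGER PRIMARY KEY,      -- assumed surrogate key
    player_id      INTEGER NOT NULL REFERENCES players(id),  -- assumed column
    nearby_players INTEGER NOT NULL          -- assumed: 1 for report_players, 0 for report_player
);
"""


def mark_banned(conn, names_on_hiscores, today):
    """Flag every unbanned player whose name is missing from the hiscores."""
    rows = conn.execute("SELECT name FROM players WHERE banned = 0").fetchall()
    missing = [name for (name,) in rows if name not in names_on_hiscores]
    conn.executemany(
        "UPDATE players SET banned = 1, banned_date = ? WHERE name = ?",
        [(today, name) for name in missing],
    )
    conn.commit()
    return missing


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO players (name, created) VALUES (?, ?)",
    [("alice", "2021-01-01"), ("bob", "2021-01-01")],
)
print(mark_banned(conn, {"alice"}, "2021-02-01"))
```

This treats any missing name as a ban, which is the rule proposed above. Whether a name can vanish from the hiscores for other reasons is not covered in the thread.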

For highscores we need some tables.

Table: Highscores
Columns:

Table: Highscores_latest (don't know if needed)
Columns:

We would need routes to request data from the database. I suggest:

Can you set up the database side? I can set up the Flask API that will run on the server. What I have described should be the basis for a nice website that can display our best bot detector :D. Additionally, it should be the basis for our AI ideas.

AI workflow will be the following:

  1. Request data
  2. Preprocessing (data cleaning & feature engineering)
  3. AI modelling
  4. Model evaluation
  5. Model deployment

extreme4all commented 3 years ago

it might also be useful to have the plugin push a user token, so we can stop abuse?

Ferrariic commented 3 years ago

Excellent suggestions. I will work this week and weekend to make the code much easier to read. We will also try to include the reporting player's info, as well as the location of the found players. I can definitely set up the database side and properly reconfigure everything so that it is very clear and manageable.

I have also recently set up a Flask app on a Linode server with gunicorn and nginx, as a test for a switch from Google cloud app to Linode. However, my Flask app is very rudimentary, so changes are highly appreciated. I will let you know once the changes have been made and when a database becomes available. This will definitely help improve the workflow from this point onward.

As for the data we are getting from the plugin: Simply player names are being given at this time. Those names are then processed on our end to retrieve the OSRS Hiscore data values. Location and the reporting player were planned to be included in later updates, but we can shift the schedule to include these values earlier on.

extreme4all commented 3 years ago

the data is in json format?

For a minimum viable product, the location and the reporting player would be really good, combined with a website. It gives people something to show, and with a bit of luck sir pugger will pick it up :D.

(Recently I've got myself a VPS for my tools as well, but I'm not experienced in any of that Linux stuff :p )

maybe send me a message on twitter, so we can share a .env file, @3xtreme4all

Ferrariic commented 3 years ago

Haha no. Embarrassingly, the data is in a text file format. I'm going to try and convert it all into a json format from now on. Also I'd be super excited to have a great looking website where you can look up statistics/etc. That would really be remarkable to add in the future!
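A small sketch of that text-to-JSON step, assuming the text file holds one player name per line (the actual layout is not shown in the thread). The "players" key and the function name are made up for illustration.

```python
import json


def names_to_json(text):
    """Convert newline-separated player names into a JSON document."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return json.dumps({"players": names}, indent=2)  # "players" is a hypothetical key


print(names_to_json("Player One\nPlayer Two\n"))
```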

Also don't worry - I don't know anything about Linux either. I just followed a tutorial on YouTube (as with basically everything I've done so far, YouTube is the way to go).