LaihoE / demoparser

Counter-Strike 2 replay parser for Python and JavaScript
MIT License

Question related to your "CSGO cheating dataset" on Kaggle #220

Open MehnaazAsad opened 2 days ago

MehnaazAsad commented 2 days ago

Hi @LaihoE, I came across your dataset on Kaggle (https://www.kaggle.com/datasets/emstatsl/csgo-cheating-dataset/data?selectedOnly=true) and wanted to know whether this parser was used to collect that data. I noticed that the parser can output a host of helpful fields, and since the Kaggle dataset is a minimal version of this, I was curious about 1) how it was curated and 2) how the "cheater" and "legit" labels were assigned.

I have been using that dataset for my own side project (a comparative analysis between manual coding and an AI-assisted approach) and I am working on a short blog post, which is why I wanted to know more about how you built that dataset.

Thanks, Mehnaaz

LaihoE commented 1 day ago

Hi @MehnaazAsad! Markus's parser, demoinfocs-golang, was used to collect the data: https://github.com/markus-wa/demoinfocs-golang. This library (demoparser) only works for CS2.

As for the labeling, websites like FACEIT used to publish a list of banned players, and the most recent game of each banned player was labeled as "cheating" data. This means that some of the data may be mislabeled. As for the "legit" players, I don't think anything special was done; anyone who didn't have a ban was used.
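The labeling rule described above can be sketched roughly as follows. This is only an illustration, not the code that produced the Kaggle dataset: the record fields (`player_id`, `game_id`, `played_on`) and the function name `label_games` are hypothetical, and leaving older games of banned players unlabeled is an assumption, since the thread does not say what happened to them.

```python
from datetime import date


def label_games(games, banned_players):
    """Label (player, game) pairs using the rule described in the thread.

    games: list of dicts with hypothetical keys
           "player_id", "game_id", "played_on" (a date).
    banned_players: set of player ids taken from a ban list.
    Returns a dict mapping (player_id, game_id) -> "cheater" or "legit".
    """
    # Find the most recent game of every banned player.
    latest = {}
    for g in games:
        pid = g["player_id"]
        if pid in banned_players:
            if pid not in latest or g["played_on"] > latest[pid]["played_on"]:
                latest[pid] = g

    labels = {}
    for g in games:
        key = (g["player_id"], g["game_id"])
        if g["player_id"] in banned_players:
            # Only the most recent game is labeled; older games are
            # skipped (an assumption, not stated in the thread).
            if latest[g["player_id"]] is g:
                labels[key] = "cheater"
        else:
            # Anyone without a ban is treated as legit.
            labels[key] = "legit"
    return labels


example_games = [
    {"player_id": "p1", "game_id": "g1", "played_on": date(2021, 1, 1)},
    {"player_id": "p1", "game_id": "g2", "played_on": date(2021, 2, 1)},
    {"player_id": "p2", "game_id": "g1", "played_on": date(2021, 1, 1)},
]
example_labels = label_games(example_games, banned_players={"p1"})
```

Both kinds of label noise mentioned in the thread are visible in this rule: a "cheater" label depends only on the ban list and game recency, and a "legit" label only means that no ban was recorded.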

Despite the possibly mislabeled data, I built a model: https://github.com/LaihoE/DLAC. I don't have any numbers on how well it worked, since it was hard to know whether a false positive really was a false positive or whether a "legit" player really was a cheater. That said, I got the false positive rate very low, so the model could have been used as a filter.
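For readers unfamiliar with the metric, here is a minimal sketch of how a false positive rate could be measured at a given score threshold. It is not taken from DLAC; the function name, the score scale, and the example values are all made up for illustration. With noisy labels, the result is only an estimate, because some "legit" players may in fact be cheaters.

```python
def false_positive_rate(scores, labels, threshold):
    """Fraction of "legit"-labeled samples flagged at or above the threshold.

    scores: model outputs, assumed here to be cheating probabilities in [0, 1].
    labels: "legit" or "cheater" for each sample (possibly noisy).
    """
    negatives = sum(1 for y in labels if y == "legit")
    if negatives == 0:
        return 0.0
    false_positives = sum(
        1 for s, y in zip(scores, labels) if y == "legit" and s >= threshold
    )
    return false_positives / negatives


# Hypothetical scores: raising the threshold trades recall for fewer flags.
scores = [0.10, 0.40, 0.95, 0.99]
labels = ["legit", "legit", "legit", "cheater"]
fpr_loose = false_positive_rate(scores, labels, threshold=0.90)
fpr_strict = false_positive_rate(scores, labels, threshold=0.97)
```

Using the model as a filter would mean choosing a strict threshold so that few "legit" samples are flagged, then passing the flagged games on for closer review.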

This parser exposes a bunch of new fields that could be interesting. Aim punch angles, button presses, and local view angles, among many others, could be added.