LuoUndergradXJTU / TwiBot-22

Official repository of TwiBot-22 @ NeurIPS 2022, Datasets and Benchmarks Track.
MIT License

Low Performance on Most of the Baseline Algorithms #28

Closed: msharara1998 closed this issue 1 year ago

msharara1998 commented 1 year ago

Hello, thank you for your hard work in building this dataset. I noticed that most of the baseline algorithms perform poorly on TwiBot-22, specifically in terms of F1 score, which is the more telling metric here because the dataset is not balanced. The same algorithms achieve much higher F1 scores on TwiBot-20 and other benchmarks. Is this caused by a problem in the dataset itself? What explains the low performance? Thanks in advance.
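To illustrate why F1 on the bot class can be low even when accuracy looks fine, here is a minimal Python sketch. The `Confusion` helper is hypothetical, and the counts (100 bots, 900 humans) are made-up numbers chosen for illustration, not TwiBot-22 statistics.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical helper: confusion-matrix counts with 'bot' as the positive class."""
    tp: int  # bots correctly flagged as bots
    fp: int  # humans wrongly flagged as bots
    fn: int  # bots missed (predicted human)
    tn: int  # humans correctly predicted human

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def f1(self) -> float:
        # F1 for the bot class; treated as 0 when no bot is ever detected
        if self.tp == 0:
            return 0.0
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        return 2 * precision * recall / (precision + recall)


# Hypothetical split: 100 bots, 900 humans
always_human = Confusion(tp=0, fp=0, fn=100, tn=900)  # never predicts "bot"
mediocre = Confusion(tp=50, fp=50, fn=50, tn=850)     # catches half the bots

for name, c in [("always_human", always_human), ("mediocre", mediocre)]:
    print(f"{name}: accuracy={c.accuracy():.2f}, bot F1={c.f1():.2f}")
```

Both classifiers reach the same accuracy, but only the bot-class F1 shows that one of them never detects a bot, which is why F1 is the metric that drops on an imbalanced dataset.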

BunsenFeng commented 1 year ago

I agree that this is mostly caused by class imbalance, which also reflects real-world social media (genuine users far outnumber bots). Two options might help: a) use ML techniques that counter class imbalance during model training, or b) create a more balanced subset of TwiBot-22.
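A minimal sketch of both suggestions in plain Python, assuming binary string labels ("bot"/"human"). The function names, account IDs, and the 900/100 split are hypothetical. Option a) is shown as inverse-frequency class weights (the same formula scikit-learn uses for `class_weight="balanced"`), which is one common imbalance technique among several; option b) is shown as random undersampling of the majority class.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Option a): inverse-frequency weights, n_samples / (n_classes * count_c).
    Rarer classes get larger weights so their errors cost more in a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def undersample(samples, labels, seed=0):
    """Option b): randomly drop samples from larger classes until every
    class has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        balanced.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(balanced)  # avoid class-ordered batches
    return balanced


# Hypothetical data: 1000 account IDs, 900 humans and 100 bots
samples = [f"user{i}" for i in range(1000)]
labels = ["human"] * 900 + ["bot"] * 100

weights = balanced_class_weights(labels)
print(weights)

subset = undersample(samples, labels)
print(len(subset), Counter(y for _, y in subset))
```

The weights could then be passed to a weighted loss, for example the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`. Undersampling discards majority-class data, so it is usually applied to the training split only, leaving evaluation on the original distribution so that reported scores still reflect the real-world imbalance.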

msharara1998 commented 1 year ago

I'll consider these approaches, thanks!