dagster-io / fake-star-detector

https://github.com/dagster-io/dagster

random sampling + accounting for P/R in final numbers #6

Open soodoku opened 1 year ago

soodoku commented 1 year ago

Dear All,

Loved the work!

Two small potential improvements:

  1. "When we tested this heuristic on the known fake stars in our dummy account, we found that while it could be very computationally expensive": one way around the cost is to run the heuristic on a random sample of stargazers and use the sample to bound the percentage of fake stars.
  2. "it was both very good at detecting fake accounts and also extremely accurate (98% precision and 85% recall)": the final fake-star numbers don't adjust for precision and recall, so the reported share of flagged stars is not the best guess of the true share. Here's what I mean: http://gojiberries.io/2021/05/30/best-guess-of-true-proportion-of-1s/
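
To make both points concrete, here is a minimal Python sketch. Everything named in it is hypothetical: `is_fake` stands in for the expensive heuristic, `stargazers` is any list of accounts, and the Wilson score interval is just one reasonable way to bound a sampled proportion. The adjustment uses the identity true share ≈ flagged share × precision / recall, which may not be the exact method in the linked post, and it assumes the precision and recall measured on the dummy account carry over to the repository being audited.

```python
import math
import random


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion: k flagged out of n sampled."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_fake_share(stargazers, is_fake, sample_size, seed=0):
    """Run the (expensive) heuristic on a random sample only.

    `is_fake` is a hypothetical callable standing in for the heuristic.
    Returns the flagged share in the sample and an approximate 95% interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(stargazers, min(sample_size, len(stargazers)))
    flagged = sum(1 for account in sample if is_fake(account))
    return flagged / len(sample), wilson_interval(flagged, len(sample))


def adjust_for_precision_recall(flagged_share, precision, recall):
    """Best guess of the true share of fakes given the flagged share.

    Flagged accounts that are truly fake: flagged_share * precision.
    Dividing by recall adds back the fakes the heuristic missed.
    """
    if not (0 < recall <= 1 and 0 <= precision <= 1):
        raise ValueError("precision and recall must be valid proportions")
    return min(1.0, flagged_share * precision / recall)
```

For example, if 10% of stars are flagged, then with 98% precision and 85% recall the adjusted estimate is 0.10 × 0.98 / 0.85, or roughly 11.5%, so the raw flagged share would understate the problem slightly.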