daremon / urlclustering

Package to facilitate URL clustering
MIT License

select the best leaf #8

Open noanti opened 7 years ago

noanti commented 7 years ago

The formula used for selecting the best leaf is len(x['urls']) * (max_reductions - x['reductions']) ** 2. It seems to work well, but I can't understand why. Is it the best formula? How can that be proven? And how did you find this formula?
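To make the question concrete, here is a minimal sketch of how that score behaves on two made-up leaves. Only the 'urls' and 'reductions' keys and the name max_reductions come from the formula itself; the helper leaf_score, the 'name' key, the sample data, the value of max_reductions, and the reading of 'reductions' as the number of URL parts generalized away are all assumptions for illustration, not the library's actual code.

```python
def leaf_score(leaf, max_reductions):
    """Score a leaf with the formula quoted in this issue.

    More matched URLs raise the score linearly; fewer reductions
    raise it quadratically.
    """
    return len(leaf['urls']) * (max_reductions - leaf['reductions']) ** 2


# Hypothetical sample data, invented for this example.
max_reductions = 4  # assumed upper bound on reductions
leaves = [
    {'name': 'specific', 'urls': ['u1', 'u2', 'u3'], 'reductions': 1},
    {'name': 'generic', 'urls': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'], 'reductions': 3},
]

for leaf in leaves:
    print(leaf['name'], leaf_score(leaf, max_reductions))

# Pick the leaf with the highest score.
best = max(leaves, key=lambda x: leaf_score(x, max_reductions))
print('best:', best['name'])
```

With these numbers the 'specific' leaf scores 3 * 3 ** 2 = 27 and the 'generic' leaf scores 6 * 1 ** 2 = 6, so the leaf with fewer reductions wins even though it covers half as many URLs. That suggests the squared term is there to weight low reduction counts more heavily than coverage, but whether it is optimal is exactly what I'm asking.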

Thank you :)

hzhaop commented 7 years ago

Same question. I'm just curious where the formula comes from.