MichaelAquilina / Reddit-Recommender-Bot

Identifying Interesting Documents for Reddit using Recommender Techniques

Current word_concepts really easily confuses programming languages #93

Closed MichaelAquilina closed 10 years ago

MichaelAquilina commented 10 years ago

This is probably because code snippets on a page get parsed as text, and syntax that is recognisable as code could belong to almost any language. Need some mechanism to prevent this.
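One possible mechanism, sketched below, is to drop code snippets from a page before it reaches concept extraction. This assumes the input is HTML and that code sits inside `<pre>` or `<code>` elements. The function name `strip_code_blocks` is hypothetical and not part of this repository, and a real HTML parser would be more robust than a regular expression.

```python
import re

# Matches <pre>...</pre> and <code>...</code> elements, including any attributes.
# The backreference \1 makes the closing tag match the opening one, so a
# <pre><code>...</code></pre> pair is removed as a single unit.
_CODE_BLOCK = re.compile(r"<(pre|code)\b[^>]*>.*?</\1>", re.IGNORECASE | re.DOTALL)


def strip_code_blocks(html):
    """Return html with <pre> and <code> elements replaced by a single space.

    Hypothetical helper: assumes code snippets are marked up with these tags.
    """
    return _CODE_BLOCK.sub(" ", html)


page = "<p>Loops in Python</p><pre><code>for (int i = 0; i < n; i++) {}</code></pre>"
cleaned = strip_code_blocks(page)
```

The surrounding prose ("Loops in Python") survives, while the snippet that could be mistaken for C or Java is removed before any concepts are extracted.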

MichaelAquilina commented 10 years ago

This is a second_order_ranking problem. It occurs because a lot of pages can "abuse" the fact that they are heavily linked to. Need some form of better normalisation procedure to solve this. Some graph theory might help.
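As a rough illustration of what such a normalisation could look like, the sketch below damps each second-order score by the logarithm of the page's global in-link count, similar in spirit to IDF weighting. This is an assumption, not the method used in this repository. The function `second_order_scores`, its arguments, and the example link counts are all hypothetical.

```python
import math
from collections import Counter


def second_order_scores(first_order, links, inlink_counts):
    """Score pages linked from first-order concepts, damped by popularity.

    first_order:   dict of page -> weight of the concepts found directly
    links:         dict of page -> iterable of pages it links to
    inlink_counts: dict of page -> total inbound links across the corpus
    All three structures are assumptions made for this sketch.
    """
    raw = Counter()
    for page, weight in first_order.items():
        for target in links.get(page, ()):
            raw[target] += weight
    # Dividing by log(2 + inlinks) penalises pages that almost everything
    # links to, and never divides by zero for pages with no known in-links.
    return {
        target: score / math.log(2 + inlink_counts.get(target, 0))
        for target, score in raw.items()
    }


first_order = {"Python": 1.0, "Java": 1.0}
links = {
    "Python": ["Programming language", "Guido van Rossum"],
    "Java": ["Programming language"],
}
inlink_counts = {"Programming language": 10000, "Guido van Rossum": 50}
scores = second_order_scores(first_order, links, inlink_counts)
```

Without the damping, "Programming language" would win on raw score (2.0 against 1.0). With it, the more specific page ranks higher in this toy example.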

MichaelAquilina commented 10 years ago

The development branch "improved-second-ranking" partially resolves this. Small articles still seem to suffer from it, however.