VIDA-NYU / domain_discovery_tool_deprecated

Seed acquisition tool to bootstrap focused crawlers
Return a more diverse set of pages #98

Open kienpt opened 7 years ago

kienpt commented 7 years ago

Since users can't annotate all the available pages, DDT should provide a way for them to select a diverse set of pages for annotation (i.e., a set that minimises the sum of pairwise similarities between its pages). The current version lets users select pages through a visualisation that groups similar pages together, so obtaining pages that are textually different requires multiple selections.
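
To make the request concrete, here is a rough sketch of one possible approach, not something DDT currently implements. It greedily adds the page with the lowest total similarity to the pages already picked, which only approximates the optimal diverse set. The function names (`tf_vector`, `cosine_similarity`, `select_diverse`), the whitespace tokenisation, and the choice of the first page as the seed are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector for one page (naive whitespace tokenisation, an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_diverse(pages, k):
    """Greedily pick up to k page indices whose texts are mutually dissimilar.

    Heuristic only: it does not guarantee the minimum sum of pairwise similarities.
    """
    if k <= 0 or not pages:
        return []
    vectors = [tf_vector(p) for p in pages]
    selected = [0]  # arbitrary seed: the first page (an assumption)
    # total_sim[i] = summed similarity between page i and every selected page
    total_sim = [cosine_similarity(vectors[0], v) for v in vectors]
    while len(selected) < min(k, len(pages)):
        candidates = [i for i in range(len(pages)) if i not in selected]
        # Next page: the candidate least similar to what is already selected
        best = min(candidates, key=lambda i: total_sim[i])
        selected.append(best)
        for i in range(len(pages)):
            total_sim[i] += cosine_similarity(vectors[best], vectors[i])
    return selected


pages = [
    "ebola virus outbreak africa",
    "ebola virus outbreak news",
    "football match score",
    "football match result",
    "stock market prices",
]
print(select_diverse(pages, 3))
```

With these five toy pages, the sketch picks one page per topic instead of two near-duplicates. In DDT the vectors would presumably come from whatever text representation the visualisation already uses, rather than raw term counts.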