commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Index license info #30

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

It's hard to believe that with @mlinksva in the loop this hasn't been proposed before ;-)

How important/useful would it be to index Creative Commons (and others?) license tags and be able to filter results depending on them?
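A minimal sketch of what detecting such tags could look like, assuming pages mark their license with a `rel="license"` link (the convention Creative Commons recommends). This is not cosr-back code: `LicenseLinkParser`, `extract_licenses`, and the URL pattern are hypothetical names and assumptions chosen for illustration, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser

# Assumed shape of Creative Commons license URLs, e.g.
# https://creativecommons.org/licenses/by-sa/4.0/
CC_LICENSE_RE = re.compile(
    r"^https?://creativecommons\.org/(?:licenses|publicdomain)/"
    r"([a-z\-]+)(?:/(\d+\.\d+))?",
    re.IGNORECASE,
)


class LicenseLinkParser(HTMLParser):
    """Collects href values of <a> and <link> tags carrying rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; its value may be None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href:
            self.license_urls.append(href)


def extract_licenses(html):
    """Return one dict per rel="license" link found in the HTML string."""
    parser = LicenseLinkParser()
    parser.feed(html)
    results = []
    for url in parser.license_urls:
        match = CC_LICENSE_RE.match(url)
        if match:
            # Recognized Creative Commons URL: keep its code and version.
            code, version = match.group(1).lower(), match.group(2)
        else:
            # Some other license: keep only the URL.
            code, version = None, None
        results.append({"url": url, "code": code, "version": version})
    return results


page = '<a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA</a>'
licenses = extract_licenses(page)
```

The resulting license code could be stored as a document field and used as a filter at query time; non-CC licenses keep only their URL here, since recognizing them would need additional patterns.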

mlinksva commented 8 years ago

I've not found license-filtered web (i.e. text) search particularly useful as a user. For other media types, yes. But if indexing is cheap, it's worth experimenting with.

On the other hand, as someone curious about patterns and changes in licensing, I find the statistics from such an index very interesting. I imagine Common Search could publish really interesting index statistics; if licensing stats were included, all the better.

sylvinus commented 8 years ago

Ok, makes sense. There seems to be some interesting work done at https://github.com/dkpro/dkpro-c4corpus

indrajithi commented 8 years ago

Including a license filter is a good idea. It would also be nice to be able to filter results by file type (PDF, PPT, XLS, or even MP3). Google Advanced Search already offers this.