commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Add Stack Overflow document source #57

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

Dumps seem to be available at https://archive.org/details/stackexchange

wumpus commented 7 years ago

This is a good idea: you can do a much better job indexing Stack Exchange sites from the data dump. For example, the tags (like "Python", "Ruby", etc.) are only sometimes in the question title, but people searching Stack Overflow frequently put a language in their query. Yes, you can find the tags in the HTML somewhere, but it's probably easier to use the data dumps directly.
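A rough sketch of what reading tags straight from the dump could look like, in Python. It assumes the dump ships a `Posts.xml` file made of `<row>` elements, with questions marked by `PostTypeId="1"` and tags stored in a `Tags` attribute such as `<python><list>`. Those attribute names should be checked against the actual dump, and the pipe-delimited fallback is a guess at an alternative encoding. `parse_tags` and `iter_questions` are hypothetical helper names, not part of cosr-back.

```python
import io
import re
import xml.etree.ElementTree as ET

# Matches each "<tag>" token inside a Tags attribute (assumed dump format).
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_tags(raw):
    """Split a Tags attribute such as '<python><list>' into ['python', 'list']."""
    if "<" in raw:
        return TAG_RE.findall(raw)
    # Assumed fallback: tags delimited by pipes, e.g. '|python|list|'.
    return [tag for tag in raw.split("|") if tag]


def iter_questions(source):
    """Stream question rows from a Posts.xml-like file without loading it all."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # PostTypeId "1" is assumed to mean "question".
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield {
                "id": elem.get("Id"),
                "title": elem.get("Title", ""),
                "tags": parse_tags(elem.get("Tags", "")),
            }
        elem.clear()  # release memory held by rows already processed


# Tiny in-memory sample standing in for a real Posts.xml file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="How do I reverse a list?" Tags="&lt;python&gt;&lt;list&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="Use reversed()." />
</posts>
"""

questions = list(iter_questions(io.BytesIO(SAMPLE)))
for question in questions:
    print(question["id"], question["title"], question["tags"])
```

With tags available as a separate field, the indexer could weight them alongside the title, so a query like "python reverse list" can match a question whose title never mentions Python.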