VIDA-NYU / domain_discovery_tool_deprecated

Seed acquisition tool to bootstrap focused crawlers

Add some statistics of the selected corpus #12

Open yamsgithub opened 9 years ago

yamsgithub commented 9 years ago

Add a tab to the menu that shows statistics for the data of a selected domain, using Bokeh. These could be:

  1. Display a summary of queries thus far
  2. The domains that were crawled
  3. Statistics such as the number of pages per query and pages per keyword
  4. Page statistics and queries over time
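
As a rough illustration of items 3 and 4, here is a minimal sketch that counts pages per query and pages per day from a list of page records. The record layout is an assumption made for this example: the field names `query` and `retrieved` (an ISO 8601 timestamp string) are hypothetical and may not match what the tool actually stores.

```python
from collections import Counter

def pages_per_query(pages):
    """Count pages for each query; records without a query are skipped."""
    # "query" is a hypothetical field name, not confirmed by the project.
    return Counter(p["query"] for p in pages if p.get("query"))

def pages_per_day(pages):
    """Count pages per calendar day from an ISO 8601 timestamp string."""
    # "retrieved" is a hypothetical field name; the first 10 characters
    # of an ISO 8601 timestamp are the YYYY-MM-DD date.
    return Counter(p["retrieved"][:10] for p in pages if p.get("retrieved"))

# Made-up sample records, for illustration only.
sample_pages = [
    {"url": "http://a.example/1", "query": "ebola", "retrieved": "2015-06-01T10:00:00"},
    {"url": "http://a.example/2", "query": "ebola", "retrieved": "2015-06-01T11:30:00"},
    {"url": "http://b.example/1", "query": "zika", "retrieved": "2015-06-02T09:15:00"},
    {"url": "http://c.example/1", "retrieved": "2015-06-02T09:20:00"},
]

query_counts = pages_per_query(sample_pages)
day_counts = pages_per_day(sample_pages)
```

Counts in this shape (label to number) could then be handed to a Bokeh bar or line chart for the new tab.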

brittainhard commented 8 years ago

@yamsgithub should I assume that I have to parse an Elasticsearch response to get this info?

yamsgithub commented 8 years ago

You need to parse the results returned from the server, which in turn parses the Elasticsearch response through the elastic library.
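
For context on what the server-side step might involve, here is a minimal sketch that pulls the stored documents out of an Elasticsearch search response that has already been decoded into a Python dict. The `hits.hits[]._source` nesting is the standard search response layout; the function name `extract_sources` and the sample document fields are hypothetical, and the format the server actually returns to the client may differ.

```python
def extract_sources(response):
    """Return the _source document of every hit in a search response dict."""
    # A search response nests matching documents under hits -> hits,
    # with each stored document in the hit's "_source" key.
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("_source", {}) for hit in hits]

# Made-up response with hypothetical document fields, for illustration only.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_source": {"url": "http://a.example/1", "query": "ebola"}},
            {"_id": "2", "_source": {"url": "http://b.example/1", "query": "zika"}},
        ],
    }
}

documents = extract_sources(sample_response)
empty = extract_sources({})
```

Using `.get` with defaults means an empty or malformed response yields an empty list rather than raising a `KeyError`.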

brittainhard commented 8 years ago

@yamsgithub what is meant by "queries" here? Are those the queries for the web search? Or maybe for the filters? Where do I get this data?