Trogluddite / loombreaker

Tools for building Topic-Specific Web Indexes (CS-480 Capstone)
MIT License

Add new Discord commands to manage crawling/querying semantics #58

Open · Trogluddite opened this issue 5 months ago

Trogluddite commented 5 months ago

Stub out new Discord commands. The Discord bot should accept and run these commands, but for now each one should just echo back "ok" (or something similar); they don't need to do real work yet.
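A minimal sketch of that stub stage follows. It is written without any Discord library so it runs standalone, and it assumes the command names from the workflow below. `STUB_COMMANDS` and `handle_command` are hypothetical names; hooking them up to real bot commands in whatever Discord library the bot uses is left out. `/search` is omitted because it already exists.

```python
# Hypothetical stub layer: every new command is accepted and just echoes "ok".
# Command names are taken from the workflow discussed in this issue.
STUB_COMMANDS = {
    "add_seed",
    "remove_seed",
    "show_seeds",
    "start_crawl",
    "query",
    "reload_from_query",
}


def handle_command(name: str, *args: str) -> str:
    """Return the reply the bot would send for a command message."""
    if name in STUB_COMMANDS:
        # No real work yet: acknowledge the command and ignore its arguments.
        return "ok"
    return f"unknown command: {name}"


if __name__ == "__main__":
    print(handle_command("add_seed", "https://example.com"))
    print(handle_command("start_crawl"))
    print(handle_command("frobnicate"))
```

Keeping the stubs in one table like this makes it easy to swap each entry for a real handler later without touching the dispatch logic.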

The idea here is to develop a user flow (UX) for the crawler, the querier, and the Markov selection process.

Crawling the web and building Bayesian networks will be batch tasks. We want the Discord bot to issue commands that trigger those processes and then wait for 'done' responses, which will be relayed back to Discord.
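One way to get "trigger, then wait for 'done'" without blocking the bot is to run the batch job as a background `asyncio` task that posts a message when it finishes. This is a sketch under assumptions: `run_batch_job`, `fake_crawl`, and the `notify` callback are hypothetical, and in the real bot `notify` would post to a Discord channel instead of appending to a list.

```python
import asyncio
from typing import Awaitable, Callable


async def run_batch_job(
    job: Callable[[], Awaitable[str]],
    notify: Callable[[str], Awaitable[None]],
) -> None:
    """Run a long batch job, then report 'done' (or the failure) back to chat."""
    try:
        summary = await job()
    except Exception as exc:  # report failures instead of dropping them silently
        await notify(f"failed: {exc}")
        return
    await notify(f"done: {summary}")


async def fake_crawl() -> str:
    # Hypothetical stand-in for the real crawler; it sleeps instead of crawling.
    await asyncio.sleep(0.1)
    return "crawled 0 pages"


async def main() -> list[str]:
    messages: list[str] = []

    async def notify(text: str) -> None:
        # In the bot this would post to a Discord channel; here we just record it.
        messages.append(text)

    # Start the job in the background so the command handler can reply right away.
    task = asyncio.create_task(run_batch_job(fake_crawl, notify))
    messages.append("ok, crawl started")
    await task
    return messages


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The command handler replies immediately, and the "done" message arrives later on its own, which matches the batch-task model described above.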

'Searching' the Bayesian network really means generating candidate texts, comparing each of them to a target text, and returning the best match among the candidates as the result.
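The generate/compare/pick loop can be sketched as follows. Two things here are assumptions, not the project's actual method: `toy_generator` is a hypothetical stand-in for sampling texts from the network, and the comparison uses `difflib.SequenceMatcher.ratio()` as one possible similarity metric. A word- or embedding-based score could be substituted without changing the loop.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable


def similarity(a: str, b: str) -> float:
    # One possible metric: case-insensitive character-level ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(
    target_text: str,
    generate: Callable[[int], Iterable[str]],
    n_candidates: int = 10,
) -> tuple[str, float]:
    """Generate candidates, score each against the target, return the best one."""
    candidates = list(generate(n_candidates))
    if not candidates:
        raise ValueError("generator produced no candidates")
    scored = [(c, similarity(c, target_text)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def toy_generator(n: int) -> list[str]:
    # Hypothetical stand-in for sampling n texts from the network.
    pool = ["the cat sat", "a dog ran", "the cat sat on the mat", "birds fly"]
    return pool[:n]


if __name__ == "__main__":
    print(best_match("the cat sat on a mat", toy_generator, n_candidates=4))
```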

I'm envisioning a workflow like this:

  1. User searches the Bayesian network with a /search command that takes a 'target_text' and a 'show_sources' bool. As far as the UX is concerned, this operates the same as it currently does.
  2. User adjusts the crawler's initial parameters with 'add_seed' / 'remove_seed' / 'show_seeds' type commands.
  3. User triggers a fresh crawl with 'start_crawl'. Since this is an expensive operation (in terms of time), we may want some controls around it; I don't know what that looks like yet.
  4. The crawl notifies Discord (perhaps on a status channel?) that it's complete.
  5. User adjusts query parameters with something like /query, taking filters and query_params values.
  6. User builds a new Bayesian network instance with /reload_from_query or the like.
  7. We repeat from step 1, continuously. Over time, we train the search engine to index texts that we like.
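For the open question in step 3, one possible form of "controls" around `start_crawl` is a gate that allows only one crawl at a time and enforces a cooldown after each finished crawl. This is purely an assumption about what such controls might look like; `CrawlGate`, `try_start`, `finish`, and the cooldown value are hypothetical. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class CrawlGate:
    """Hypothetical guard for start_crawl: one crawl at a time, plus a cooldown."""

    def __init__(self, cooldown_s: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.running = False
        self.last_finished: Optional[float] = None

    def try_start(self) -> str:
        if self.running:
            return "a crawl is already running"
        if self.last_finished is not None:
            remaining = self.cooldown_s - (self.clock() - self.last_finished)
            if remaining > 0:
                return f"cooldown: try again in {remaining:.0f}s"
        self.running = True
        return "ok, crawl started"

    def finish(self) -> None:
        # Called when the crawler reports 'done'.
        self.running = False
        self.last_finished = self.clock()


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    gate = CrawlGate(cooldown_s=3600, clock=lambda: now[0])
    print(gate.try_start())  # ok, crawl started
    print(gate.try_start())  # a crawl is already running
    gate.finish()
    now[0] = 60.0
    print(gate.try_start())  # cooldown: try again in 3540s
    now[0] = 4000.0
    print(gate.try_start())  # ok, crawl started
```

A per-user or role-based permission check would be another option; the gate above only addresses cost in time.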
Trogluddite commented 5 months ago

Some brainstorming from Discord: we'll eventually need something like this in the API.

crawler interface

read seeds 
remove seed 
add seed
start crawl
check crawl status
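The crawler interface above could start as a small in-memory class, one method per brainstormed operation. This is a sketch, not the project's API: `CrawlerInterface`, `CrawlStatus`, and the boolean return convention are hypothetical. A real version would persist seeds and hand `start_crawl` off to the batch crawler.

```python
from enum import Enum


class CrawlStatus(Enum):
    IDLE = "idle"
    RUNNING = "running"
    DONE = "done"


class CrawlerInterface:
    """Hypothetical in-memory version of the brainstormed crawler API."""

    def __init__(self) -> None:
        self._seeds: list[str] = []
        self.status = CrawlStatus.IDLE

    def read_seeds(self) -> list[str]:
        return list(self._seeds)  # copy, so callers can't mutate internal state

    def add_seed(self, url: str) -> bool:
        if url in self._seeds:
            return False  # already present
        self._seeds.append(url)
        return True

    def remove_seed(self, url: str) -> bool:
        if url not in self._seeds:
            return False
        self._seeds.remove(url)
        return True

    def start_crawl(self) -> bool:
        # Refuse to start with no seeds or while a crawl is already running.
        if self.status is CrawlStatus.RUNNING or not self._seeds:
            return False
        self.status = CrawlStatus.RUNNING  # the real crawl would be kicked off here
        return True

    def mark_done(self) -> None:
        # Called when the batch crawler reports completion.
        self.status = CrawlStatus.DONE

    def check_crawl_status(self) -> CrawlStatus:
        return self.status
```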

query interfaces

set query (set filters, query strings)
do query
build new MatrixMarkov instance (either add a new instance, or replace the current one)
list MatrixMarkov instances (if we handle multiple concurrent networks)
delete MatrixMarkov instances (if we handle multiple)
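The query interfaces could be sketched as a registry of named model instances. The real MatrixMarkov constructor isn't shown in this issue, so model building is injected as a hypothetical `build_model` callable, and running the query is injected as a hypothetical `run_query` callable. `QuerySpec`, `MarkovRegistry`, and the `replace` flag are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class QuerySpec:
    """Hypothetical query settings: filters plus free-text query strings."""
    filters: Dict[str, str] = field(default_factory=dict)
    query_strings: List[str] = field(default_factory=list)


class MarkovRegistry:
    """Hypothetical registry of named MatrixMarkov-like instances."""

    def __init__(
        self,
        run_query: Callable[[QuerySpec], List[str]],
        build_model: Callable[[List[str]], Any],
    ) -> None:
        self.run_query = run_query      # stand-in for "do query" against the index
        self.build_model = build_model  # stand-in for training a MatrixMarkov
        self.query = QuerySpec()
        self.instances: Dict[str, Any] = {}

    def set_query(self, filters: Dict[str, str], query_strings: List[str]) -> None:
        self.query = QuerySpec(dict(filters), list(query_strings))

    def do_query(self) -> List[str]:
        return self.run_query(self.query)

    def build(self, name: str, replace: bool = False) -> bool:
        """Add a new instance; overwrite an existing one only if replace=True."""
        if name in self.instances and not replace:
            return False
        self.instances[name] = self.build_model(self.do_query())
        return True

    def list_instances(self) -> List[str]:
        return sorted(self.instances)

    def delete(self, name: str) -> bool:
        if name not in self.instances:
            return False
        del self.instances[name]
        return True
```

If only a single network is ever supported, the registry collapses to one slot and `build(..., replace=True)` becomes the /reload_from_query behavior.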