Open jackrea07 opened 8 months ago
Thanks! As a general rule, I would prefer if the readme only contained informations that are specific to sherlodoc rather than general knowledge that can be acquired from the internet :) (otherwise it becomes harder to know where to stop in what needs to be explained)
Regarding the contributor guide, I'm afraid we don't do anything special that requires instructions (and sorry we don't plan to have a dev branch atm).
However I agree that our use of suffix trees could be expanded upon since we do stuff that are different from the wikipedia description:
$string_loc
... our suffix trees are dictionaries/maps with string keys). Those arrays of results are sorted by the static evaluation described below in the readme, to ensure that "better" results are at the beginning of the arrays (such that we don't need to traverse the entire arrays to discover them!)O(alphabet)
but the alphabet is very small), but it enables treating each subtree as a purely functional priority queue: This improves the enumeration of matching results in the static ordering. Finding the top N results for a query requires to compute the (on demand) intersection of the array leaves for each word in the query (and there can be many many leaves if a query word is short, so ordering each subtree makes the intersection algorithm much faster as we can avoid exploring subtrees that have the worst candidates)Base
with this optim). In other words, the suffix trees are compressed into a finite state machine / automata (a directed acyclic graph (DAG)). This compression doesn't impact the rest of the code since we can traverse the DAG just as if it was a tree.It might be interesting to use graphviz on a small ocaml library to show what the indexes ends up looking like! (to make the suffix tree illustration more spicy than banana
^^)
I like the contribution recommendation, I think they are going to be useful in the future, but they should be in a CONTRIBUTING.md file (that can be referenced in the readme).
The recommendations are general, but I would like to expand on with more project-specific things once it is merged.
Maybe something similar for the suffix-tree explanation would make @art-w happy ? Maybe a DESIGN.md file ? I am not sure about the name nor the place were it should be, but I personally think it is a good thing, especially for explanation that are specific to the project. More general explanation I am not against, but they should be short and link to external resources.
It might be interesting to use graphviz on a small ocaml library to show what the indexes ends up looking like! (to make the suffix tree illustration more spicy than banana ^^)
Since this is a tree, an ascii representation might be enough. This could be also "expect-tested" which would be quite nice in my opinion. I'll open an issue for this.
Regarding the contributor guide, I'm afraid we don't do anything special that requires instructions (and sorry we don't plan to have a dev branch atm).
I want a contributor guide because there might be outreachy interns working on this. Other than that, we do not plan on having a dev branch, this is not common practice in the ocaml world, so I think this part should be removed.