NeowayLabs / neosearch

Full Text Search Library

[DESIGN] Definition of neosearch concepts #36

Status: Open. i4ki opened this issue 8 years ago.

i4ki commented 8 years ago

This is a very important topic that @ebellani reminded me of. As the project is just beginning, I think this is the time to clarify some of the terms used in it. I propose that, once we are done, we create a page with the NeoSearch definitions.

To be or not to be (x2)

First of all, what is Neosearch? Honestly, I don't know yet.

Then, what is Neosearch's responsibility? Search is an obvious requirement, but do relationship operations (JOIN) and aggregations/summaries make sense for a search engine? I'm now thinking that Neosearch will be much closer to a database with extended search capabilities than to a search engine with lots of database concepts inside (like Solr and Elasticsearch).

Keywords

The definitions of index and database in Neosearch are quite confusing. Take, for example, the code below:

index, err := neosearch.CreateIndex("companies")

This creates a new directory inside DATADIR to store a reverse index for each column of the companies' documents. But "companies" isn't a reverse index; it is only the storage location for a set of reverse-index files (name.idx, status.idx, socios.idx, and so on) plus document.db (the raw data in a hash map). Internally, this directory is called a database (yes, a strong word) and the index files are called indices. This is evident when using neosearch-cli:

ns> USING <DATABASE>.<INDEX> GET "something";
OR
ns> USING companies.name.idx GET "neoway";
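To make the `<DATABASE>.<INDEX>` addressing concrete, here is a minimal Go sketch that parses a statement of that shape and resolves it to a file under the DATADIR/<database>/<column>.idx layout described above. It is not neosearch-cli's real parser: the `Query` type, `parseUsing`, `IndexPath`, and the `/var/lib/neosearch` DATADIR are all hypothetical. Note that only the first dot separates the database from the index, because index file names themselves contain a dot.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// Query is a hypothetical parsed form of a neosearch-cli statement.
type Query struct {
	Database string // directory under DATADIR, e.g. "companies"
	Index    string // reverse-index file inside it, e.g. "name.idx"
	Term     string // value to look up, e.g. "neoway"
}

// parseUsing parses `USING <DATABASE>.<INDEX> GET "<term>";` (hypothetical grammar).
func parseUsing(stmt string) (Query, error) {
	s := strings.TrimSuffix(strings.TrimSpace(stmt), ";")
	if !strings.HasPrefix(s, "USING ") {
		return Query{}, errors.New("statement must start with USING")
	}
	s = strings.TrimPrefix(s, "USING ")

	// Split "<target> GET <term>" on the GET keyword.
	parts := strings.SplitN(s, " GET ", 2)
	if len(parts) != 2 {
		return Query{}, errors.New("missing GET clause")
	}
	target := strings.TrimSpace(parts[0])
	term := strings.Trim(strings.TrimSpace(parts[1]), `"`)

	// Cut at the FIRST dot only: "companies.name.idx" -> "companies", "name.idx".
	db, idx, found := strings.Cut(target, ".")
	if !found || db == "" || idx == "" {
		return Query{}, fmt.Errorf("target %q is not <DATABASE>.<INDEX>", target)
	}
	return Query{Database: db, Index: idx, Term: term}, nil
}

// IndexPath resolves the index file a query reads from, assuming the
// DATADIR/<database>/<column>.idx layout.
func (q Query) IndexPath(dataDir string) string {
	return filepath.Join(dataDir, q.Database, q.Index)
}

func main() {
	q, err := parseUsing(`USING companies.name.idx GET "neoway";`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", q)
	fmt.Println(q.IndexPath("/var/lib/neosearch"))
}
```

On a Unix system this prints `{Database:companies Index:name.idx Term:neoway}` followed by `/var/lib/neosearch/companies/name.idx`, which shows that the "database" part of the address names a directory, not an index.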

Today, document.db stores the raw, unstructured documents. It is impossible to make any inference from the contents of document.db, although in the future we could store the data in structured form to allow this. So, theoretically, what we call a database is only a collection of indices plus the document data...
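The model in the paragraph above, a "database" being just per-column reverse indices plus opaque raw documents, can be sketched in Go as follows. This is an illustration under assumptions, not neosearch's implementation: the `Collection` type, its methods, and the lowercase whitespace tokenization are hypothetical. Because the raw bytes are opaque, the column values have to be supplied separately at indexing time, and a document can only be reached through an index lookup by ID.

```go
package main

import (
	"fmt"
	"strings"
)

// Collection models what the issue calls a "database". All names are hypothetical.
type Collection struct {
	// indices maps a column name (e.g. "name") to its reverse index:
	// term -> IDs of the documents containing that term.
	indices map[string]map[string][]uint64
	// documents plays the role of document.db: ID -> raw, unstructured bytes.
	documents map[uint64][]byte
}

func NewCollection() *Collection {
	return &Collection{
		indices:   make(map[string]map[string][]uint64),
		documents: make(map[uint64][]byte),
	}
}

// Add stores the raw document and indexes the lowercased whitespace tokens of each column.
func (c *Collection) Add(id uint64, raw []byte, columns map[string]string) {
	c.documents[id] = raw
	for column, value := range columns {
		idx, ok := c.indices[column]
		if !ok {
			idx = make(map[string][]uint64)
			c.indices[column] = idx
		}
		for _, tok := range strings.Fields(strings.ToLower(value)) {
			// Skip a repeated token within the same document.
			if n := len(idx[tok]); n == 0 || idx[tok][n-1] != id {
				idx[tok] = append(idx[tok], id)
			}
		}
	}
}

// Get looks up a term in one column's index and returns the matching raw documents.
func (c *Collection) Get(column, term string) [][]byte {
	var out [][]byte
	for _, id := range c.indices[column][strings.ToLower(term)] {
		out = append(out, c.documents[id])
	}
	return out
}

func main() {
	c := NewCollection()
	c.Add(1, []byte(`{"name":"Neoway Labs","status":"active"}`),
		map[string]string{"name": "Neoway Labs", "status": "active"})
	c.Add(2, []byte(`{"name":"Acme","status":"inactive"}`),
		map[string]string{"name": "Acme", "status": "inactive"})

	for _, doc := range c.Get("name", "neoway") {
		fmt.Println(string(doc))
	}
}
```

Running it prints `{"name":"Neoway Labs","status":"active"}`. Nothing in the `Collection` itself is a single index, which is the naming problem this issue raises.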

What we named a "database", Elasticsearch calls an "index" and SolrCloud calls a "collection". I now think that "collection" may be a better option for us. Elasticsearch and SolrCloud have an additional concept called "type" (I guess it corresponds to "core" in Solr). I think this is due to a Lucene limitation: JOIN and other relationship operations are only possible inside a single logical index. Neosearch doesn't need anything like that at the moment. The leading search engines don't agree on definitions. Will we follow the same path?

ebellani commented 8 years ago

One way to achieve what you're trying to do here is to answer the "why" and "what" questions for each of the main structural concepts. This may, in turn, uncover new concepts.

ebellani commented 8 years ago

The fact that the available products don't agree with each other is a problem, but the major red flag is that they aren't even consistent internally. They are conceptually incoherent and, as such, unstable and bound to fail, in either the short or the long run. Plus, much of the code in these products is an artifact of that structural, conceptual incoherence.