Open i4ki opened 8 years ago
A way to achieve what you're trying to do here is to try to answer the why and what questions for each of the main structural concepts. This has the potential to discover new concepts in turn.
The fact that the available products don't agree between themselves is a problem, but the major red flag is that they don't agree internally. They are conceptually incoherent, and, as such, unstable and bound to fail, either in the short or the long run. Plus, much of the code generated by these products is an artifact of the structural conceptual incoherence.
This is a very important topic that @ebellani reminded me of. As the project is just beginning, I think this is the time to make clear some of the words used in the project. I propose we create a page with NeoSearch definitions when done.
To be or not to be (x2)
First of all, what is Neosearch? Really, I don't know yet.
Then, what is Neosearch's responsibility? Search is an obvious requirement, but do relationship operations (JOIN) and aggregations/summaries make sense for a search engine? I'm now thinking that Neosearch will be much closer to a database with extended search capabilities than to a search engine with lots of database concepts inside (like SOLR and ES).
Keywords
The definitions of index and database in Neosearch are very confusing. For example, the code below:
Will create a new directory inside the DATADIR directory to store the reverse index for each column of the company documents. But "companies" isn't a reverse index; it is only the storage location for a bunch of reverse-index files (name.idx, status.idx, socios.idx, and so on) plus document.db (raw data in a hash map). Internally, this is called a database (yes, a strong word) and the index files are called indices. This is obvious when using the neosearch-cli:
Today, document.db stores the raw unstructured document. It is impossible to infer anything from document.db as it is, but we could store the data in structured form in the future to allow this. So, theoretically, what we call a database is only a collection of indices plus the document data...
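To make that structure concrete, here is a minimal in-memory Go sketch of "a collection of indices plus the document data". It is only an illustration of the terminology: the names Collection, ReverseIndex, Add and Search are hypothetical and are not NeoSearch's actual API, and plain maps stand in for the .idx files and document.db.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ReverseIndex maps a term to the sorted IDs of the documents containing it.
// Hypothetical stand-in for one on-disk file such as name.idx.
type ReverseIndex map[string][]uint64

// Collection is what the current code calls a "database": one reverse
// index per field plus the raw documents (stand-in for document.db).
type Collection struct {
	Name      string
	Indices   map[string]ReverseIndex // field name -> reverse index
	Documents map[uint64]string       // document ID -> raw document
}

// NewCollection returns an empty collection with the given name.
func NewCollection(name string) *Collection {
	return &Collection{
		Name:      name,
		Indices:   map[string]ReverseIndex{},
		Documents: map[uint64]string{},
	}
}

// insertSorted adds id to a sorted slice, skipping duplicates.
func insertSorted(ids []uint64, id uint64) []uint64 {
	i := sort.Search(len(ids), func(i int) bool { return ids[i] >= id })
	if i < len(ids) && ids[i] == id {
		return ids
	}
	ids = append(ids, 0)
	copy(ids[i+1:], ids[i:])
	ids[i] = id
	return ids
}

// Add stores the raw document and indexes the lowercased,
// whitespace-separated terms of every field.
func (c *Collection) Add(id uint64, raw string, fields map[string]string) {
	c.Documents[id] = raw
	for field, value := range fields {
		idx, ok := c.Indices[field]
		if !ok {
			idx = ReverseIndex{}
			c.Indices[field] = idx
		}
		for _, term := range strings.Fields(strings.ToLower(value)) {
			idx[term] = insertSorted(idx[term], id)
		}
	}
}

// Search returns the raw documents whose field contains term,
// ordered by document ID.
func (c *Collection) Search(field, term string) []string {
	var out []string
	for _, id := range c.Indices[field][strings.ToLower(term)] {
		out = append(out, c.Documents[id])
	}
	return out
}

func main() {
	companies := NewCollection("companies")
	companies.Add(1, `{"name": "Acme Ltd", "status": "active"}`,
		map[string]string{"name": "Acme Ltd", "status": "active"})
	companies.Add(2, `{"name": "Globex", "status": "inactive"}`,
		map[string]string{"name": "Globex", "status": "inactive"})
	fmt.Println(companies.Search("status", "active"))
}
```

Seen this way, "companies" is neither an index nor a database in the usual sense: it is just a named container that owns several indices and one document store.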
ElasticSearch calls "index" and SolrCloud calls "collection" what we named "database". Now I think that "collection" may be a better option for us. ElasticSearch and SolrCloud have an additional concept called "type" (I guess that is "core" in Solr). I think this is due to a Lucene limitation: JOIN and other relationship operations are only possible inside a single logical index. Neosearch doesn't need anything like that at the moment. The leading search engines don't agree on definitions. Will we follow the same path?