Jeanselme / Search_Engine

A search engine through different types of document.
GNU General Public License v3.0

Index #10

Open Jeanselme opened 7 years ago

Jeanselme commented 7 years ago

As with ReverseIndex, it would be useful to save the name of the file at the beginning of the index, so that creating the index and reading it are fully separated. That would let us build the index now and create the reverse index later; currently, the two phases have to run one right after the other. Could we also use separate objects for reading and writing an index?

clinm commented 7 years ago

I agree that we need this kind of information in order to separate the two phases. But I am not sure that adding the filename at the beginning of the index is a good idea. In future enhancements we may want to store more and more information about each document (file extension, indexing date, the person who added the document, etc.).

I think an alternative could work better: a .info file or something similar, stored as JSON (human-readable). This file could grow as our needs do, and it could even be kept after the indexing process, so that the search part can retrieve and display additional information.
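
To make the idea concrete, here is a minimal sketch of such a sidecar file in Python. Everything in it is an assumption for illustration, not the project's actual code: the function names `write_info` and `read_info`, the field names, and the convention of appending `.info` to the index path.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_info(index_path, source_path, added_by=None):
    """Write a JSON sidecar next to an index file (hypothetical layout)."""
    source = Path(source_path)
    info = {
        "filename": source.name,        # original file name
        "extension": source.suffix,     # e.g. ".txt"
        "indexed_at": datetime.now(timezone.utc).isoformat(),
        "added_by": added_by,           # optional, may be None
    }
    info_path = Path(str(index_path) + ".info")
    info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info_path


def read_info(index_path):
    """Load the sidecar back, independently of the index itself."""
    info_path = Path(str(index_path) + ".info")
    return json.loads(info_path.read_text(encoding="utf-8"))
```

Because the metadata lives outside the index, new fields can be added without changing the index format, and the reader only needs `read_info` to find out which document an index belongs to.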

Jeanselme commented 7 years ago

It would be interesting to name each file's index after the hash of its text content, in order to avoid indexing two files that have the same content. This also fixes another issue: two files with the same name would otherwise conflict. The mapping between the hash and the file name would be stored in the .info file.
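
A rough sketch of that naming scheme in Python, under stated assumptions: SHA-256 is used as the hash (the thread does not pick one), and the names `content_hash` and `register_document` as well as the `.index` / `.info` file layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text):
    """Hex digest of the text content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def register_document(index_dir, filename, text):
    """Return (index_path, is_new) for a document.

    The index is named after the content hash, so identical content
    maps to the same index, and equal file names never collide.
    """
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    digest = content_hash(text)
    index_path = index_dir / f"{digest}.index"
    info_path = index_dir / f"{digest}.info"

    if info_path.exists():
        # Same content already seen: no need to index it again.
        info = json.loads(info_path.read_text(encoding="utf-8"))
        is_new = False
    else:
        info = {"hash": digest, "filenames": []}
        is_new = True

    # Keep the hash -> file name correspondence in the .info file.
    if filename not in info["filenames"]:
        info["filenames"].append(filename)
    info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return index_path, is_new
```

With this layout, a caller would only build the index when `is_new` is true; a second file with the same content just adds its name to the existing .info entry.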