Closed: freekh closed this issue 10 years ago
Yeah, I think Lucene could do the job. I am not sure whether there are any alternatives that make sense. In any case, it would be cool to try Lucene out and see what the code would look like and how well it would perform.
I was thinking it would be cool if the first search also built the index: either search and index in one pass, or index first and then search.
Then after that it would be fast, but you wouldn't waste time on indexing if there is no need for searches.
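For illustration, the "index on first search" idea could be sketched roughly like this. It is only a sketch in Python rather than anything taken from Adept: `LazyIndex` and the `build` callable are hypothetical names, and the index is assumed to be nothing more than a list of coordinate strings.

```python
import re


class LazyIndex:
    """Builds its entries the first time search() is called, then reuses them."""

    def __init__(self, build):
        # `build` is a hypothetical callable that returns all coordinate strings.
        self._build = build
        self._entries = None  # None means "not indexed yet"

    def search(self, pattern):
        if self._entries is None:
            # The first search pays the indexing cost; later searches skip it.
            self._entries = list(self._build())
        regex = re.compile(pattern)
        return [entry for entry in self._entries if regex.search(entry)]
```

If nobody ever searches, `build` is never called, so no time is spent on indexing.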
Yeah, makes sense. Though I did a bit of searching around and didn't find how one would index JSON files with Lucene (AFAIK it treats everything as a text file), so maybe it would be easier to make our own index consisting simply of a list of existing coords.
It has been ages since I looked at Lucene, but I would assume there must be some way of indexing any structured document. This page seems a bit complicated: http://lucene.apache.org/core/3_0_3/fileformats.html#Fields but the way I read it, you can define a document structure (a module in our case) and add searchable fields to it. I am not sure though; it would be nice to look at a tutorial.
As you say, our own index might be just as good and maybe Lucene is overkill. If there were a library with a persistent B+ tree implementation (or maybe we could use Lucene's), that would be nice though. With such an index we would be able to scale better and we could index other stuff as well (descriptions, universes, ...)
Or perhaps I have been overthinking it. Perhaps what we need now is to iterate through all the module files and get results from a regexp. Most people have SSDs, which should do this pretty quickly, and currently there are not that many modules. It would be fast to implement and easy to replace with something more efficient once the time is right. WDYT?
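A rough sketch of that brute-force approach, in Python for brevity. The file layout is an assumption: each module is taken to be a `.json` file with a top-level `coordinates` object holding `org`, `name` and `version`, which may not match the real module format. `search_modules` is a hypothetical name.

```python
import json
import re
from pathlib import Path


def search_modules(modules_dir, pattern):
    """Scan every module file and return the coordinates matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(modules_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            module = json.load(handle)
        # Assumed layout: {"coordinates": {"org": ..., "name": ..., "version": ...}}
        coords = module.get("coordinates", {})
        coordinate = ":".join(
            str(coords.get(key, "")) for key in ("org", "name", "version")
        )
        if regex.search(coordinate):
            hits.append(coordinate)
    return hits
```

Every search re-reads every file, so the cost grows linearly with the number of modules, but there is no index to build or keep in sync.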
I would use Lucene. It is a very clean library, with tons of traction, and heavily optimized. You can define whatever searchable fields you want. You may have to write explicit code to go from JSON to a Lucene document, or you may find a library that does it, but either way it is simple.
Dean
Yeah, I have no experience with Lucene, so I don't know. We can probably try both if we want to. Using Lucene looks simple enough, though (http://www.lucenetutorial.com/lucene-in-5-minutes.html). Also, since all the metadata is stored in git, reindexing should be really simple and quite fast (we get the information about which files changed for free).
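To make the reindexing point concrete, here is a minimal sketch in Python of updating an index from a list of changed files. It assumes the changes arrive as the tab-separated text printed by `git diff --name-status`, and that the index is a plain dict from file path to coordinate. `parse_name_status`, `update_index` and `load_coordinate` are hypothetical names.

```python
def parse_name_status(output):
    """Split `git diff --name-status` output into (changed, deleted) paths."""
    changed, deleted = [], []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("D"):
            deleted.append(fields[1])
        elif status.startswith("R"):
            # Rename lines carry both the old path and the new path.
            deleted.append(fields[1])
            changed.append(fields[2])
        elif status.startswith("C"):
            # Copy lines also carry two paths; only the new one needs indexing.
            changed.append(fields[2])
        else:
            changed.append(fields[1])
    return changed, deleted


def update_index(index, changed, deleted, load_coordinate):
    """Re-read only the files git reported, leaving other entries untouched."""
    for path in deleted:
        index.pop(path, None)
    for path in changed:
        # `load_coordinate` is a hypothetical function: file path -> coordinate.
        index[path] = load_coordinate(path)
    return index
```

Only the files named in the diff are read again, so the cost of a reindex depends on the size of the change rather than on the number of modules.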
Currently it is not possible to search adept for coordinates.
A regex-based search would be great, where you could do something like: adept search akka and get all coordinates for akka.