Feature/elastic indexer

NAMD / pypln.web

Web interface for PyPLN

http://pypln.org/

GNU General Public License v3.0

12 stars 8 forks source link

Feature/elastic indexer #127

Closed flavioamieiro closed 9 years ago

flavioamieiro commented 9 years ago

This, together with NAMD/pypln.backend#124 provide a way to index documents in elasticsearch and then query them.

I decided on an approach to make sure the index belongs to the user that I'm not very comfortable with yet, so I'd love input on this.

This Pull Request needs a review, so please @fccoelho and @israelst, I'd love it if you can take a good look at the code.

fccoelho commented 9 years ago

@flavioamieiro, I don't see a problem with the index belonging to a user, since API clients will necessarily have to have some way to manage their indices. in any case, we should separate the indexing pipeline from the standard pypln pipeline, which I guess is already contmplated, and we should not persist the indexed documents in mongofiles, there must be a trigger to delete them noce they have been indexed. The elasticsearch index is already a persistence mechanism. No need to duplicate storage.

flavioamieiro commented 9 years ago

@fccoelho I agree that the index belonging to the user is a good idea, but I'm only not sure my approach is the best one. I preferred to take this route instead of making the index name unique because that would mean the user would basically need to find a index name that wasn't taken by guessing. The separation between indexing and standard pipelines is already in place, yes. So when a user sends a document through this indexing endpoint, it will not trigger the standard pipeline, only the indexing one. As for the deletion, I will write a worker that deletes the file from GridFS to be run after the ElasticIndexer worker.

fccoelho commented 9 years ago

Then I have no further objections. if @israelst is also ok with this PR, I think we can merge.

israelst commented 9 years ago

I've reported an issue (#128) while testing this PR, but it is not exactly related to the PR. So, I think this is worth to merge.