Closed flavioamieiro closed 9 years ago
@flavioamieiro, I don't see a problem with the index belonging to a user, since API clients will necessarily need some way to manage their indices. In any case, we should separate the indexing pipeline from the standard pypln pipeline, which I guess is already contemplated. We also should not persist the indexed documents in mongofiles; there must be a trigger to delete them once they have been indexed. The elasticsearch index is already a persistence mechanism, so there is no need to duplicate storage.
@fccoelho I agree that the index belonging to the user is a good idea; I'm just not sure my approach is the best one. I preferred to take this route instead of making the index name unique because that would mean the user would basically need to find an index name that wasn't taken by guessing. The separation between the indexing and standard pipelines is already in place, yes. So when a user sends a document through this indexing endpoint, it will not trigger the standard pipeline, only the indexing one. As for the deletion, I will write a worker that deletes the file from GridFS, to be run after the ElasticIndexer worker.
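To make the naming trade-off concrete, here is a rough sketch of one way an index could be tied to its owner without forcing users to guess a free name. It is not the code in this PR: the helper `namespaced_index_name`, the `_` separator, and the character rules are all assumptions made for illustration.

```python
import re

# Hypothetical rules for this sketch only: usernames are lowercase
# alphanumerics, index names may also contain hyphens. Underscore is
# reserved as the separator, so two different (user, index) pairs can
# never map to the same combined name.
USERNAME_RE = re.compile(r"[a-z0-9]+")
INDEX_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


def namespaced_index_name(username: str, index_name: str) -> str:
    """Build a per-user index name of the form '<username>_<index_name>'.

    Both parts are lowercased first, since Elasticsearch index names
    must be lowercase.
    """
    user = username.lower()
    index = index_name.lower()
    if not USERNAME_RE.fullmatch(user):
        raise ValueError(f"unsupported username: {username!r}")
    if not INDEX_RE.fullmatch(index):
        raise ValueError(f"unsupported index name: {index_name!r}")
    return f"{user}_{index}"
```

With something like this, two users can both ask for an index called `papers` and each gets a distinct underlying index, at the cost of the API having to translate names in both directions.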
Then I have no further objections. If @israelst is also ok with this PR, I think we can merge.
I've reported an issue (#128) while testing this PR, but it is not exactly related to the PR. So I think this is worth merging.
This, together with NAMD/pypln.backend#124, provides a way to index documents in elasticsearch and then query them.
I decided on an approach to make sure the index belongs to the user that I'm not very comfortable with yet, so I'd love input on this.
This Pull Request needs a review, so please @fccoelho and @israelst, I'd love it if you can take a good look at the code.