feliperazeek / playframework-elasticsearch

Integrate Elastic Search in a Play! Framework application. This module uses JPA events to notify Elastic Search of entity changes. It embeds a running Elastic Search instance for rapid development.
http://geeks.aretotally.in/play-framework-module-elastic-search-distributed-searching-with-json-http-rest-or-java

Reindexing is needed #35

Open vkolotov opened 12 years ago

vkolotov commented 12 years ago

We are trying to use elastic search and, as far as we can see, the elastic search module doesn't have any feature to reindex all entities in the DB. I think it would be quite a useful feature, especially when you are migrating from one DB to another or when you are adding entities via an SQL script.

It is a fairly simple and straightforward feature which could be implemented as a method in an admin controller.

bgooren commented 12 years ago

It is already possible in the current release, but only by using undocumented methods. We can add a documented way to do this.

vkolotov commented 12 years ago

Ok great! Can you tell us how to do it with the current release? BTW, we have already implemented this feature ourselves, but we would still like an "official" version. We can also share our implementation with you, if you want.

bgooren commented 12 years ago

We stop play, remove the index files and then for every indexed class do it like this:

private static <T extends Model> void updateIndex(Class<T> clazz) {
    // Update index
    ElasticSearchPlugin plugin = Play.plugin(ElasticSearchPlugin.class);
    ModelMapper<T> mapper = MapperFactory.getMapper(clazz);
    ElasticSearchAdapter.startIndex(plugin.client(), mapper);

    List<T> objs = JPQL.instance.findAll(clazz.getName());
    for (T obj : objs) {
        try {
            ElasticSearchAdapter.indexModel(plugin.client(), mapper, obj);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

As you can see, all that is needed to allow a user to index a model is a wrapper method for

ElasticSearchAdapter.indexModel(plugin.client(), mapper, obj);

bgooren commented 12 years ago

Haha, seems like I already added that wrapper method. See

ElasticSearch.index()

So what you can do is remove your elastic search index files when play is stopped, start play and feed all the models you want to index to ElasticSearch.index().

vkolotov commented 12 years ago

Great! Thank you!

However, it would be better to page through the entities while iterating, since there may be too many of them to load at once, which could cause an OutOfMemoryError.
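
To make the paging idea concrete, here is a minimal sketch of a reindexing loop that loads one page at a time. It is not the module's API: `PagedReindexer`, `pageFetcher` and `indexer` are hypothetical names. In a real application the fetcher would presumably wrap a paged JPA query and the indexer would wrap ElasticSearch.index().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class PagedReindexer {

    /**
     * Feeds every entity to the indexer, loading at most pageSize entities at a time.
     *
     * @param pageFetcher hypothetical loader: (offset, limit) -> at most `limit` entities
     * @param indexer     hypothetical callback that indexes a single entity
     * @param pageSize    maximum number of entities held in memory at once
     * @return the number of entities passed to the indexer
     */
    public static <T> int reindex(BiFunction<Integer, Integer, List<T>> pageFetcher,
                                  Consumer<T> indexer,
                                  int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> page = pageFetcher.apply(offset, pageSize);
            for (T entity : page) {
                indexer.accept(entity);
            }
            offset += page.size();
            // A short (or empty) page means there is nothing left to load.
            if (page.size() < pageSize) {
                return offset;
            }
        }
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the database table.
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            table.add(i);
        }
        BiFunction<Integer, Integer, List<Integer>> fetcher =
                (offset, limit) -> table.subList(offset, Math.min(offset + limit, table.size()));

        List<Integer> indexed = new ArrayList<>();
        int count = reindex(fetcher, indexed::add, 100);
        System.out.println(count + " entities indexed"); // prints "250 entities indexed"
    }
}
```

With a persistence context, each page would probably also need to be detached or cleared after indexing, otherwise the entities stay referenced and the memory benefit is lost.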

Anyway, this feature could be implemented in the admin console of the ES module, the way it is done in the "search module". Is that planned?

bgooren commented 12 years ago

Sure, paging is important. But for now I'd say that is for the user to handle.

Of course it's something which could be added to the admin.

aheritier commented 12 years ago

I don't know if it is due to the lack of paging or to something else, but I see strange behavior when I try to create/update the index from data in my DB. I'm using the embedded ES and trying to index one category of objects (~200, so not that many), and each time I index them, some objects are missing. The first time it indexes only the objects after position 100 in the list, and if I call the update again it adds a few of the missing objects each time. It is as if the loop over the objects were too fast for ES to index them. The problem is that if I use a debugger to try to analyze the issue, the problem disappears ...

aheritier commented 12 years ago

Just adding a "Thread.sleep(5);" in my loop solves the issue :(

aheritier commented 12 years ago

Instead of the sleep, adding a request to refresh the index also solves my issue: ElasticSearch.client().index(Requests.indexRequest().refresh(true));

bgooren commented 12 years ago

Hmmm, sounds like this is more of an elasticsearch issue. If you see everything after you ask ES to refresh its index, it means all updates did arrive, but aren't visible right away.

I haven't encountered such issues myself, but feel free to submit a patch if you find a good solution for this issue.

isamaru commented 12 years ago

I have found the root cause of the issue. It is an issue in the Play framework: indexing tasks use the play.libs.F.EventStream queue, which has a bounded buffer size (100) and discards events without warning on overflow.
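
To illustrate how a bounded buffer can silently lose index events, here is a simplified model. It is not Play's actual EventStream code: `BoundedBufferDemo` and `DroppingBuffer` are hypothetical names, and dropping the oldest event on overflow is an assumption chosen because it matches the symptom reported above (only the objects after the first 100 were indexed).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BoundedBufferDemo {

    /** Simplified stand-in for a bounded event buffer (assumption, not Play's implementation). */
    public static final class DroppingBuffer<T> {
        private final int capacity;
        private final Deque<T> events = new ArrayDeque<>();
        private int dropped = 0;

        public DroppingBuffer(int capacity) {
            this.capacity = capacity;
        }

        /** Adds an event; if the buffer is full, the oldest event is discarded without any warning. */
        public void publish(T event) {
            if (events.size() >= capacity) {
                events.pollFirst();
                dropped++;
            }
            events.addLast(event);
        }

        /** Returns all buffered events in publish order and empties the buffer. */
        public List<T> drain() {
            List<T> out = new ArrayList<>(events);
            events.clear();
            return out;
        }

        public int dropped() {
            return dropped;
        }
    }

    public static void main(String[] args) {
        DroppingBuffer<Integer> buffer = new DroppingBuffer<>(100);
        // A fast producer publishes 200 index events before the consumer reads anything.
        for (int id = 0; id < 200; id++) {
            buffer.publish(id);
        }
        List<Integer> delivered = buffer.drain();
        // prints "delivered=100 dropped=100 first=100"
        System.out.println("delivered=" + delivered.size()
                + " dropped=" + buffer.dropped()
                + " first=" + delivered.get(0));
    }
}
```

This would also explain why a short sleep in the loop or stepping through with a debugger hides the problem: the consumer gets time to drain the buffer before it overflows.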

It would be nice to have a "drop all indices" feature, and even nicer if it worked automatically with Fixtures.deleteDatabase() and the wipe before entering initial data.

bgooren commented 12 years ago

Ah, that sounds like the root cause indeed! I guess the module needs a more reliable way of handling the index events.

Regarding fixtures: do you know if we can hook into an event or plug a listener somewhere to handle fixture cleanup?

isamaru commented 12 years ago

I solved it by extending play.test.Fixtures and using my own implementation throughout the code:

/**
 * Extension of Fixtures to mass delete and create search indices
 */
public class Fixtures extends play.test.Fixtures {

    /**
     * Flush the entire JDBC database and clear the indices
     */
    public static void deleteDatabase() {
        deleteIndices();
        play.test.Fixtures.deleteDatabase();
    }

    /**
     * Delete all Model instances for all available types using the underlying persistence mechanisms, then clear the indices
     */
    public static void deleteAllModels() {
        deleteIndices();
        play.test.Fixtures.deleteAllModels();
    }

    /**
     * Load Model instances from a YAML file and persist them using the underlying persistence mechanism. The format of the YAML file is constrained, see the Fixtures manual page. Search indices are created synchronously.
     * 
     * @param name
     *            Name of a YAML file somewhere in the classpath (or conf/)
     */
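    public static void loadModels(final String name) {
        // Hold back index events while the fixtures are loaded
        ElasticSearchPlugin.setBlockEvents(true);
        try {
            play.test.Fixtures.loadModels(name);
        } finally {
            // Always unblock events, even if loading the fixtures fails
            ElasticSearchPlugin.setBlockEvents(false);
        }
        // Index everything that was held back, in one batch
        ElasticSearchPlugin.batchProcessBlockedOperations();
    }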

    /**
     * @see #loadModels(String)
     */
    public static void loadModels(final String... names) {
        for (final String name : names) {
            loadModels(name);
        }
    }

    private static void deleteIndices() {
        final Client client = ElasticSearchPlugin.client();
        client.prepareDeleteByQuery("_all").setQuery(QueryBuilders.matchAllQuery()).execute().actionGet();
        Logger.info("All search indices deleted!");
    }

}