digiaonline / lumen-elasticsearch

Simple wrapper of https://github.com/elastic/elasticsearch-php for the Lumen PHP framework.
MIT License

Documentation needed #9

Closed pelmered closed 7 years ago

pelmered commented 8 years ago

Thank you for this module, it looks great!

However, it really lacks documentation. You do not get far with the information in the readme.

I just started to use this, and the things I would like to know now to get started are:

How should I use the commands? They are not available after loading the Service Provider. I tried copying them from src/Console into my app/Console/Commands folder, and that works. Now when I run the create index command, it seems to need a configuration file. The readme does not mention that at all. The only config file it does mention seems to be missing all the configuration values needed for indexing.

Could you please provide a better step-by-step guide for getting started with this? Or at least specify what configuration file I should provide to create the index.

Thanks in advance!

pelmered commented 8 years ago

For the configuration, I guess it's elasticsearch-php and elasticsearch documentation I should look at.

I think I can figure this out, but could you give me an example? Preferably with migrations so I can see the DB schema.

crisu83 commented 8 years ago

Here is a sample index configuration:

<?php
return [
    'index' => 'my-index',
    'body' => [
        'mappings' => [
            'my-model' => [
                'properties' => [
                    'id' => ['type' => 'string', 'index' => 'not_analyzed'],
                    'name' => ['type' => 'string'],
                ],
            ],
        ],
        'settings' => [
            'analysis' => [
                'filter' => [
                    'finnish_stop' => [
                        'type' => 'stop',
                        'stopwords' => '_finnish_',
                    ],
                    'finnish_stemmer' => [
                        'type' => 'stemmer',
                        'language' => 'finnish',
                    ],
                ],
                'analyzer' => [
                    'finnish' => [
                        'tokenizer' => 'standard',
                        'filter' => [
                            'lowercase',
                            'finnish_stop',
                            'finnish_stemmer',
                        ],
                    ],
                ],
            ],
        ],
    ],
];

app/database/elasticsearch/my-index.php

As you can see it's all Elasticsearch configuration, please refer to the official documentation for more information.

Also, to run the console command simply add the commands to your Kernel (app/Console/Kernel.php) command map.

pelmered commented 8 years ago

Thank you @crisu83! I will try to learn the Elasticsearch configuration format from the Elasticsearch documentation.

Yes, and regarding the commands I thought about adding the commands directly without copying them. This worked fine:

    protected $commands = [
        \Nord\Lumen\Elasticsearch\Console\CreateCommand::class,
        \Nord\Lumen\Elasticsearch\Console\DeleteCommand::class
    ];

I do think that both issues here should be clarified in the readme for the next person. I could write this and make a PR if you want.

Another somewhat unrelated question: will this module automatically update the indexes when a new database record is saved (using Lumen models)? If not, how do you recommend that this should be handled?

kryysler commented 8 years ago

We really need to improve the documentation, so if you could submit a PR it would be awesome!

As for the index updates, this module only provides the tools for doing the job, so you need to do the index updates yourself. I would create a service class in your app, that handles modifying the index (add/update/delete). Something like:

$this->getMyIndexingService()->indexMyModel($myModel);

That service can then call the helper methods here https://github.com/nordsoftware/lumen-elasticsearch/blob/develop/src/ElasticsearchService.php#L43 to perform the operations in Elasticsearch. The Elasticsearch documentation provides good enough guides on which parameters are needed etc. https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html
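A minimal sketch of such a service class, for illustration only. The `SearchClient` interface, the in-memory client, and the `removeMyModel` method are hypothetical names invented here so the example runs without an Elasticsearch server; the only assumption about the real service is that it accepts elasticsearch-php style parameter arrays with `index`, `type`, `id` and `body` keys.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the module's service. Only index() and delete()
 * methods taking elasticsearch-php style parameter arrays are assumed.
 */
interface SearchClient
{
    public function index(array $params): void;
    public function delete(array $params): void;
}

/** In-memory client so the sketch runs without an Elasticsearch server. */
final class InMemorySearchClient implements SearchClient
{
    /** @var array<string, array> documents keyed by "index/type/id" */
    public array $documents = [];

    public function index(array $params): void
    {
        $this->documents[self::key($params)] = $params['body'];
    }

    public function delete(array $params): void
    {
        unset($this->documents[self::key($params)]);
    }

    private static function key(array $params): string
    {
        return "{$params['index']}/{$params['type']}/{$params['id']}";
    }
}

/** Hypothetical application service that owns all writes to one index. */
final class MyIndexingService
{
    private SearchClient $client;

    public function __construct(SearchClient $client)
    {
        $this->client = $client;
    }

    public function indexMyModel(array $model): void
    {
        // The body is built from the model according to the index mapping.
        $this->client->index($this->params($model) + ['body' => ['name' => $model['name']]]);
    }

    public function removeMyModel(array $model): void
    {
        $this->client->delete($this->params($model));
    }

    private function params(array $model): array
    {
        return ['index' => 'my-index', 'type' => 'my-model', 'id' => (string) $model['id']];
    }
}

$client = new InMemorySearchClient();
$service = new MyIndexingService($client);
$service->indexMyModel(['id' => 1, 'name' => 'First']);
```

Keeping every write behind one class means the index name, type and body mapping live in a single place, whichever code path triggers the update.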

pelmered commented 8 years ago

I've been looking some more into this now, and I'm starting to get a grip on how this works.

For updating the indexes I think I'm going for a solution with Event Listeners similar to this: http://stackoverflow.com/a/30540207/951744 What do you think about that?

The only thing I don't really understand now is what I should do with the Index command. I have created the index and I can add data, but I'm not sure how I should use your Index command for bulk adding. Why is the index command class abstract? How should I implement that?

I have started to write some documentation to set up all of this. Should I write this on a wiki page or add it into the readme? I'm doing this as a hobby project and I don't have so much time, but I should probably be done with this within 2-3 weeks.

hugovk commented 8 years ago

I think it's best put into a .md file rather than a wiki; that way the info stays with the repo if it's forked or moved to other hosts.

Depending on the size/scope, either put it in README.md, or in a new .md file.

pelmered commented 8 years ago

@hugovk: Sure, sounds good.

Could you give me some guidance about the Index command I asked about? Otherwise I could just make my own command that loops through the models and pushes them in one by one.

pelmered commented 8 years ago

@crisu83 or @kryysler, could you give me some guidance on how to implement the index command class to map my models?

kryysler commented 8 years ago

@pelmered Sorry for the radio silence, just got back from my summer vacation :) The index command is intended as a helper to bulk index your models. You need to create an implementation class that could look something like:

<?php

namespace App\Console\Commands;

use Nord\Lumen\Elasticsearch\Console\IndexCommand;

class MyIndexCommand extends IndexCommand
{
    /**
     * @inheritdoc
     */
    protected $signature = 'app:stuff:index';

    /**
     * @inheritdoc
     */
    protected $description = 'Indexes stuff to Elasticsearch.';

    /**
     * @inheritdoc
     */
    public function getData()
    {
        // Array of models from your db to index.
        return [];
    }

    /**
     * @inheritdoc
     */
    public function getIndex()
    {
        return 'my_index';
    }

    /**
     * @inheritdoc
     */
    public function getType()
    {
        return 'my_type';
    }

    /**
     * @inheritdoc
     */
    public function getItemBody($item)
    {
        // $item is one instance of your models returned in "getData" method. The "getItemBody" is called for each model in that list.

        // Build the index document data for this item based on your schema.
        // https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html
        return [];
    }

    /**
     * @inheritdoc
     */
    public function getItemId($item)
    {
        // $item is one instance of your models returned in "getData" method. The "getItemId" is called for each model in that list.

        // Unique ID for the document.
        // https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html  
        return 0;
    }

    /**
     * @inheritdoc
     */
    public function getItemParent($item)
    {
        // $item is one instance of your models returned in "getData" method. The "getItemParent" is called for each model in that list.

        // This is only needed if you use parent-child relationships. Otherwise null is what you should return.
        // https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html
        return null;
    }
}

This solution is not perfect, and I have been thinking of improving this but not got around to it yet. If you have some good ideas on how to make it easier to bulk index stuff, please share.

Hope this clears things up! :)
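For readers wondering what a bulk helper has to assemble, here is a rough sketch of the request format that elasticsearch-php's `bulk()` call expects: one action/metadata entry followed by one document entry per item. This is not the module's actual implementation; `buildBulkParams` and its callable parameters are hypothetical names that merely mirror the `getItemId()` and `getItemBody()` getters above.

```php
<?php
declare(strict_types=1);

/**
 * Builds bulk parameters: for every item, an "index" action entry
 * followed by the document body itself.
 */
function buildBulkParams(iterable $items, string $index, string $type, callable $id, callable $body): array
{
    $params = ['body' => []];

    foreach ($items as $item) {
        // Action and metadata line for this document.
        $params['body'][] = ['index' => ['_index' => $index, '_type' => $type, '_id' => $id($item)]];
        // The document source that follows the action line.
        $params['body'][] = $body($item);
    }

    return $params;
}

// Plain arrays stand in for database models here.
$rows = [
    ['id' => 1, 'name' => 'First'],
    ['id' => 2, 'name' => 'Second'],
];

$params = buildBulkParams(
    $rows,
    'my_index',
    'my_type',
    fn (array $row) => $row['id'],
    fn (array $row) => ['name' => $row['name']]
);
```

The resulting array has two entries per item, which is why large data sets are usually sent in batches rather than in one request.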

kryysler commented 8 years ago

@pelmered Regarding your suggestion of using event listeners to keep your models up to date in the index, I think it is a good idea. Personally I usually have everything related to creating and updating models wrapped in a service class where I manually invoke the indexing when a model is either created or updated. This gives a bit more flexibility in case you don't always want to re-index the model even if it is updated etc. But this is just personal preference and your suggestion works just as well.

The bulk index command works great as an initial bootstrap for your index when you use event listeners to keep the models in sync in Elasticsearch. Then you need not wait for changes to happen to each model before you have them in the index. Running the bulk index periodically in cron, e.g. once a day, is also a good idea. This will take care of any data inconsistencies if the event listeners failed to index your data for some reason.

pelmered commented 8 years ago

No problem. I've been away for most of the time since then as well.

Thank you for the help @kryysler!

In the example, the getType() method is hard-coded and takes no parameter. Am I supposed to create one command for each model/type in Elasticsearch? Should the getData() method just return a collection, like return MyModel::get();? I will see if I can come up with some improvements regarding this after I have played around with it some more.

Regarding indexing changes, I ended up with a pretty good and clean solution: I created a trait that I can use on the models I want to index in Elasticsearch. The trait adds some helper methods and fires events for all write operations, which are handled by an event listener. If the trait and this handler were added to this module, you would only need to add three events in your EventServiceProvider. I will add the relevant generic code to the documentation. It's probably not perfect, nor as flexible as some people would need it to be, but I think it's a good start.
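A framework-free sketch of that trait-plus-listener idea, to show the moving parts. Everything here is hypothetical: the `Events` class stands in for the framework's dispatcher, `Searchable`, `Article` and `SearchIndexListener` are illustrative names, and the three write events are collapsed to two (saved and deleted) for brevity. An array plays the role of the index.

```php
<?php
declare(strict_types=1);

/** Minimal event dispatcher standing in for the framework's. */
final class Events
{
    /** @var array<string, callable[]> */
    private static array $listeners = [];

    public static function listen(string $event, callable $listener): void
    {
        self::$listeners[$event][] = $listener;
    }

    public static function fire(string $event, object $model): void
    {
        foreach (self::$listeners[$event] ?? [] as $listener) {
            $listener($model);
        }
    }
}

/** Hypothetical trait: models using it announce every write operation. */
trait Searchable
{
    public function save(): void
    {
        Events::fire('search.saved', $this);
    }

    public function delete(): void
    {
        Events::fire('search.deleted', $this);
    }
}

final class Article
{
    use Searchable;

    public int $id;
    public string $title;

    public function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }
}

/** Listener that mirrors writes into the index (an array in this sketch). */
final class SearchIndexListener
{
    public array $index = [];

    public function onSaved(Article $model): void
    {
        $this->index[$model->id] = ['title' => $model->title];
    }

    public function onDeleted(Article $model): void
    {
        unset($this->index[$model->id]);
    }
}

$listener = new SearchIndexListener();
Events::listen('search.saved', [$listener, 'onSaved']);
Events::listen('search.deleted', [$listener, 'onDeleted']);

$article = new Article(7, 'Hello');
$article->save();
```

In a real application the listener would call the Elasticsearch service instead of writing to an array, and the registration would live in the EventServiceProvider.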

With the trait you could also easily defer the getters in the command to the model (which should probably be responsible for this), for example:

public function getItemBody($item)
{
    return $item->getElasticsearchItemBody();
}
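Extending that idea, an interface can make the delegation explicit, so the command only accepts models that know how to describe themselves. The sketch below is an assumption-laden illustration: `Indexable`, `Product` and `ProductItemGetters` are hypothetical names, and the getters class merely stands in for the per-item methods of an index command.

```php
<?php
declare(strict_types=1);

/** Hypothetical contract the trait or model would fulfil. */
interface Indexable
{
    public function getElasticsearchItemId(): int;
    public function getElasticsearchItemBody(): array;
}

final class Product implements Indexable
{
    private int $id;
    private string $name;

    public function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getElasticsearchItemId(): int
    {
        return $this->id;
    }

    public function getElasticsearchItemBody(): array
    {
        return ['id' => $this->id, 'name' => $this->name];
    }
}

/** Stand-in for the command's per-item getters; it only delegates. */
final class ProductItemGetters
{
    public function getItemId(Indexable $item): int
    {
        return $item->getElasticsearchItemId();
    }

    public function getItemBody(Indexable $item): array
    {
        return $item->getElasticsearchItemBody();
    }
}

$getters = new ProductItemGetters();
$product = new Product(3, 'Lamp');
$body = $getters->getItemBody($product);
```

With the document shape defined on the model, the same methods can serve both the bulk command and the event listener.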