go-ego / riot

Go open-source, distributed, simple and efficient search engine. Warning: this is the v1 beta release; because of its high memory consumption, v2 will be a complete rewrite.
Apache License 2.0

Creating a persistent index, I can only write 3,000-5,000 records per minute. Is this normal? #48

Closed duomi closed 6 years ago

duomi commented 6 years ago

I'm using the Weibo search example, modified to index data from my own database, but the write speed is very slow: only 3,000-5,000 records per minute. I'm already using goroutines, so I'm not sure where I went wrong. The code is shown below:

for i := 0; i < 100; i++ {
    go indexXwz(xwzs)
}

func indexXwz(xwzs <-chan Xwz) {
    for xwz := range xwzs {
        searcher.IndexDoc(xwz.Id, types.DocIndexData{
            Content: xwz.Name,
            Fields: XwzScoringFields{
                Timestamp: xwz.LatestDate,
                CountNum:  xwz.CountNum,
            },
        }, true)
    }
    searcher.Flush()
}
vcaesar commented 6 years ago

First, searcher.Flush() only needs to be called once. Then you can use

searcher.Init(types.EngineOpts{
    // Using: using,
    StorageShards: storageShards,
    NumShards: numShards,
})

to configure the number of indexing and storage coroutines.

duomi commented 6 years ago

@vcaesar Do you mean that my coroutines are not running? Can you describe it more clearly?

vcaesar commented 6 years ago

What I mean is that you can configure the number of storage coroutines to increase the speed.

duomi commented 6 years ago

I already use a loop to start 100 coroutines. Did I use them the wrong way? What's the correct way? Can you show me, please? @vcaesar

karfield commented 6 years ago

@Cliff2016 Use the internal sharding instead of forking goroutines that each call IndexDoc; that is not "parallel processing". Calling one API concurrently does not make the internal shards do the work; at best you are just calling the API more frequently, and what's worse, you flush after the calls finish. You need to think like the program does. The real reason an engine's indexing is slow is usually I/O, so don't flush unless you have to. This engine has an internal sharding mechanism, so lean on that mechanism to improve efficiency.