go-ego / riot

Go open-source, distributed, simple and efficient search engine. Warning: this is the v1 beta release; because of its high memory consumption, v2 will be a complete rewrite.
Apache License 2.0

Creating a persistent index, I can only write 3,000-5,000 records per minute. Is this normal? #48

Closed duomi closed 6 years ago

duomi commented 6 years ago

I'm using the Weibo search example, modified to index data from my own database, but the write speed is very slow: only 3,000-5,000 records per minute. I'm already using goroutines, so I'm not sure where I went wrong. The code is shown below:

for i := 0; i < 100; i++ {
    go indexXwz(xwzs)
}

func indexXwz(xwzs <-chan Xwz) {
    for xwz := range xwzs {
        searcher.IndexDoc(xwz.Id, types.DocIndexData{
            Content: xwz.Name,
            Fields: XwzScoringFields{
                Timestamp: xwz.LatestDate,
                CountNum:  xwz.CountNum,
            },
        }, true)
    }
    searcher.Flush()
}
vcaesar commented 6 years ago

First, searcher.Flush() only needs to be called once. Then you can use

searcher.Init(types.EngineOpts{
    // Using: using,
    StorageShards: storageShards,
    NumShards: numShards,
})

to configure the number of indexing and storage coroutines.

duomi commented 6 years ago

@vcaesar Do you mean that my coroutines are not running? Can you describe it more clearly?

vcaesar commented 6 years ago

What I mean is that you can configure the number of storage coroutines to increase the speed.

duomi commented 6 years ago

I already use a loop to start 100 coroutines. Did I use them the wrong way? What's the correct way? Can you show me, please? @vcaesar

karfield commented 6 years ago

@Cliff2016 Use the internal sharding instead of forking goroutines that each call IndexDoc; that is not "parallel processing". Calling one API concurrently does not make the internal shards do the work; at best you are just calling the API more frequently, and what's worse, you flush after the calls finish. You need to think like the program does. The real reason an engine's indexing is slow is usually I/O, so don't flush unless you have to. This engine has an internal sharding mechanism, so lean on that mechanism to improve efficiency.