Open hetao29 opened 6 years ago
It appears that your documents have many fields. The one example I looked at closely had over 100 fields. Indexing all of these fields takes additional time. If you don't need to search on all of them, it is recommended that you create a custom mapping and index only the fields you plan to search.
Second, you appear to have many numeric fields. In bleve today numeric fields are very expensive to index, as they are optimized for later doing numeric range searches. But this optimization means that a numeric field can take up to 16x the space of a text field with a single term. This is something we hope to improve in the future, but for now it means you have to be very selective about including numeric fields. Having lots of numeric fields means the index will be quite large (and consequently slow).
Finally, boltdb is the default storage for bleve because it is easy to use and get started with. But it is not the best choice for indexing performance. Choosing one of the alternate key/value stores can offer significantly better indexing performance (usually with some trade-off on search performance).
One more thing: using batches is recommended, but I would be careful about doing the entire workload in a single batch. Typically a batch size of, say, 100 or 1000 documents works best to get work done efficiently while still making incremental progress.
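The batching advice can be sketched as a small chunking loop. This version uses only the standard library; `Doc`, `indexInBatches`, and the `flush` callback are hypothetical names, not part of bleve. With bleve, the callback is where one would create a batch with `index.NewBatch()`, call `batch.Index(id, doc)` for each document, and then submit it with `index.Batch(batch)`:

```go
package main

import "fmt"

// Doc is a hypothetical document wrapper: an ID plus the data to index.
type Doc struct {
	ID   string
	Body interface{}
}

// indexInBatches splits docs into chunks of at most batchSize and calls
// flush once per chunk, stopping at the first error.
func indexInBatches(docs []Doc, batchSize int, flush func(chunk []Doc) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		if err := flush(docs[start:end]); err != nil {
			return fmt.Errorf("flushing docs %d-%d: %w", start, end-1, err)
		}
	}
	return nil
}

func main() {
	docs := make([]Doc, 2500)
	for i := range docs {
		docs[i] = Doc{ID: fmt.Sprintf("doc-%d", i)}
	}

	err := indexInBatches(docs, 1000, func(chunk []Doc) error {
		// Placeholder for the real work: build one bleve batch from
		// chunk and submit it to the index.
		fmt.Printf("flushed %d documents\n", len(chunk))
		return nil
	})
	if err != nil {
		fmt.Println("indexing failed:", err)
	}
}
```

Flushing per chunk means a failure part-way through loses at most one batch of work, rather than the whole run.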
Thank you very much. I'll try it later.
Any docs on the storage types?
> In bleve today numeric fields are very expensive to index, as they are optimized for later doing numeric range searches.
Hi @mschoch, thanks for your suggestion. As numeric fields are optimized for later doing numeric range searches, if I would never do range searches on the numeric field, is there a way to skip these optimizations for range searches to improve indexing speed? Thanks very much.
> As numeric fields are optimized for later doing numeric range searches, if I would never do range searches on the numeric field, is there a way to skip these optimizations for range searches to improve indexing speed?
How about creating your own mapping (instead of using the default one) and defining the field as a string?
> How about creating your own mapping (instead of using the default one) and defining the field as a string?
That seems like a good idea. I'll give it a try. Thanks very much.
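One way to try the string approach is to convert the numeric values to strings before handing the document to the index, so that a text or keyword field mapping can apply to them. This is a standard-library sketch under the assumption that the documents are decoded JSON maps (where numbers arrive as `float64`) and that the mapping only picks up values that are already strings, which I have not verified against bleve itself. The helper name `stringifyNumbers` and the field names are hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// stringifyNumbers returns a shallow copy of doc in which the named
// fields, when they hold a float64, are replaced by their decimal
// string form. Other fields and other value types are left as they are.
func stringifyNumbers(doc map[string]interface{}, fields ...string) map[string]interface{} {
	out := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		out[k] = v
	}
	for _, name := range fields {
		if f, ok := out[name].(float64); ok {
			// 'f' with precision -1 uses the shortest exact representation.
			out[name] = strconv.FormatFloat(f, 'f', -1, 64)
		}
	}
	return out
}

func main() {
	doc := map[string]interface{}{
		"title":  "example",
		"status": 42.0, // numeric code that is only ever matched exactly
		"ratio":  3.5,
	}
	converted := stringifyNumbers(doc, "status", "ratio")
	fmt.Printf("%#v\n", converted)
}
```

Exact-match lookups on such a field would then be term queries against the string form; numeric range queries on it are no longer possible.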
Hi, I tested the file on Linux and Mac. Indexing is very slow when using Index(docid, doc), and batch mode never finishes. The test source and data are in test.tar.gz: