java8964 closed 9 years ago
Yes, and I'm working on a distributed version of the indexer now. It should be available soon. That said, we regularly index ~5TB of data, and it takes on the order of 1-2 hours. That is definitely a bottleneck, and one that a distributed indexer will resolve. I'll close this issue when I have that done. If you need it soon, let me know and I can prioritize it. If you'd like to take a crack at it yourself, let me know as well. Pull requests are always welcome.
An initial cut of the distributed indexer is available.
This is not really an issue, more of a question. It looks like the index build runs within the driver, using multiple threads. It works for my test data. What I want to know is how indexing performs in your production environment. Does it make sense to use MR jobs to build the index? Our production system has over 10TB of SSTable files, and I'm worried that running the indexing within a single driver could become the bottleneck in this case. What is your experience?
Thanks