Closed by muyannian 10 years ago
If some columns contain only null values, they also need to be handled: mark them as null columns.
Within a single traversal, once every matching doclist has been found, the loop can exit immediately instead of continuing to iterate.
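The early exit described above can be sketched roughly as follows. This is only an illustration, not mdrill code: the class `EarlyExitScan`, the method `findDocLists`, and the use of a `Map<String, int[]>` to stand in for the term-to-doclist index are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: stop scanning as soon as every wanted doclist is found. */
final class EarlyExitScan {
    /** Entries inspected by the most recent findDocLists call (for illustration only). */
    static int lastVisited = 0;

    /**
     * Walks the index in its iteration order and collects the doclists of the
     * wanted terms. The loop breaks once all wanted terms have been matched.
     */
    static Map<String, int[]> findDocLists(Map<String, int[]> index, Set<String> wanted) {
        Map<String, int[]> found = new HashMap<>();
        lastVisited = 0;
        if (wanted.isEmpty()) {
            return found; // nothing to look for, so no scan is needed
        }
        for (Map.Entry<String, int[]> entry : index.entrySet()) {
            lastVisited++;
            if (wanted.contains(entry.getKey())) {
                found.put(entry.getKey(), entry.getValue());
                if (found.size() == wanted.size()) {
                    break; // all matching doclists found: exit immediately
                }
            }
        }
        return found;
    }
}
```

If the wanted terms sit near the start of the scan order, the loop touches only a small prefix of the index; in the worst case it still visits every entry.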
2014-04-30 12:05:33 RealTimeDirectory [ERROR] deleteDirector
java.io.IOException: Cannot delete /disk7/taobao/mdrill/higo/tanx_click/20140429/0/realtime/20140429220727_160_3772555332/_5_new
    at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:376)
    at org.apache.lucene.store.LinkFSDirectory.deleteFile(LinkFSDirectory.java:319)
    at com.alimama.mdrill.solr.realtime.realtime.RealTimeDirectorUtils.deleteDirector(RealTimeDirectorUtils.java:68)
    at com.alimama.mdrill.solr.realtime.RealTimeDirectory.expire(RealTimeDirectory.java:193)
    at com.alimama.mdrill.solr.realtime.RealTimeDirectory.expire(RealTimeDirectory.java:45)
    at com.alimama.mdrill.adhoc.TimeCacheMap.clean(TimeCacheMap.java:139)
    at com.alimama.mdrill.adhoc.TimeCacheMap.maybeClean(TimeCacheMap.java:123)
    at com.alimama.mdrill.adhoc.TimeCacheMap$2.run(TimeCacheMap.java:83)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
2014-04-30 12:05:33 SolrCore [INFO] getSearcher:fact_wirel
The real-time part is unstable. Symptoms:
1. The first query returns no data.
2. Query results sometimes fluctuate.
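This improvement has been in production for a while, but the performance gain turned out to be insignificant; if anything, performance got worse. To summarize the reasons: implementing skipping requires keeping too much extra information (offsets, for example), and because it involves random reads and the skip list is placed after the doclist, the reads are not random reads in a single direction. The change therefore does not pay off in terms of performance.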
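New solution: use a separate file that stores, at fixed width, the termNum for each docid. The original values are not kept, and depending on how much the stored values repeat, the termNum can be stored as a byte, short, or int, so the storage type varies in width.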
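A minimal sketch of this layout, under stated assumptions: the class `TermNumColumn` and its methods are hypothetical names, an in-memory `ByteBuffer` stands in for the separate file, and the width is chosen from the number of distinct terms. Null values are not handled here.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: one fixed-width termNum per docid, width chosen per column. */
final class TermNumColumn {
    private final ByteBuffer data; // stands in for the separate on-disk file
    private final int width;       // bytes per docid: 1 (byte), 2 (short) or 4 (int)

    private TermNumColumn(ByteBuffer data, int width) {
        this.data = data;
        this.width = width;
    }

    /** Picks the narrowest type that can hold termNums in the range 0..termCount-1. */
    static int widthFor(int termCount) {
        if (termCount <= 0x100) return 1;
        if (termCount <= 0x10000) return 2;
        return 4;
    }

    /** Encodes termNumByDoc[docId] at byte offset docId * width. */
    static TermNumColumn build(int[] termNumByDoc, int termCount) {
        int width = widthFor(termCount);
        ByteBuffer buf = ByteBuffer.allocate(termNumByDoc.length * width);
        for (int termNum : termNumByDoc) {
            if (termNum < 0 || termNum >= termCount) {
                throw new IllegalArgumentException("termNum out of range: " + termNum);
            }
            switch (width) {
                case 1: buf.put((byte) termNum); break;
                case 2: buf.putShort((short) termNum); break;
                default: buf.putInt(termNum);
            }
        }
        return new TermNumColumn(buf, width);
    }

    /** Reads the termNum of a docid with one positioned read; values are treated as unsigned. */
    int termNum(int docId) {
        int pos = docId * width;
        switch (width) {
            case 1: return data.get(pos) & 0xFF;
            case 2: return data.getShort(pos) & 0xFFFF;
            default: return data.getInt(pos);
        }
    }

    int width() {
        return width;
    }
}
```

Because every entry has the same width, the position of a docid is a simple multiplication, so no offsets or skip information need to be stored alongside the data.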
Error:
2014-06-15 17:10:17 ShardGroupByTermNumCompare [INFO] ####SortType [sortFieldNum=-1, typeNum=0, typeEnum=index]
2014-06-15 17:10:17 MdrillParseGroupby [INFO] ##baseDocs.size## 543844@543844
2014-06-15 17:10:17 FacetComponent [ERROR] getFacetCounts
java.lang.ArrayIndexOutOfBoundsException: 65535
    at org.apache.solr.request.BigReUsedBuffer$BlockArray.get(BigReUsedBuffer.java:412)
    at org.apache.solr.request.uninverted.RamTermNumValue$termDoubleValue_indexl.doc(RamTermNumValue.java:165)
    at org.apache.solr.request.uninverted.RamDocValue$TermNumReadSingleNotNull.quickToDouble(RamDocValue.java:227)
    at org.apache.solr.request.uninverted.RamDocValue$TermNumReadSingle.quickToDouble(RamDocValue.java:299)
    at org.apache.solr.request.uninverted.UnInvertedFieldBase.quickToDouble(UnInvertedFieldBase.java:165)
    at org.apache.solr.request.mdrill.MdrillParseGroupby$fetchContaioner.updateStat(MdrillParseGroupby.java:243)
    at org.apache.solr.request.mdrill.MdrillGroupBy.makeTopGroups(MdrillGroupBy.java:105)
    at org.apache.solr.request.mdrill.MdrillGroupBy.get(MdrillGroupBy.java:71)
    at org.apache.lucene.index.SegmentReader.invertScan(SegmentReader.java:520)
    at org.apache.lucene.index.DirectoryReader.invertScan(DirectoryReader.java:591)
    at org.apache.lucene.index.FilterIndexReader.invertScan(FilterIndexReader.java:320)
    at org.apache.solr.request.mdrill.FacetComponent.getResult(FacetComponent.java:165)
    at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:86)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:101)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
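Symptom of the bug: when filtering by certain categories, for example filtering for Liaoning Province, the filtered results contain records from all sorts of provinces.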
A new implementation has already been released.
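1. My description of the problem is as follows.
2. The approach we arrived at after discussion is as follows.
2. For the basics of Lucene skip lists, see http://blog.csdn.net/forfuture1978/article/details/4976794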