apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

create real-time segment error #6624

Open tieke1121 opened 3 years ago

tieke1121 commented 3 years ago

error log:

INFO [SegmentDictionaryCreator] [wme_pinot12020210301T1252Z] Created dictionary for LONG column: receive_video_minutes with cardinality: 2, range: 0 to 1
2021/03/01 12:53:48.100 INFO [BaseSingleTreeBuilder] [wme_pinot3020210301T1252Z] Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[date_time, webex_site_id, network_type],skipStarNodeCreation=[],functionColumnPairs=[sumbad_video_minutes, sum__receive_bad_video_minutes, sumbad_audio_minutes, sumsend_video_minutes, sumvideo_minutes, sumaudio_minutes, sum__send_bad_video_minutes, sumreceive_video_minutes],maxLeafRecords=10000]
2021/03/01 12:53:48.100 INFO [SegmentDictionaryCreator] [wme_pinot12020210301T1252Z] Using fixed bytes value dictionary for column: server_region, size: 242
2021/03/01 12:53:48.101 INFO [SegmentDictionaryCreator] [wme_pinot12020210301T1252Z] Created dictionary for STRING column: server_region with cardinality: 11, max length in bytes: 22, range: Amsterdam, Netherlands to unknown
2021/03/01 12:53:48.102 INFO [ImmutableSegmentImpl] [wme_pinot3020210301T1252Z] Trying to destroy segment : wme_pinot30__20210301T1252Z
2021/03/01 12:53:48.102 INFO [SegmentDictionaryCreator] [wme_pinot12020210301T1252Z] Created dictionary for LONG column: receive_bad_video_minutes with cardinality: 2, range: 0 to 1
2021/03/01 12:53:48.102 ERROR [LLRealtimeSegmentDataManager_wme_pinot30__20210301T1252Z] [wme_pinot30__20210301T1252Z] Could not build segment
java.lang.UnsupportedOperationException: null
    at org.apache.pinot.core.segment.index.readers.ForwardIndexReader.getDictId(ForwardIndexReader.java:71) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.data.readers.PinotSegmentColumnReader.getDictId(PinotSegmentColumnReader.java:54) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.getSegmentRecordDimensions(BaseSingleTreeBuilder.java:224) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.startree.v2.builder.OffHeapSingleTreeBuilder.sortAndAggregateSegmentRecords(OffHeapSingleTreeBuilder.java:213) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.build(BaseSingleTreeBuilder.java:304) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.startree.v2.builder.OffHeapSingleTreeBuilder.build(OffHeapSingleTreeBuilder.java:43) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.startree.v2.builder.MultipleTreesBuilder.build(MultipleTreesBuilder.java:142) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]

schema config:
  {
"schemaName":"xxx",
"dimensionFieldSpecs":[
    {
        "name":"d1",
        "dataType":"STRING",
        "singleValueField":false
    },
    {
        "name":"d2",
        "dataType":"STRING",
        "singleValueField":false
    },
    {
        "name":"d3",
        "dataType":"STRING",
        "singleValueField":false
    },
    {
        "name":"d4",
        "dataType":"STRING",
        "singleValueField":false
    },
    {
        "name":"d5",
        "dataType":"STRING",
        "singleValueField":false
    }
],
"metricFieldSpecs":[
    {
        "name":"m1",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m2",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m3",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m4",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m5",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m6",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m7",
        "dataType":"LONG",
        "defaultNullValue":0
    },
    {
        "name":"m8",
        "dataType":"LONG",
        "defaultNullValue":0
    }
],
"dateTimeFieldSpecs":[
    {
        "name":"date_time",
        "dataType":"STRING",
        "format":"1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
        "granularity":"1:DAYS"
    }
]

}

table config:
{
"tableName":"tableName",
"tableType":"REALTIME",
"segmentsConfig":{
    "timeColumnName":"date_time",
    "timeType":"DAYS",
    "retentionTimeUnit":"DAYS",
    "retentionTimeValue":"5",
    "segmentPushType":"APPEND",
    "segmentAssignmentStrategy":"BalanceNumSegmentAssignmentStrategy",
    "schemaName":"xxx",
    "replication":"1",
    "replicasPerPartition":"1"
},
"tenants":{

},
"tableIndexConfig":{
    "loadMode":"MMAP",
    "enableDefaultStarTree":true,
    "streamConfigs":{
        "streamType":"kafka",
        "stream.kafka.consumer.type":"simple",
        "stream.kafka.topic.name":"local_mqa_telemetry_wmequality_report",
        "stream.kafka.decoder.class.name":"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name":"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list":"10.250.84.160:9092",
        "realtime.segment.flush.threshold.time":"3600000",
        "realtime.segment.flush.threshold.size":"50000",
        "stream.kafka.consumer.prop.auto.offset.reset":"largest"
    },
    "starTreeIndexConfigs":[
        {
            "dimensionsSplitOrder":[
                "d1",
                "d2",
                "d3",
                "d4",
                "d5"
            ],
            "skipStarNodeCreationForDimensions":[

            ],
            "functionColumnPairs":[
                "SUM__m1",
                "SUM__m2",
                "SUM__m3",
                "SUM__m4",
                "SUM__m5",
                "SUM__m6",
                "SUM__m7",
                "SUM__m8"
            ]
        }
    ]
},
"metadata":{
    "customConfigs":{

    }
}

}

tieke1121 commented 3 years ago

Every time the data volume reaches 1.5 million records, the error occurs.

xiangfu0 commented 3 years ago

From the log, it seems the star-tree builder is trying to read a column index that doesn't exist. Can you double-check that the columns referenced by the SUM__xxx pairs in the star-tree config are present in your schema with the exact same names?

cc: @Jackie-Jiang
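This kind of cross-check can also be done mechanically. The following is a hypothetical helper (not part of Pinot), assuming the schema and table config JSON shapes shown above, that lists any functionColumnPairs whose column name is missing from the schema:

```python
def check_star_tree_columns(schema, table_config):
    """Return functionColumnPairs whose column is not declared in the schema."""
    # Collect every column name declared in the schema.
    schema_columns = {
        spec["name"]
        for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")
        for spec in schema.get(key, [])
    }
    missing = []
    for st_config in table_config["tableIndexConfig"].get("starTreeIndexConfigs", []):
        for pair in st_config.get("functionColumnPairs", []):
            # Pairs are written as "<FUNCTION>__<column>", e.g. "SUM__m1".
            _, _, column = pair.partition("__")
            if column not in schema_columns:
                missing.append(pair)
    return missing


# Abbreviated versions of the configs from this issue, plus one bogus
# pair ("SUM__m9") to show what a mismatch would look like.
schema = {
    "schemaName": "xxx",
    "dimensionFieldSpecs": [{"name": f"d{i}", "dataType": "STRING"} for i in range(1, 6)],
    "metricFieldSpecs": [{"name": f"m{i}", "dataType": "LONG"} for i in range(1, 9)],
    "dateTimeFieldSpecs": [{"name": "date_time", "dataType": "STRING"}],
}
table_config = {
    "tableIndexConfig": {
        "starTreeIndexConfigs": [
            {"functionColumnPairs": ["SUM__m1", "SUM__m8", "SUM__m9"]}
        ]
    }
}
print(check_star_tree_columns(schema, table_config))  # ['SUM__m9']
```

In this issue the pairs SUM__m1 ... SUM__m8 all match schema metrics, so the name check alone would pass here.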

tieke1121 commented 3 years ago

I checked my schema config file and used the schema & table configurations above. After the Pinot servers received 1,500,000 records, the error occurred. The cluster has 3 servers.

Jackie-Jiang commented 3 years ago

@tieke1121 Star-tree index cannot be applied to multi-valued fields. Can you remove the star-tree config and retry?

@icefury71 Let's add this validation to the star-tree config
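To keep the star-tree index, the d1...d5 dimensions would have to be single-valued; if the data in those columns really holds one value per record, the schema could declare them accordingly (a sketch only, valid just when the underlying data is truly single-valued):

```json
{
    "name": "d1",
    "dataType": "STRING",
    "singleValueField": true
}
```

Otherwise, removing the "starTreeIndexConfigs" section and "enableDefaultStarTree" from "tableIndexConfig" avoids the failing star-tree build entirely.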

tieke1121 commented 3 years ago

> @tieke1121 Star-tree index cannot be applied to multi-valued fields. Can you remove the star-tree config and retry?

If I don't apply the star-tree config, the Pinot server works fine. So is the star-tree index unusable?

xiangfu0 commented 3 years ago

> If I don't apply the star-tree config, the Pinot server works fine. So is the star-tree index unusable?

I think the star-tree index requires columns d1, d2, ..., d5 to be single-value columns. In your schema they are multi-value columns, hence the star-tree build throws an exception, which prevents consumption from moving forward.

icefury71 commented 3 years ago

@Jackie-Jiang added a simple validation check: #6641
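The validation rejects table configs whose star-tree split order includes a multi-value dimension, so the failure surfaces at config time instead of mid-consumption. A rough illustration of what such a check does (this is not Pinot's actual implementation, just a sketch against the JSON shapes used in this issue):

```python
def validate_star_tree_dimensions(schema, table_config):
    """Raise if any star-tree split-order dimension is a multi-value column.

    Illustrative only; the real validation added in #6641 lives in Pinot's
    Java config-validation code.
    """
    # Map dimension name -> whether it is single-valued (default true in Pinot).
    single_value = {
        spec["name"]: spec.get("singleValueField", True)
        for spec in schema.get("dimensionFieldSpecs", [])
    }
    for st_config in table_config["tableIndexConfig"].get("starTreeIndexConfigs", []):
        for dim in st_config.get("dimensionsSplitOrder", []):
            if not single_value.get(dim, True):
                raise ValueError(
                    f"Star-tree index cannot be applied to multi-value dimension: {dim}"
                )


# The schema in this issue declares d1..d5 with "singleValueField": false,
# so validation fails up front rather than during segment build.
schema = {
    "dimensionFieldSpecs": [
        {"name": f"d{i}", "dataType": "STRING", "singleValueField": False}
        for i in range(1, 6)
    ]
}
table_config = {
    "tableIndexConfig": {
        "starTreeIndexConfigs": [{"dimensionsSplitOrder": ["d1", "d2", "d3", "d4", "d5"]}]
    }
}
try:
    validate_star_tree_dimensions(schema, table_config)
except ValueError as e:
    print(e)  # reports the first offending dimension
```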