rohit-mobstac opened 1 year ago
Another problem I have noticed is that segments are not getting committed to the deep store (S3), even after realtime.segment.flush.threshold.time is set to 1h.
Looks like it is not able to upload segments. Are the server and controller running on the same host? The segment upload link says "localhost:9000", hence checking.
No, the server and controller are on different EC2 instances. @navina Attaching the server and controller configs that are used. Server:
# Pinot Role
pinot.service.role=SERVER
# Pinot Cluster name
pinot.cluster.name=cluster name
# Pinot Zookeeper Server
pinot.zk.server=zk1:2181,zk2:2182,zk3:2183
# Use hostname instead of IP as the Pinot instance ID
pinot.set.instance.id.to.hostname=true
# Pinot Server Netty Port for queries
pinot.server.netty.port=8098
# Pinot Server Admin API port
pinot.server.adminapi.port=8097
# Pinot Server Data Directory
pinot.server.instance.dataDir=/tmp/pinot/data/server/index
# Pinot Server Temporary Segment Tar Directory
pinot.server.instance.segmentTarDir=/tmp/pinot/data/server/segmentTar
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-east-1
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Controller:
# Pinot Role
pinot.service.role=CONTROLLER
# Pinot Cluster name
pinot.cluster.name=cluster name
# Pinot Zookeeper Server
pinot.zk.server=zk1:2181,zk2:2182,zk3:2183
# Use hostname instead of IP as the Pinot instance ID
pinot.set.instance.id.to.hostname=true
# Pinot Controller Port
controller.port=9000
# Pinot Controller VIP Host
controller.vip.host=localhost
# Pinot Controller VIP Port
controller.vip.port=9000
# Location to store Pinot Segments pushed from clients
controller.data.dir=s3://mybucket/controllerData/
controller.local.temp.dir=/tmp/pinot-tmp-data/
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Is there any news on this issue? I am facing the same error. It seems the server is trying to commit the segment to localhost instead of using the controller running on a different EC2 instance.
Were you able to fix this?
@FranMorilloAWS I guess controller.vip.host=localhost is the cause of this. Can you check if you have a similar setting in your config?
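For reference, a minimal sketch of the fix in the controller config. I believe the controller advertises controller.vip.host/controller.vip.port as the segment upload endpoint, so it must be a hostname reachable from the servers, never localhost. The hostname below is a placeholder, not a value from this thread; substitute the DNS name of your controller or its load balancer:

# Pinot Controller VIP Host (must be reachable from the servers, not localhost)
controller.vip.host=controller-lb.example.internal
# Pinot Controller VIP Port
controller.vip.port=9000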
@Jackie-Jiang Yes, I modified controller.vip.host to point to the load balancer of the controllers and it worked. However, I am now facing an issue where, with the current table configuration, once the segments go from Consuming to Good, the servers do not create new segments to continue consuming from the Kinesis data stream.
It also ignores the number of rows and the size configured for the segments.
I'll add my table configuration:
{ "REALTIME": { "tableName": "kinesisTable_REALTIME", "tableType": "REALTIME", "segmentsConfig": { "replication": "2", "retentionTimeUnit": "DAYS", "retentionTimeValue": "7", "replicasPerPartition": "2", "minimizeDataMovement": false, "timeColumnName": "creationTimestamp", "segmentPushType": "APPEND", "completionConfig": { "completionMode": "DOWNLOAD" } }, "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" }, "tableIndexConfig": { "invertedIndexColumns": [ "product" ], "noDictionaryColumns": [ "price" ], "rangeIndexVersion": 2, "autoGeneratedInvertedIndex": false, "createInvertedIndexDuringSegmentGeneration": false, "sortedColumn": [ "creationTimestamp" ], "loadMode": "MMAP", "streamConfigs": { "streamType": "kinesis", "stream.kinesis.topic.name": "pinot-stream", "region": "eu-west-1", "shardIteratorType": "LATEST", "stream.kinesis.consumer.type": "lowlevel", "stream.kinesis.fetch.timeout.millis": "30000", "stream.kinesis.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder", "stream.kinesis.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kinesis.KinesisConsumerFactory", "realtime.segment.flush.threshold.rows": "1400000", "realtime.segment.flush.threshold.time": "1h", "realtime.segment.flush.threshold.size": "200M" }, "varLengthDictionaryColumns": [ "campaign", "color", "department" ], "enableDefaultStarTree": false, "enableDynamicStarTreeCreation": false, "aggregateMetrics": false, "nullHandlingEnabled": false, "optimizeDictionary": false, "optimizeDictionaryForMetrics": false, "noDictionarySizeRatioThreshold": 0 }, "metadata": { "customConfigs": {} }, "isDimTable": false } }
The table has four segments that are in Good state and are in S3. It just stops creating consuming segments and stops reading from Kinesis.
Please check the controller log for exceptions. The new consuming segment should be created by the controller. @KKcorps Can you help with this issue?
What I noticed is that when setting "realtime.segment.flush.threshold.rows": "1400000" as a string, it ignores the size and row thresholds when completing segments, and it creates a new one only after the threshold time elapses. By setting it as an integer instead of a string, it does create new segments. It doesn't reach the suggested size, as no segment is above 3 megabytes.
You may refer to this doc on how to configure them: https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion
Threshold time is honored when not enough rows are collected. The value should always be a string, and if you want to use the size threshold, "realtime.segment.flush.threshold.rows": "0" should be used.
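To make that concrete, a minimal sketch of the relevant streamConfigs entries for size-based flushing (threshold values taken from this thread; all three are strings, with rows set to "0" so the size threshold takes effect):

"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "1h",
"realtime.segment.flush.threshold.size": "200M"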
Hi! I am running tests in Pinot consuming from Kinesis Data Streams, on two r5.xlarge servers. I noticed that once it reaches 8 segments it won't create any new segments until it reaches the flush time. This is my current configuration: "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "1h", "realtime.segment.flush.threshold.size": "200M"
When I go into each segment I see the row threshold defined as "segment.flush.threshold.size": "150000", and the segments never reach the configured size (each segment is 1.7 MB in S3).
Hi @FranMorilloAWS, are you using on-demand or provisioned mode? Have you checked the Kinesis stream metadata? This might be because a stream shard got closed and a new shard got created.
Using on-demand. I believed that by using the low-level consumer, Pinot would handle the closing and creation of new shards, whether on-demand or provisioned when scaled. Is that not the case?
@FranMorilloAWS I wonder if setting the completion mode to DOWNLOAD is causing this issue. Any particular reason why you are using this config?
"completionConfig": {
  "completionMode": "DOWNLOAD"
}
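For context, a sketch of what the segmentsConfig from the table above would look like with the completionConfig block removed, so the default completion mode applies. This is the change being suggested here, not a config posted in the thread:

"segmentsConfig": {
  "replication": "2",
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "7",
  "replicasPerPartition": "2",
  "minimizeDataMovement": false,
  "timeColumnName": "creationTimestamp",
  "segmentPushType": "APPEND"
}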
Because at some point it was not downloading the segments to S3. I will remove the completion config and try again. Thanks, I'll update here.
Hi, I am now facing the same issue, but I am not even seeing the segments in S3.
Pinot realtime ingestion from a Kinesis data stream works as expected for some time but eventually stops consuming. While checking the server logs, a "connection refused" error is shown. Is this due to the issue mentioned in PR #9863?