Closed fjy closed 5 years ago
2015-01-29 14:41:40,334 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Uploading [/var/druid-data/supercell/2015-01- 29T13:00:00.000Z_2015-01-29T14:00:00.000Z/merged] to S3 2015-01-29 14:41:40,334 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Adding file[/var/druid-data/supercell/2015-01 -29T13:00:00.000Z_2015-01-29T14:00:00.000Z/merged/version.bin] with size[4]. Total size so far[0] 2015-01-29 14:41:40,335 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Adding file[/var/druid-data/supercell/2015-01 -29T13:00:00.000Z_2015-01-29T14:00:00.000Z/merged/00000.smoosh] with size[242,823,209]. Total size so far[4] 2015-01-29 14:41:42,950 [myid:] - INFO [HttpPostEmitter-1-0:ChannelResourceFactory@60] - Generating: http://logstash.vigiglo.be:8093 2015-01-29 14:41:47,963 [myid:] - INFO [HttpPostEmitter-1-0:ChannelResourceFactory@60] - Generating: http://logstash.vigiglo.be:8093 2015-01-29 14:41:52,976 [myid:] - INFO [HttpPostEmitter-1-0:ChannelResourceFactory@60] - Generating: http://logstash.vigiglo.be:8093 2015-01-29 14:41:54,860 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Adding file[/var/druid-data/supercell/2015-01 -29T13:00:00.000Z_2015-01-29T14:00:00.000Z/merged/meta.smoosh] with size[13,362]. Total size so far[242,823,213] 2015-01-29 14:41:55,140 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Pushing S3Object [key=supercell/2015-01-29T13 :00:00.000Z_2015-01-29T14:00:00.000Z/2015-01-29T13:00:00.000Z/4/index.zip, bucket=XXXXX, lastModified=null, dataInputStream=null, Metadata={x-amz -acl=bucket-owner-full-control, Content-Length=89411661, Content-MD5=Scs48IZ5ZkhE2khVFaBLFA==, md5-hash=49cb38f08679664844da485515a04b14, Content-Type =application/zip}]. 2015-01-29 14:41:57,989 [myid:] - INFO [HttpPostEmitter-1-0:ChannelResourceFactory@60] - Generating: http://logstash.vigiglo.be:8093 2015-01-29 14:41:58,605 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Pushing S3Object [key=supercell/2015-01-29T13 :00:00.000Z_2015-01-29T14:00:00.000Z/2015-01-29T13:00:00.000Z/4/descriptor.json, bucket=prod.druid, lastModified=null, dataInputStream=null, Metadata= {x-amz-acl=bucket-owner-full-control, Content-Length=7472, Content-MD5=XXXX, md5-hash=XXXX, Content-Ty pe=application/octet-stream}] 2015-01-29 14:41:58,665 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Deleting zipped index File[/tmp/druid62342780 90180139006index.zip] 2015-01-29 14:41:58,684 [myid:] - INFO [supercell-2015-01-29T13:00:00.000Z-persist-n-merge:Logger@70] - Deleting descriptor file[/tmp/druid1834983339 966338370descriptor.json]
0 /var/druid-data/supercell/2015-01-28T19:00:00.000Z_2015-01-28T20:00:00.000Z 0 /var/druid-data/supercell/2015-01-29T08:00:00.000Z_2015-01-29T09:00:00.000Z 483M /var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z 497M /var/druid-data/supercell/2015-01-29T15:00:00.000Z_2015-01-29T16:00:00.000Z 2,3G /var/druid-data/supercell/2015-01-29T17:00:00.000Z_2015-01-29T18:00:00.000Z 876M /var/druid-data/supercell/2015-01-29T18:00:00.000Z_2015-01-29T19:00:00.000Z [...]
I removed the empty / corrupted segment directories and relaunched the realtime node:
2015-02-05 12:22:53,056 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/0] 2015-02-05 12:22:53,285 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/1] 2015-02-05 12:22:53,507 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/2] 2015-02-05 12:22:53,746 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/3] 2015-02-05 12:22:53,786 [myid:] - INFO [ServerInventoryView-0:Logger@70] - CHILD_ADDED[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] with version[0] 2015-02-05 12:22:54,006 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/4] 2015-02-05 12:22:54,059 [myid:] - INFO [ServerInventoryView-0:Logger@70] - CHILD_ADDED[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_3] with version[0] 2015-02-05 12:22:54,136 [myid:] - INFO [ServerInventoryView-0:Logger@70] - CHILD_ADDED[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_3] with version[0] 2015-02-05 12:22:54,137 [myid:] - INFO [ServerInventoryView-0:Logger@70] - CHILD_ADDED[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] with version[0] 2015-02-05 12:22:54,190 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/5] 2015-02-05 12:22:54,366 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/6] 2015-02-05 12:22:54,483 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/7] 2015-02-05 12:22:54,643 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/8] 2015-02-05 12:22:54,828 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/9] 2015-02-05 12:22:55,009 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/10] 2015-02-05 12:22:55,193 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/11] 2015-02-05 12:22:55,402 [myid:] - INFO [chief-supercell:Logger@70] - Loading previously persisted segment at [/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/12] 2015-02-05 12:22:55,660 [myid:] - INFO [chief-supercell:Logger@70] - Announcing segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] to path[/druid/servedSegments/druid4.vigi:8083/supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] 2015-02-05 12:22:55,674 [myid:] - INFO [chief-supercell:Logger@70] - Announcing segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] at path[/druid/segments/druid4.vigi:8083/2015-02-05T12:22:55.673Z0] 2015-02-05 12:22:55,678 [myid:] - INFO [ServerInventoryView-0:Logger@70] - CHILD_ADDED[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] with version[0]
No cleanup of directory...
$ ./s3cmd ls s3://prod.druid/supercell/ | grep 2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z DIR s3://prod.druid/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/ $ ./s3cmd du s3://prod.druid/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/ 197519582 s3://prod.druid/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/
ansible -i ./prod.hosts -m shell historical -a "find /var/druid-historical/ -name '2015-01-29T13*'" druid5 | success | rc=0 >> /var/druid-historical/indexCache/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z /var/druid-historical/indexCache/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/2015-01-29T13:00:00.000Z
druid6 | success | rc=0 >>
druid8 | success | rc=0 >> /var/druid-historical/indexCache/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z /var/druid-historical/indexCache/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/2015-01-29T13:00:00.000Z
druid7 | success | rc=0 >> /var/druid-historical/indexCache/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z /var/druid-historical/indexCache/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/2015-01-29T13:00:00.000Z
@T0aD
Can you look in the coordinator logs to see if a segment that was not removed was announced there?
I suspect it is given it shows up in our coordinator console, which leads me to believe there is something wrong with your realtime setup.
On cortex1: (master coordinator)
root@cortex1:/proc/31578/cwd# grep 2015-01-29T13:00:00.000Z_2015-01-29T14 ./stdout | awk -F' - ' '{print $3}' | grep ^Server Server[druid8.vigi:8081] added segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_3] Server[druid8.vigi:8081] added segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] Server[druid4.vigi:8083] added segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_4] Server[druid5.vigi:8081] added segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00.000Z_3] Server[druid5.vigi:8081] added segment[supercell_2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z_2015-01-29T13:00:00. 000Z_4]
On druid4 (realtime node):
/var/druid-data/supercell/2015-01-29T13:00:00.000Z_2015-01-29T14:00:00.000Z/isPushedMarker
So it seems it was pushed but not cleaned up, and already stored in 2 historical nodes, I probably should just remove those segments .. ?
Since I don't have much time to investigate on the issue, I wrote a little helper script to move already pushed segments, please tell me if it sounds ok to you, thanks.
Closing as the realtime node was removed
T0aD in IRC has reported a strange issue with segments not being handed off. I believe the following may have occurred:
1) A segment was successfully handed off but something went wrong during the cleanup process (we don't have the logs, but it appears the directory was not cleaned up) 2) The realtime node was restarted. The node found the directory of the segment exists, and announced it was serving that segment. Subsequently, the segment was not removed.