I had another look into the HDFS file system on the server and I am pretty sure that the missing data is actually in HDFS. With hdfs dfs -ls I found .avro files in two partitions that were written during that night. I picked one of the files and looked into it (hdfs dfs -cat), and I can see the patient ID from the specific recording where data is missing (the actual data is binary, obviously).
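Roughly the commands I used; the file name below is just a placeholder, the real ones follow the <topic>+<partition>+<startOffset>+<endOffset>.avro pattern (piping through strings only to spot the IDs in the binary):
hdfs dfs -ls /topicAndroidNew/android_empatica_e4_acceleration/partition=2
hdfs dfs -cat /topicAndroidNew/android_empatica_e4_acceleration/partition=2/android_empatica_e4_acceleration+2+0000000000+0000000149.avro | strings | head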
Is there a way to restart the extraction from scratch for just one topic/ID? Maybe by manipulating bins.csv and/or offsets.csv? (I suppose just deleting the whole topics folder would work too, but then everything would have to be extracted again.)
Also, I did not really find any suspicious log entries from that time in the docker-compose logs for hdfs-datanode-1, hdfs-datanode-2, hdfs-namenode, or radar-hdfs-connector.
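For completeness, I searched the service logs with something like:
docker-compose logs -t hdfs-namenode | grep -iE 'error|exception'
docker-compose logs -t hdfs-datanode-1 hdfs-datanode-2 radar-hdfs-connector | grep -iE 'error|exception'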
To restart the extraction for a portion of the data, you can indeed manipulate offsets.csv. Just remove all lines of the acceleration topic (or just the offsets that gave an exception) and run the script again.
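An untested sketch of that edit, assuming the topic name appears verbatim on each of its offsets.csv lines:
cp offsets.csv offsets.csv.bak
grep -v 'android_empatica_e4_acceleration' offsets.csv.bak > offsets.csv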
Could you try to convert one of the files that gave a BlockMissingException to json with AvroTools?
hadoop fs -get /topicAndroidNew/android_empatica_e4_blood_volume_pulse/partition=2/android_empatica_e4_blood_volume_pulse+2+0005159476+0005159625.avro
java -jar /usr/share/avro-tools/avro-tools-1.7.7.jar tojson android_empatica_e4_blood_volume_pulse+2+0005159476+0005159625.avro
tojson works without an error message on that file.
Will now try to just rerun the extraction on the topic.
Edit: I think this is a separate issue, submitted it here: https://github.com/RADAR-CNS/RADAR-HDFS-Sink-Connector/issues/4
Hm, so I saw that radar-hdfs-connector had now stopped with an exception:
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133005441Z [2017-08-05 12:54:01,132] ERROR Task radar-hdfs-sink-android-15000-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133085075Z java.lang.NullPointerException
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133092941Z at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133100828Z at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133108550Z at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:302)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133115345Z at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:435)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133121594Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:147)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133129616Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133137065Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133143866Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133155380Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133161875Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133169393Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133177736Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-05T12:54:01.133184102Z [2017-08-05 12:54:01,132] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
So I restarted the container and got a bunch of exceptions, ending again with that one:
radar-hdfs-connector_1 | 2017-08-08T10:58:17.658855180Z Exception in thread "kafka-coordinator-heartbeat-thread | connect-radar-hdfs-sink-android-15000" Exception in thread "LeaseRenewer:root@hdfs-namenode:8020" Exception in thread "kafka-coordinator-heartbeat-thread | connect-radar-hdfs-sink-android-15000" java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:58:17.660211421Z at java.util.HashMap$EntrySet.iterator(HashMap.java:1013)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.660228131Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:386)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.660237971Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:223)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.660330193Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.660342490Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:865)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.661818388Z java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:58:17.661917082Z [2017-08-08 10:58:17,659] ERROR Task radar-hdfs-sink-android-15000-3 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.661930189Z java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662053284Z at java.util.ArrayList.iterator(ArrayList.java:834)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662067477Z at org.apache.hadoop.hdfs.LeaseRenewer.clientsRunning(LeaseRenewer.java:241)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662076405Z at org.apache.hadoop.hdfs.LeaseRenewer.access$500(LeaseRenewer.java:71)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662530537Z [2017-08-08 10:58:17,660] ERROR Task radar-hdfs-sink-android-15000-2 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662543689Z java.lang.NullPointerException
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662551642Z at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662558950Z at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662566144Z at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:302)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662573083Z at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:435)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662580007Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:147)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662587186Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662593930Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662600773Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662607643Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662614597Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662622157Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.662640133Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663100448Z [2017-08-08 10:58:17,660] WARN Unexpected error reading responses on connection Thread[IPC Client (248329120) connection to hdfs-namenode/172.18.0.3:8020 from root,5,main] (org.apache.hadoop.ipc.Client)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663114053Z java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663122000Z at java.lang.Integer.valueOf(Integer.java:832)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663129185Z at sun.nio.ch.EPollSelectorImpl.updateSelectedKeys(EPollSelectorImpl.java:120)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663136544Z at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:98)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663143873Z at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663151310Z at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663158981Z at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663166127Z at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663173755Z at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663181039Z at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663188887Z at java.io.FilterInputStream.read(FilterInputStream.java:133)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663196501Z at java.io.FilterInputStream.read(FilterInputStream.java:133)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663203539Z at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663210703Z at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663217825Z at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663224860Z at java.io.DataInputStream.readInt(DataInputStream.java:387)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663232252Z at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663239095Z at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663245740Z [2017-08-08 10:58:17,662] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.663252919Z [2017-08-08 10:58:17,661] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.664754302Z at org.apache.hadoop.hdfs.LeaseRenewer$Factory.remove(LeaseRenewer.java:154)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.664857528Z at org.apache.hadoop.hdfs.LeaseRenewer$Factory.access$800(LeaseRenewer.java:90)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.664878000Z at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:312)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.664966845Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.665451043Z java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674685502Z [2017-08-08 10:58:17,672] ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674757976Z org.apache.kafka.connect.errors.ConnectException: java.io.IOException: Failed on local exception: java.io.IOException: Error reading responses; Host Details : local host is: "6d3d8843d9b9/172.18.0.5"; destination host is: "hdfs-namenode":8020;
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674775466Z at io.confluent.connect.hdfs.wal.FSWAL.apply(FSWAL.java:131)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674786782Z at io.confluent.connect.hdfs.TopicPartitionWriter.applyWAL(TopicPartitionWriter.java:484)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674797842Z at io.confluent.connect.hdfs.TopicPartitionWriter.recover(TopicPartitionWriter.java:212)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674809804Z at io.confluent.connect.hdfs.DataWriter.recover(DataWriter.java:239)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674818974Z at io.confluent.connect.hdfs.DataWriter.open(DataWriter.java:281)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674828847Z at io.confluent.connect.hdfs.HdfsSinkTask.open(HdfsSinkTask.java:116)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674838684Z at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:431)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674848146Z at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:55)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674904030Z at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:467)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674915370Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674925666Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674935125Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674944293Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674953437Z at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674962779Z at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674972112Z at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:317)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.674987616Z at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:235)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675016059Z at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:172)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675031307Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675040607Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675049323Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675057743Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675082478Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675091712Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675100571Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675115445Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675481141Z Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Error reading responses; Host Details : local host is: "6d3d8843d9b9/172.18.0.5"; destination host is: "hdfs-namenode":8020;
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675522787Z at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675531882Z at org.apache.hadoop.ipc.Client.call(Client.java:1473)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675540120Z at org.apache.hadoop.ipc.Client.call(Client.java:1400)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675548463Z at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675557739Z at com.sun.proxy.$Proxy47.getFileInfo(Unknown Source)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675566079Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675574527Z at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675582693Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675590784Z at java.lang.reflect.Method.invoke(Method.java:498)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675598809Z at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675607116Z at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675616695Z at com.sun.proxy.$Proxy48.getFileInfo(Unknown Source)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675624462Z at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675632361Z at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675640551Z at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675648690Z at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675669801Z at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675678825Z at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675687171Z at io.confluent.connect.hdfs.storage.HdfsStorage.exists(HdfsStorage.java:66)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675695703Z at io.confluent.connect.hdfs.wal.FSWAL.apply(FSWAL.java:120)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675703914Z ... 25 more
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675733556Z Caused by: java.io.IOException: Error reading responses
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675742644Z at org.apache.hadoop.ipc.Client$Connection.run(Client.java:974)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675750558Z Caused by: java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675758148Z at java.lang.Integer.valueOf(Integer.java:832)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675766979Z at sun.nio.ch.EPollSelectorImpl.updateSelectedKeys(EPollSelectorImpl.java:120)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675826511Z at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:98)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675846692Z at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675854309Z at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675861362Z at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675868688Z at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675875810Z at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675882983Z at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675890033Z at java.io.FilterInputStream.read(FilterInputStream.java:133)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675896759Z at java.io.FilterInputStream.read(FilterInputStream.java:133)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675912188Z at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675919586Z at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675926531Z at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675933227Z at java.io.DataInputStream.readInt(DataInputStream.java:387)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675940313Z at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675947123Z at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.675953636Z [2017-08-08 10:58:17,674] INFO Started recovery for topic partition application_uptime-0 (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:17.759939024Z [2017-08-08 10:58:17,759] INFO Discovered coordinator kafka-2:9092 (id: 2147483645 rack: null) for group connect-radar-hdfs-sink-android-15000. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
radar-hdfs-connector_1 | 2017-08-08T10:58:18.790960667Z [2017-08-08 10:58:18,790] INFO Successfully acquired lease for hdfs://hdfs-namenode:8020/logs/application_uptime/0/log (io.confluent.connect.hdfs.wal.FSWAL)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.222620804Z [2017-08-08 10:58:21,222] WARN DataStreamer Exception (org.apache.hadoop.hdfs.DFSClient)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.222711059Z java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.18.0.4:50010, 172.18.0.2:50010], original=[172.18.0.4:50010, 172.18.0.2:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
radar-hdfs-connector_1 | 2017-08-08T10:58:21.222761069Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.222770081Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.222778601Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.222786062Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.223887738Z [2017-08-08 10:58:21,222] ERROR Recovery failed at state WAL_APPLIED (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224044328Z org.apache.kafka.connect.errors.ConnectException: Error closing hdfs://hdfs-namenode:8020/logs/application_uptime/0/log
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224213471Z at io.confluent.connect.hdfs.wal.FSWAL.close(FSWAL.java:161)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224240225Z at io.confluent.connect.hdfs.wal.FSWAL.truncate(FSWAL.java:143)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224315470Z at io.confluent.connect.hdfs.TopicPartitionWriter.truncateWAL(TopicPartitionWriter.java:490)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224339287Z at io.confluent.connect.hdfs.TopicPartitionWriter.recover(TopicPartitionWriter.java:215)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224405665Z at io.confluent.connect.hdfs.DataWriter.recover(DataWriter.java:239)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224429416Z at io.confluent.connect.hdfs.DataWriter.open(DataWriter.java:281)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224496283Z at io.confluent.connect.hdfs.HdfsSinkTask.open(HdfsSinkTask.java:116)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224517864Z at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:431)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224668545Z at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:55)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224702481Z at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:467)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224825771Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224863913Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.224968463Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225007628Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225115954Z at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225144781Z at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225238780Z at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:317)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225334044Z at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:235)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225385164Z at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:172)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225481965Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225539996Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225635045Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225662942Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225752911Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225781015Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225889033Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.225918844Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.226067886Z Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.18.0.4:50010, 172.18.0.2:50010], original=[172.18.0.4:50010, 172.18.0.2:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
radar-hdfs-connector_1 | 2017-08-08T10:58:21.226090089Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.226098765Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.226105605Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.226112853Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.226119549Z [2017-08-08 10:58:21,223] INFO Started recovery for topic partition android_biovotion_battery_state-0 (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.227944280Z [2017-08-08 10:58:21,227] INFO Finished recovery for topic partition android_biovotion_battery_state-0 (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.227960824Z [2017-08-08 10:58:21,227] INFO Started recovery for topic partition android_biovotion_spo2-0 (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.242479443Z [2017-08-08 10:58:21,242] INFO Finished recovery for topic partition android_biovotion_spo2-0 (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:21.242519511Z [2017-08-08 10:58:21,242] INFO Started recovery for topic partition android_empatica_e4_blood_volume_pulse-0 (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594051287Z [2017-08-08 10:58:43,593] ERROR Task radar-hdfs-sink-android-15000-1 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594178139Z java.lang.NullPointerException
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594321199Z at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594369256Z at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594435121Z at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:302)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594458350Z at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:435)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594512221Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:147)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594533581Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594585291Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594606048Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594667725Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594689178Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594744402Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594765654Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:58:43.594815858Z [2017-08-08 10:58:43,593] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.680815652Z Exception in thread "LeaseRenewer:root@hdfs-namenode:8020" java.lang.OutOfMemoryError: Java heap space
radar-hdfs-connector_1 | 2017-08-08T10:59:09.684064012Z [2017-08-08 10:59:09,683] WARN DataStreamer Exception (org.apache.hadoop.hdfs.DFSClient)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.684121728Z java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.18.0.2:50010, 172.18.0.4:50010], original=[172.18.0.2:50010, 172.18.0.4:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
radar-hdfs-connector_1 | 2017-08-08T10:59:09.684134622Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.684142125Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.684149445Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.684156588Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685040766Z [2017-08-08 10:59:09,684] ERROR Error closing hdfs://hdfs-namenode:8020/logs/android_empatica_e4_acceleration/0/log. (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685060542Z org.apache.kafka.connect.errors.ConnectException: Error closing hdfs://hdfs-namenode:8020/logs/android_empatica_e4_acceleration/0/log
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685069086Z at io.confluent.connect.hdfs.wal.FSWAL.close(FSWAL.java:161)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685096253Z at io.confluent.connect.hdfs.TopicPartitionWriter.close(TopicPartitionWriter.java:372)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685104128Z at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685114217Z at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685121639Z at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:302)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685128947Z at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:435)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685135976Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:147)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685143131Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685150382Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685156655Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685163099Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685169246Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685175843Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685182428Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685188941Z Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.18.0.2:50010, 172.18.0.4:50010], original=[172.18.0.2:50010, 172.18.0.4:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685197143Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685203794Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685211285Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685218128Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685886612Z [2017-08-08 10:59:09,685] ERROR Error closing writer for android_empatica_e4_acceleration-0. Error: Error closing writer: Error closing hdfs://hdfs-namenode:8020/logs/android_empatica_e4_acceleration/0/log
radar-hdfs-connector_1 | 2017-08-08T10:59:09.685900880Z (io.confluent.connect.hdfs.DataWriter)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686436108Z [2017-08-08 10:59:09,685] ERROR Error closing hdfs://hdfs-namenode:8020/logs/application_uptime/0/log. (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686449148Z org.apache.kafka.connect.errors.ConnectException: Error closing hdfs://hdfs-namenode:8020/logs/application_uptime/0/log
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686457044Z at io.confluent.connect.hdfs.wal.FSWAL.close(FSWAL.java:161)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686470374Z at io.confluent.connect.hdfs.TopicPartitionWriter.close(TopicPartitionWriter.java:372)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686477672Z at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686484204Z at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686490698Z at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:302)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686497253Z at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:435)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686503670Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:147)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686510252Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686516660Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686523122Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686529473Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686535836Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686542115Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686548634Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686554892Z Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.18.0.4:50010, 172.18.0.2:50010], original=[172.18.0.4:50010, 172.18.0.2:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686563037Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686569484Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686577152Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686584046Z at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686590655Z [2017-08-08 10:59:09,686] ERROR Error closing writer for application_uptime-0. Error: Error closing writer: Error closing hdfs://hdfs-namenode:8020/logs/application_uptime/0/log
radar-hdfs-connector_1 | 2017-08-08T10:59:09.686597558Z (io.confluent.connect.hdfs.DataWriter)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687585226Z [2017-08-08 10:59:09,687] ERROR Task radar-hdfs-sink-android-15000-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687599056Z java.lang.NullPointerException
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687606712Z at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687613465Z at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687626278Z at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:302)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687633619Z at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:435)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687640233Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:147)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687646783Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687653293Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687659713Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687666231Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687672577Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687679296Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687685641Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-08T10:59:09.687691950Z [2017-08-08 10:59:09,687] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
radar-hdfs-connector_1 | 2017-08-08T10:59:10.190611936Z [2017-08-08 10:59:10,190] INFO Reflections took 103976 ms to scan 265 urls, producing 12687 keys and 83909 values (org.reflections.Reflections)
Looks to me like a memory issue? Could this be related, or is this a different issue?
OK, I am now thoroughly confused. I had another, closer look at the files in HDFS, specifically the newest ones for the acceleration topic (there are now new ones since I restarted the container), and converted them to JSON as above. I noticed that the timestamps are in the past, from later that night, after the supposedly missing data starts.
Below are a newly created .avro file from today, the last JSON record in that file, and the timestamp of that record:
hdfs: -rw-r--r-- 3 root supergroup 9500 2017-08-08 16:06 /topicAndroidNew/android_empatica_e4_acceleration/partition=2/android_empatica_e4_acceleration+2+0021130772+0021130921.avro
json: {"key":{"userId":"UKLFRB2","sourceId":"00:07:80:F2:1A:43"},"value":{"time":1.5017942087017004E9,"timeReceived":1.501794206716E9,"x":-0.8125,"y":-0.59375,"z":-0.03125}}
timeReceived: 08/03/2017 @ 9:03pm (UTC)
Same thing for one of the files I mentioned in an earlier comment (https://github.com/RADAR-CNS/Restructure-HDFS-topic/issues/6#issuecomment-320730406):
hdfs: -rw-r--r-- 3 root supergroup 9500 2017-08-04 01:25 /topicAndroidNew/android_empatica_e4_acceleration/partition=2/android_empatica_e4_acceleration+2+0014425524+0014425673.avro
json: {"key":{"userId":"UKLFRB2","sourceId":"00:07:80:F2:1A:43"},"value":{"time":1.501706715380344E9,"timeReceived":1.501706712389E9,"x":-0.015625,"y":-0.328125,"z":0.921875}}
timeReceived: 08/02/2017 @ 8:45pm (UTC)
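For anyone double-checking the conversion, the epoch seconds from the timeReceived fields can be turned into UTC dates with GNU date:
date -u -d @1501794206   # Thu Aug  3 21:03:26 UTC 2017
date -u -d @1501706712   # Wed Aug  2 20:45:12 UTC 2017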
So to me it currently looks like the connector is slowly working through the data now, almost as if the data were being collected right now. Is it normal for the connector to take this long to go through Kafka and store the data? I have now disconnected all Empaticas again and the connector is still committing data; I will see tomorrow if it finished. My working theory is that somewhere along the line there was a problem with one of the partitions, which was now "fixed" by restarting the connector, and it is now trying to catch up...
Yes, Kafka will leave all data in its log for a while, to allow consumers to catch up if they get behind. If they fall too far behind, though, Kafka will remove the data altogether. The default time for this is 168 hours = 1 week (see the Apache Kafka configuration docs, log.retention.<x>). If a connector falls structurally behind (for more than a week), you may lose data.
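As a hedged sketch, retention can also be raised per topic with the stock Kafka CLI (the ZooKeeper host here is an assumption, adjust to your deployment; 1209600000 ms = 14 days):
kafka-configs.sh --zookeeper zookeeper-1:2181 --alter --entity-type topics --entity-name android_empatica_e4_acceleration --add-config retention.ms=1209600000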
Since the retention is set to 1 week, I need to get this data very soon then, because the missing section of data is from almost a week ago. Or is there a way to set the retention time afterwards?
So this morning a bit more data has been written, but still not everything that is missing. Instead, the connector now gives exceptions like this, even after restarting the container:
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103581674Z [2017-08-09 08:05:06,102] ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103679462Z org.apache.kafka.connect.errors.ConnectException: Cannot acquire lease after timeout, will retry.
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103697513Z at io.confluent.connect.hdfs.wal.FSWAL.acquireLease(FSWAL.java:95)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103707956Z at io.confluent.connect.hdfs.wal.FSWAL.apply(FSWAL.java:105)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103716755Z at io.confluent.connect.hdfs.TopicPartitionWriter.applyWAL(TopicPartitionWriter.java:484)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103725341Z at io.confluent.connect.hdfs.TopicPartitionWriter.recover(TopicPartitionWriter.java:212)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103734395Z at io.confluent.connect.hdfs.DataWriter.recover(DataWriter.java:239)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103743716Z at io.confluent.connect.hdfs.DataWriter.open(DataWriter.java:281)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103753421Z at io.confluent.connect.hdfs.HdfsSinkTask.open(HdfsSinkTask.java:116)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103762031Z at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:431)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103770955Z at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:55)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103779925Z at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:467)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103789981Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.103799897Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104201839Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104237809Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104316058Z at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104330866Z at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104341432Z at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:317)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104350723Z at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:235)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104359762Z at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:172)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104369218Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104381668Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104391196Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104400040Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104409157Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104417605Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104434756Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-09T08:05:06.104444413Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180107105Z [2017-08-09 08:05:39,179] ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED (io.confluent.connect.hdfs.TopicPartitionWriter)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180164855Z org.apache.kafka.connect.errors.ConnectException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.RecoveryInProgressException): Failed to APPEND_FILE /logs/application_record_counts/1/log for DFSClient_NONMAPREDUCE_-1505132056_27 on 172.18.0.5 because lease recovery is in progress. Try again later.
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180202772Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2918)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180213965Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2683)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180223592Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2982)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180253706Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2950)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180263546Z at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:654)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180272330Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180281153Z at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180299276Z at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180309263Z at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180328643Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180338174Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180346589Z at java.security.AccessController.doPrivileged(Native Method)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180355016Z at javax.security.auth.Subject.doAs(Subject.java:422)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180363329Z at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180371870Z at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180380057Z
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180387965Z at io.confluent.connect.hdfs.wal.FSWAL.acquireLease(FSWAL.java:88)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180420971Z at io.confluent.connect.hdfs.wal.FSWAL.apply(FSWAL.java:105)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180430898Z at io.confluent.connect.hdfs.TopicPartitionWriter.applyWAL(TopicPartitionWriter.java:484)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180443378Z at io.confluent.connect.hdfs.TopicPartitionWriter.recover(TopicPartitionWriter.java:212)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180453374Z at io.confluent.connect.hdfs.DataWriter.recover(DataWriter.java:239)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180462566Z at io.confluent.connect.hdfs.DataWriter.open(DataWriter.java:281)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180482966Z at io.confluent.connect.hdfs.HdfsSinkTask.open(HdfsSinkTask.java:116)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180492953Z at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:431)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180501740Z at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:55)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180510326Z at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:467)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180526073Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180555500Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180566039Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180574784Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180583469Z at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180592160Z at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180600623Z at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:317)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180609158Z at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:235)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180617246Z at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:172)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180635710Z at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180644812Z at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180653678Z at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180661850Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180670274Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180678266Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180689594Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180707444Z at java.lang.Thread.run(Thread.java:745)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180718316Z Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.RecoveryInProgressException): Failed to APPEND_FILE /logs/application_record_counts/1/log for DFSClient_NONMAPREDUCE_-1505132056_27 on 172.18.0.5 because lease recovery is in progress. Try again later.
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180728378Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2918)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180736994Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2683)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180745568Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2982)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180754778Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2950)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180763072Z at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:654)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180771494Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180796772Z at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180806612Z at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180815206Z at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180823400Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180831590Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180839768Z at java.security.AccessController.doPrivileged(Native Method)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180847622Z at javax.security.auth.Subject.doAs(Subject.java:422)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180877268Z at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180886388Z at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180894530Z
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180902300Z at org.apache.hadoop.ipc.Client.call(Client.java:1469)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180910624Z at org.apache.hadoop.ipc.Client.call(Client.java:1400)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180918632Z at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180937099Z at com.sun.proxy.$Proxy47.append(Unknown Source)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180946318Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180954961Z at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180964615Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180973801Z at java.lang.reflect.Method.invoke(Method.java:498)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180982270Z at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180991015Z at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.180999277Z at com.sun.proxy.$Proxy48.append(Unknown Source)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181018375Z at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1756)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181026889Z at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1792)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181034905Z at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1785)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181042849Z at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:323)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181051198Z at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:319)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181059184Z at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181067251Z at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:319)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181097242Z at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1173)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181107244Z at io.confluent.connect.hdfs.wal.WALFile$Writer.<init>(WALFile.java:221)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181115797Z at io.confluent.connect.hdfs.wal.WALFile.createWriter(WALFile.java:67)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181124107Z at io.confluent.connect.hdfs.wal.FSWAL.acquireLease(FSWAL.java:73)
radar-hdfs-connector_1 | 2017-08-09T08:05:39.181132291Z ... 26 more
It looks like this is the same issue that caused errors before: empty files. See confluentinc/kafka-connect-hdfs#53. To restart, they removed the empty files that were giving the errors.
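A minimal sketch of how such empty files could be found and removed, assuming the /logs layout from the exception above (the file size is the fifth column of hadoop fs -ls output):

# list zero-length files under /logs
hadoop fs -ls -R /logs | awk '$1 !~ /^d/ && $5 == 0 {print $8}'
# then remove an offending WAL log, e.g. the one from the exception:
hadoop fs -rm /logs/application_record_counts/1/log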
ah yes I remember...
I didn't do anything except hadoop fs -ls /logs/*/*
, which showed that no log files existed at all for the relevant topics (not even empty ones), and now it is committing again... maybe the container restart helped anyway.
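As an aside, had the lease stayed stuck, it should also be possible to trigger lease recovery on the WAL file manually (a sketch, assuming Hadoop 2.7+ where the hdfs debug subcommand is available):

hdfs debug recoverLease -path /logs/application_record_counts/1/log -retries 3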
So, being optimistic and assuming that the missing data is now being fully recovered, what are possible ways to make sure the connector does not lag behind like this? It is really quite slow at the moment, processing maybe 1-2 HDFS files per second, while not being load-intensive at all.
root@bd4fef958418:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 4.7 27.8 14484272 6877400 ? Ssl 08:02 16:02 java -Xms256m -Xmx8g -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/etc/kafka/connect-log4j.properties -cp /etc/kafka-connect/jars/*:/usr/share/java/confluent-common/*:/usr/share/java/kafka-serde-tools/*:/usr/share/java/monitoring-interceptors/*:/usr/share/java/kafka-connect-elasticsearch/*:/usr/share/java/kafka-connect-hdfs/*:/usr/share/java/kafka-connect-jdbc/*:/usr/bin/../share/java/kafka/*:/usr/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.kafka.connect.cli.ConnectStandalone /etc/kafka-connect/kafka-connect.properties /etc/kafka-connect/sink-hdfs.properties
root 19060 0.0 0.0 21964 3604 pts/0 Ss 13:39 0:00 /bin/bash
root 19357 0.0 0.0 19184 2308 pts/0 R+ 13:41 0:00 ps aux
And being more than one day behind after only two days of recording from 3+ sources will not be feasible in the long run, IMO. Could the bottleneck be somewhere else? Although I can't see any other containers under heavy load in cAdvisor.
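One way to quantify how far behind the connector is would be to check the consumer lag of its Connect group (a sketch; the bootstrap server address is an assumption, and the group name is assumed to follow Connect's connect-<connector-name> convention, based on the task name in the logs above):

kafka-consumer-groups --bootstrap-server kafka-1:9092 --describe --group connect-radar-hdfs-sink-android-15000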
Solved by increasing the Kafka HDFS connector flush size.
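For reference, this corresponds to the flush.size setting in the connector's sink configuration (the /etc/kafka-connect/sink-hdfs.properties seen in the ps output above); a minimal sketch with illustrative values:

# commit a file to HDFS only after this many records per topic-partition
flush.size=80000
# optionally also rotate on time, so low-volume topics still get committed
rotate.interval.ms=900000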
I am having a weird case of data loss at the moment concerning the extraction tool. Using one patient as an example, here is what I am seeing: using the tool, I get csv files for the topic android_empatica_e4_acceleration (showing just the last 3):
The problem is that there is missing data from the following night. The recording lasts until the morning after, which is also reflected in other Empatica topics, e.g. android_empatica_e4_electrodermal_activity:
This is currently the case for the acc and BVP topics.
I know that the data missing here at least went through Kafka, since the aggregated data for that timeframe is in the hot storage:
[Plot: data from the REST API call get_samples_json for that particular recording. Note the date/time x-axis is CEST, not UTC.]
Something I noticed in my logfiles from the tool: I have several BlockMissingExceptions from those two topics, although I have no idea if that could be the source of the problem. Maybe a particular partition that contains the data is not being extracted?
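In case it helps with the BlockMissingExceptions, HDFS itself can report missing or corrupt blocks (a sketch, assuming shell access to the namenode container):

# block report for one of the affected topics
hdfs fsck /topicAndroidNew/android_empatica_e4_acceleration -files -blocks -locations
# or list corrupt file blocks across the whole filesystem
hdfs fsck / -list-corruptfileblocks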
I am using @fnobilia's script from RADAR-Docker, and have DEBUG enabled in log4j.properties.