
Automatically exported from code.google.com/p/asterixdb

Build sporadically takes a very long time to finish #617

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
I see from time to time that the build takes a very long time to finish. It usually takes around 6 minutes to build Asterix on my machine, but in those rare cases it takes 25 minutes to finish the build. It does not look like it is hanging in an AQL test case; rather, it is hanging while "shutting down the Mini HDFS Cluster". At first, it hangs for a really long time (around 9 minutes) while it is trying to shut down the mini HDFS cluster:

Aug 21, 2013 2:08:47 PM edu.uci.ics.hyracks.control.common.dataset.ResultStateSweeper run
SEVERE: Result cleaner thread interrupted, but we continue running it.
Aug 21, 2013 2:08:47 PM edu.uci.ics.hyracks.control.common.dataset.ResultStateSweeper run
SEVERE: Result cleaner thread interrupted, but we continue running it.
Aug 21, 2013 2:08:47 PM edu.uci.ics.hyracks.control.common.dataset.ResultStateSweeper run
SEVERE: Result cleaner thread interrupted, but we continue running it.
Shutting down the Mini HDFS Cluster
Shutting down DataNode 1

and then an exception is thrown:
Aug 21, 2013 2:17:04 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: 1 threads could not be stopped
Aug 21, 2013 2:17:04 PM org.apache.hadoop.hdfs.server.datanode.DataXceiverServer run
WARNING: DatanodeRegistration(127.0.0.1:51279, storageID=DS-1786775358-127.0.1.1-51279-1377119327516, infoPort=35480, ipcPort=58170):DataXceiveServer: java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
    at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
    at java.lang.Thread.run(Thread.java:722)
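
For reference, this AsynchronousCloseException is simply what a thread blocked in ServerSocketChannel.accept() gets when another thread closes the channel, which is what happens when the DataNode is told to shut down. Here is a small standalone snippet (plain Java, not AsterixDB or Hadoop code) that reproduces the same pattern:

import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousCloseException;
import java.nio.channels.ServerSocketChannel;

public class AcceptCloseDemo {
    public static void main(String[] args) throws Exception {
        final ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0)); // any free port

        Thread acceptor = new Thread(new Runnable() {
            public void run() {
                try {
                    server.accept(); // blocks, like DataXceiverServer's accept loop
                } catch (AsynchronousCloseException e) {
                    System.out.println("accept() aborted by close: " + e);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        acceptor.start();

        Thread.sleep(500); // give the acceptor time to block in accept()
        server.close();    // closed from another thread -> AsynchronousCloseException
        acceptor.join();
    }
}

So the exception itself looks like a normal side effect of closing the server socket during shutdown; the suspicious part is the ~9 minutes spent waiting for threads to stop before it appears.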

After that, it tries to shut down another DataNode and hangs for another 9 minutes while doing so:
Shutting down DataNode 0

An exception is then thrown:
Shutting down DataNode 0
Aug 21, 2013 2:25:19 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: 1 threads could not be stopped
Aug 21, 2013 2:25:19 PM org.apache.hadoop.hdfs.server.datanode.DataXceiverServer run
WARNING: DatanodeRegistration(127.0.0.1:39614, storageID=DS-1345030528-127.0.1.1-39614-1377119327355, infoPort=49120, ipcPort=42933):DataXceiveServer: java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
    at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
    at java.lang.Thread.run(Thread.java:722)

It then hangs one more time (about another 8 minutes, judging by the timestamps), and then it says:
Aug 21, 2013 2:33:34 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: 1 threads could not be stopped
Aug 21, 2013 2:33:34 PM org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor run
WARNING: ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
Tests run: 715, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,491.413 sec

I have attached three Java stack traces, one for each of the hanging periods mentioned above.
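
In case it helps anyone trying to reproduce this, the same kind of dump can also be captured programmatically from inside the test JVM with the standard ThreadMXBean API. This is just a rough sketch, independent of how the attached traces were taken:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    // Prints the name, state, and full stack of every live thread in this JVM.
    public static void dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            System.out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
            System.out.println();
        }
    }
}

Calling something like this from a watchdog thread whenever the shutdown exceeds a couple of minutes would show which thread is still blocking.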

I assigned this to Raman because the logs mention HDFS and I know you did 
something there recently, but please feel free to reassign to the right person.

Please use labels and text to provide additional information.
This issue appears in the master branch, but it is sporadic.

Original issue reported on code.google.com by salsuba...@gmail.com on 21 Aug 2013 at 9:42

Attachments:

GoogleCodeExporter commented 8 years ago
I am seeing this often on my machine and I am trying to investigate it.
Does anybody have any ideas or suggestions regarding it?

Original comment by diegogio...@gmail.com on 30 Oct 2013 at 2:36

GoogleCodeExporter commented 8 years ago
It may be due to this: https://issues.apache.org/jira/browse/HDFS-4816

Original comment by diegogio...@gmail.com on 31 Oct 2013 at 12:57
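
If HDFS-4816 (or something like it) really is the cause, one possible stopgap while it is unfixed would be to bound the mini cluster shutdown with a timeout so the Maven build does not sit there for ~9 minutes per DataNode. A sketch only: dfsCluster is a hypothetical handle standing in for however the test fixture keeps its MiniDFSCluster, and the timeout value is arbitrary:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.hdfs.MiniDFSCluster;

public class BoundedClusterShutdown {
    // Runs MiniDFSCluster.shutdown() on a helper thread and stops waiting for it
    // after the given number of seconds instead of blocking the whole build.
    public static void shutdownWithTimeout(final MiniDFSCluster dfsCluster, long timeoutSeconds)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> shutdown = executor.submit(new Runnable() {
            public void run() {
                dfsCluster.shutdown();
            }
        });
        try {
            shutdown.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.err.println("Mini HDFS cluster did not shut down within "
                    + timeoutSeconds + " seconds; not waiting any longer.");
        } catch (ExecutionException e) {
            e.getCause().printStackTrace();
        } finally {
            executor.shutdownNow();
        }
    }
}

That only hides the symptom, of course; whatever keeps the DataXceiverServer thread alive during shutdown would still need to be fixed upstream.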