CandyShop / gerrit

Automatically exported from code.google.com/p/gerrit
Apache License 2.0

remote.name.timeout causes NullPointerException, extra TransportException: Read timed out #232

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Reported by Shawn Pearce <sop@google.com> on Thu Jun 25 07:25:49 PDT 2009
Source: JIRA GERRIT-233
Affected Version: 2.0.15
Environment: JSch 0.1.41

JSch is crashing with an NPE during connection setup if there is a timeout
configured.  It's random, which suggests a thread race condition, as sometimes
the code succeeds.  Unfortunately the root cause below is incomplete, as JSch
catches RuntimeException and rethrows it, discarding the original stack
trace.  So it's harder to know where the failure is.

2009-06-24 17:15:16,999::ERROR: com.google.gerrit.git.PushQueue  - Cannot replicate to ssh://android-replication@remote:29418/platform/manifest.git
org.spearce.jgit.errors.TransportException: ssh://android-replication@remote:29418/platform/manifest.git: java.lang.NullPointerException
        at org.spearce.jgit.transport.TransportGitSsh.exec(TransportGitSsh.java:150)
        at org.spearce.jgit.transport.TransportGitSsh$SshPushConnection.<init>(TransportGitSsh.java:348)
        at org.spearce.jgit.transport.TransportGitSsh.openPush(TransportGitSsh.java:97)
        at org.spearce.jgit.transport.PushProcess.execute(PushProcess.java:119)
        at org.spearce.jgit.transport.Transport.push(Transport.java:734)
        at com.google.gerrit.git.PushOp.pushVia(PushOp.java:192)
        at com.google.gerrit.git.PushOp.runImpl(PushOp.java:145)
        at com.google.gerrit.git.PushOp.run(PushOp.java:94)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at com.google.gerrit.git.WorkQueue$Task.run(WorkQueue.java:231)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: com.jcraft.jsch.JSchException: java.lang.NullPointerException
        at com.jcraft.jsch.Channel.connect(Channel.java:206)
        at org.spearce.jgit.transport.TransportGitSsh.exec(TransportGitSsh.java:147)
        ... 16 more

Original issue reported on code.google.com by code-rev...@gtempaccount.com on 24 Sep 2009 at 7:43

GoogleCodeExporter commented 9 years ago
Comment by Shawn Pearce <sop@google.com> on Thu Jun 25 08:20:53 PDT 2009

Patching JSch 0.1.41 and creating a custom build gave me a more detailed stack
trace:

Caused by: com.jcraft.jsch.JSchException: connection failure
    at com.jcraft.jsch.Channel.connect(Channel.java:206)
    at org.spearce.jgit.transport.TransportGitSsh.exec(TransportGitSsh.java:147)
    ... 16 more
Caused by: java.lang.NullPointerException
    at com.jcraft.jsch.ChannelExec.start(ChannelExec.java:52)
    at com.jcraft.jsch.Channel.connect(Channel.java:200)
    ... 17 more

Original comment by code-rev...@gtempaccount.com on 24 Sep 2009 at 8:34

GoogleCodeExporter commented 9 years ago
Comment by Shawn Pearce <sop@google.com> on Thu Jun 25 11:04:58 PDT 2009

This seems to be a feature of JSch.

In http://www.mail-archive.com/jsch-users@lists.sourceforge.net/msg00520.html
the author more or less says we shouldn't rely on the message text of the
JSchException during a connect failure when a timeout is used... and instead
just retry in an application loop, and then give up after some number of
attempts.
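
The retry-and-give-up approach suggested there might look roughly like the sketch below. It is only an illustration: `BoundedRetry`, `maxAttempts` and `delayMillis` are hypothetical names, not part of JSch, JGit or Gerrit, and the operation passed in is assumed to open a fresh connection on every attempt.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation a bounded number of times. */
final class BoundedRetry {
  private BoundedRetry() {}

  /**
   * Calls op until it succeeds or maxAttempts attempts have failed.
   * The exception message is never inspected; any failure counts as retryable.
   */
  static <T> T call(Callable<T> op, int maxAttempts, long delayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        // Do not retry when the calling thread itself was interrupted.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis); // pause before the next attempt
        }
      }
    }
    throw last; // every attempt failed; report the most recent failure
  }
}
```

A caller would wrap whatever opens the SSH session and channel in the `Callable`, so that each attempt starts from a clean connection rather than reusing one that already failed.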

Based on the SourceForge bug tracker, there are a lot of thread race conditions
and NPEs lurking around in JSch... and my own reading of the source code has
sent shivers down my spine about how unsafe the shared data really is between
threads.  As near as I can tell, it violates JSR-133
(http://jcp.org/en/jsr/detail?id=133) and pays no attention to it whatsoever.
If there is more than one processor in the system I can easily see how JSch
can fall over with random exceptions.
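
To make the visibility concern concrete, here is a small sketch of the general pattern. It does not reproduce JSch's code; `SharedChannelState` and its `command` field are hypothetical. Under JSR-133, a field written by one thread and read by another needs a happens-before edge (for example `volatile` or a lock); without one, the reader may still see `null` and fail with an NPE.

```java
/** Hypothetical stand-in for state shared between a connecting thread and an I/O thread. */
final class SharedChannelState {
  // volatile establishes a happens-before edge between the write and later reads.
  // Without it (or a lock), a reader on another CPU is not guaranteed to see the write.
  private volatile byte[] command;

  /** Called by the thread that sets up the channel. */
  void setCommand(byte[] cmd) {
    command = cmd.clone();
  }

  /** Called by another thread; fails clearly instead of with a bare NPE. */
  int commandLength() {
    byte[] c = command; // read the shared field exactly once
    if (c == null) {
      throw new IllegalStateException("command not set yet");
    }
    return c.length;
  }
}
```

Reading the field once into a local also avoids a check-then-use race where the value could change between the null check and the dereference.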

Perhaps JGit should move to MINA SSHD's client library.

Original comment by code-rev...@gtempaccount.com on 24 Sep 2009 at 8:34

GoogleCodeExporter commented 9 years ago
Comment by Shawn Pearce <sop@google.com> on Wed Jul 01 15:55:56 PDT 2009

In http://thread.gmane.org/gmane.comp.version-control.git/122227 I asked other
JGit developers this question, and there doesn't appear to be a consensus.
Robin's remark about moving to an unknown from a semi-known that maybe could
be fixed is however pretty wise; it might be easier to fix JSch than to
rewrite the interfaces to MINA SSHD, and implement missing features in MINA
SSHD.

Original comment by code-rev...@gtempaccount.com on 24 Sep 2009 at 8:34

GoogleCodeExporter commented 9 years ago
Update by Shawn Pearce <sop@google.com> on Thu Jul 02 10:14:47 PDT 2009

Original comment by code-rev...@gtempaccount.com on 24 Sep 2009 at 8:34

GoogleCodeExporter commented 9 years ago
Comment by Shawn Pearce <sop@google.com> on Thu Jul 02 10:16:40 PDT 2009

The timeout also seems to trigger too frequently.  E.g. setting
remote.name.timeout to 30 (seconds) can cause replication to completely fail,
because data doesn't make it up into the application layer in time.  This is
either a bug in Gerrit's TimeoutInputStream (unlikely, given it waits 30
seconds) or in JSch's own network code (much more likely, given what I have
seen of it).

2009-07-01 17:23:19,572::ERROR: com.google.gerrit.git.PushQueue  - Cannot replicate to ssh://android-replication@source.android.com:29418/platform/frameworks/base.git
org.spearce.jgit.errors.TransportException: Read timed out
        at org.spearce.jgit.transport.BasePackConnection.readAdvertisedRefs(BasePackConnection.java:148)
        at org.spearce.jgit.transport.TransportGitSsh$SshPushConnection.<init>(TransportGitSsh.java:365)
        at org.spearce.jgit.transport.TransportGitSsh.openPush(TransportGitSsh.java:97)
        at org.spearce.jgit.transport.PushProcess.execute(PushProcess.java:119)
        at org.spearce.jgit.transport.Transport.push(Transport.java:866)
        at com.google.gerrit.git.PushOp.pushVia(PushOp.java:192)
        at com.google.gerrit.git.PushOp.runImpl(PushOp.java:145)
        at com.google.gerrit.git.PushOp.run(PushOp.java:94)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at com.google.gerrit.git.WorkQueue$Task.run(WorkQueue.java:244)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.InterruptedIOException: Read timed out
        at org.spearce.jgit.util.io.TimeoutInputStream.readTimedOut(TimeoutInputStream.java:131)
        at org.spearce.jgit.util.io.TimeoutInputStream.read(TimeoutInputStream.java:104)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at org.spearce.jgit.util.NB.readFully(NB.java:67)
        at org.spearce.jgit.transport.PacketLineIn.readLength(PacketLineIn.java:120)
        at org.spearce.jgit.transport.PacketLineIn.readString(PacketLineIn.java:92)
        at org.spearce.jgit.transport.BasePackConnection.readAdvertisedRefsImpl(BasePackConnection.java:161)
        at org.spearce.jgit.transport.BasePackConnection.readAdvertisedRefs(BasePackConnection.java:142)
        ... 16 more
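
For readers unfamiliar with how a read timeout turns into the `InterruptedIOException` seen in the trace, the general mechanism can be sketched as below. This is not JGit's `TimeoutInputStream`; `TimedRead` is a hypothetical helper that assumes the caller supplies an `ExecutorService`, and it only shows the idea that a read which produces no data before the deadline is abandoned and reported as "Read timed out".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not JGit's TimeoutInputStream. */
final class TimedRead {
  private TimedRead() {}

  /** Reads one byte, or throws InterruptedIOException if none arrives in time. */
  static int read(InputStream in, ExecutorService pool, long timeoutMillis) throws IOException {
    Callable<Integer> task = () -> in.read();
    Future<Integer> pending = pool.submit(task);
    try {
      return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      pending.cancel(true); // interrupt the worker thread still blocked in read()
      throw new InterruptedIOException("Read timed out");
    } catch (ExecutionException e) {
      // The underlying read itself failed; surface its exception.
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("Interrupted while waiting for data");
    }
  }
}
```

With a wrapper of this shape, a timeout only says that no data reached the caller in time. It cannot tell whether the remote side was slow or whether the layer underneath failed to deliver bytes it had already received, which is the ambiguity described above.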

Original comment by code-rev...@gtempaccount.com on 24 Sep 2009 at 8:34

GoogleCodeExporter commented 9 years ago

Original comment by sop+code@google.com on 24 Sep 2009 at 10:17

GoogleCodeExporter commented 9 years ago

Original comment by sop@google.com on 21 Nov 2009 at 6:37