Annotald / annotald

A program for annotation in the Penn Treebank format
GNU General Public License v3.0

Terrible connectivity issues #62

Closed diertani closed 10 years ago

diertani commented 10 years ago

Annotald is starting and running as usual, but it won't stay connected for more than a few seconds at most. It's taken me about half an hour to finish the last three or four changes on the same sentence. I've torn down the tunnel and rebuilt it twice, which usually fixes the problem, but today that's not working. Nothing is obviously different from my end, and yesterday I had some problems, but well within the normal range. This is becoming unworkable.

diertani commented 10 years ago

Restarting my computer made it workable again, but I think the connectivity is still much worse than usual.

diertani commented 10 years ago

...and now it's back to being basically unworkable. I've torn down the tunnel twice, and it's just getting progressively worse. It's now at the point where it will kick me out of Annotald pretty much as soon as it's finished loading. I can't make any changes. It's also loading much more slowly than usual.

I used the "jobs" command, and it didn't show me that I had any other jobs running, but I could be using the command wrong or something.

aecay commented 10 years ago

This is caused by connection disruptions between your computer and the babel server. Are you connecting over a different network (coffee shop or other public wifi, perhaps)? Are you concurrently downloading or streaming something?

I checked on babel -- it looks all quiet. There could of course be disruptions in the University's networks that I can't see from the inside. (It is the first day of classes today, so maybe that has something to do with it.)

diertani commented 10 years ago

No, everything looks the same from my end; I'm connecting from my apartment like I usually do, and I had almost nothing else open (just Chrome, the tunnel, Skype, Excel, and Word) because my computer has been having memory issues lately. It's possible that some sort of update was happening in the background that my computer did without telling me, I guess?

Maybe tomorrow things will be calmer.

beatrice57 commented 10 years ago

could this conceivably be a loose physical connection somewhere?

aecay commented 10 years ago

Beatrice, a loose physical connection would probably have detrimental effects on many aspects of internet usage, not just the tunnel for Annotald. This doesn't seem to match Ariel's experience, so it seems unlikely to me. (But of course nothing can be definitively ruled out.)

Ariel, please post an update tomorrow with your experiences. If you are having problems two days in a row then we will have to look into alternative solutions so you can keep working with Annotald.

beatrice57 commented 10 years ago

On Aug 27, 2014, at 11:08 PM, Aaron Ecay notifications@github.com wrote:

Beatrice, a loose physical connection would probably have detrimental effects on many aspects of internet usage, not just the tunnel for Annotald. This doesn't seem to match Ariel's experience, so it seems unlikely to me. (But of course nothing can be definitively ruled out.)

good point.

diertani commented 10 years ago

Yesterday was completely normal and unproblematic, maybe even a bit better than usual. The connection failed exactly twice, hours apart. It was like Wednesday never happened.

diertani commented 10 years ago

Ten connection fails today, some of them about ten minutes apart. Aaron, should I start keeping track of all my connection fails, in case there's a pattern? Would that be helpful?

(Today it went:
7.52: First connection fail.
8.03: Second connection fail.
8.29: Third connection fail.
8.38: Fourth connection fail.
8.47: Fifth connection fail.
[Twenty-minute gap as I reviewed a finished file in emacs.]
9.24: Sixth connection fail.
9.29: Seventh connection fail.
9.46: Eighth connection fail.
10.26: Ninth connection fail.
10.35: Tenth connection fail.)
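One way to check a list like this for a pattern is to compute the gap between consecutive failures. The sketch below is illustrative only: it assumes the times are written as H.MM and all fall on the same day, and the helper names `to_minutes` and `gaps_in_minutes` are made up for this example.

```python
def to_minutes(stamp):
    """Convert a time written as 'H.MM' (e.g. '7.52') to minutes since midnight."""
    hours, minutes = stamp.split(".")
    return int(hours) * 60 + int(minutes)


def gaps_in_minutes(stamps):
    """Return the gap, in minutes, between each pair of consecutive times."""
    minutes = [to_minutes(s) for s in stamps]
    return [later - earlier for earlier, later in zip(minutes, minutes[1:])]


# Failure times as reported in the comment above.
fails = ["7.52", "8.03", "8.29", "8.38", "8.47",
         "9.24", "9.29", "9.46", "10.26", "10.35"]
gaps = gaps_in_minutes(fails)
print(gaps)
```

The gap between the fifth and sixth failures includes the twenty minutes spent in emacs, so it probably should not be read as a quiet period for Annotald itself.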

aecay commented 10 years ago

Can you add the options -v -v to the ssh command? This will print a lot of debugging messages to the screen. Then, sometime when the connection fails, can you upload the output here?

I'm skeptical that there's much I can do about the underlying network problems, but this will hopefully help uncover exactly what is going on. I'll also start researching alternative connection possibilities.

diertani commented 10 years ago

Yeah, I can do that. Just to clarify, it should be:

ssh -L -v -v 1234:localhost:#####

aecay commented 10 years ago

Correct.

diertani commented 10 years ago

Thanks!

beatrice57 commented 10 years ago

who’s your carrier?

would it make sense to google and see if folks in your area have been having connectivity issues as well?

aecay commented 10 years ago

Argh, I think I was wrong about the command line. It should probably be ssh -v -v -L 1234:localhost:#####
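The ordering matters because -L takes the forwarding spec as its argument, so nothing may come between the two. A minimal sketch of assembling the command as an argument list follows. The function name `build_tunnel_command`, the host `babel.example.org`, and the remote port 19621 are placeholders for illustration; the thread elides the real port as ##### and never shows the host argument.

```python
import shlex


def build_tunnel_command(local_port, remote_port, host, verbosity=2):
    """Build an ssh port-forwarding command as an argument list.

    The forwarding spec is placed directly after -L, because -L consumes
    the next argument; the -v flags go before it.
    """
    spec = "{}:localhost:{}".format(local_port, remote_port)
    return ["ssh"] + ["-v"] * verbosity + ["-L", spec, host]


# Hypothetical values: the remote port and the host are placeholders,
# not the real settings used in this thread.
cmd = build_tunnel_command(1234, 19621, "babel.example.org")
print(shlex.join(cmd))
```

Building the list this way keeps the spec attached to -L no matter how many -v flags are added.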

diertani commented 10 years ago

That certainly does leave a trail of information, all right.

Beatrice, I don't have general connectivity issues. There are occasions when my router or modem gets fussy and requires a reboot, but that's rare and definitely independent of whatever is going on here. I've run online diagnostics and they always tell me my connection is good to very good, and when I spoke to Time Warner a few months back, they told me that whatever problems I was experiencing were not happening on my end.

diertani commented 10 years ago

First disconnection error message, starting from a bit before it died:

debug2: channel 3: rcvd adjust 99838
debug2: channel 3: rcvd adjust 130976
self.thefile is: dalton-1808.psd
127.0.0.1 - - [02/Sep/2014:07:22:31] "POST /doSave HTTP/1.1" 200 21 "http://localhost:1234/diertani" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"
127.0.0.1 - - [02/Sep/2014:07:22:40] "POST /doLogEvent HTTP/1.1" 200 - "http://localhost:1234/diertani" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 10 setting TCP_NODELAY
debug1: channel 4: new [direct-tcpip]
debug2: channel 4: open confirm rwindow 2097152 rmax 32768
debug2: channel 3: rcvd adjust 100284
debug2: channel 3: rcvd adjust 115911
self.thefile is: dalton-1808.psd
debug2: channel 3: rcvd eof
debug2: channel 3: output open -> drain
debug2: channel 3: obuf empty
debug2: channel 3: close_write
debug2: channel 3: output drain -> closed
debug2: channel 4: rcvd eof
debug2: channel 4: output open -> drain
debug2: channel 4: obuf empty
debug2: channel 4: close_write
debug2: channel 4: output drain -> closed
Segmentation fault
babel /home/migration/other/MIDENG/PPCMBE/psd/stage-2/DIERTANI> debug2: channel 4: read<=0 rfd 10 len 0
debug2: channel 4: read failed
debug2: channel 4: close_read
debug2: channel 4: input open -> drain
debug2: channel 4: ibuf empty
debug2: channel 4: send eof
debug2: channel 4: input drain -> closed
debug2: channel 4: send close
debug2: channel 3: read<=0 rfd 9 len 0
debug2: channel 3: read failed
debug2: channel 3: close_read
debug2: channel 3: input open -> drain
debug2: channel 3: ibuf empty
debug2: channel 3: send eof
debug2: channel 3: input drain -> closed
debug2: channel 3: send close
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 11 setting TCP_NODELAY
debug1: channel 5: new [direct-tcpip]
debug2: channel 3: rcvd close
debug2: channel 4: rcvd close
debug2: channel 3: is dead
debug2: channel 3: garbage collecting
debug1: channel 3: free: direct-tcpip: listening port 1234 for localhost port 19621, connect from ::1 port 57272, nchannels 6
debug2: channel 4: is dead
debug2: channel 4: garbage collecting
debug1: channel 4: free: direct-tcpip: listening port 1234 for localhost port 19621, connect from ::1 port 57273, nchannels 5
channel 5: open failed: connect failed: Connection refused
debug2: channel 5: zombie
debug2: channel 5: garbage collecting
debug1: channel 5: free: direct-tcpip: listening port 1234 for localhost port 19621, connect from ::1 port 57274, nchannels 4
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 9 setting TCP_NODELAY
debug1: channel 3: new [direct-tcpip]
channel 3: open failed: connect failed: Connection refused
debug2: channel 3: zombie
debug2: channel 3: garbage collecting
debug1: channel 3: free: direct-tcpip: listening port 1234 for localhost port 19621, connect from ::1 port 57277, nchannels 4
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 9 setting TCP_NODELAY
debug1: channel 3: new [direct-tcpip]
channel 3: open failed: connect failed: Connection refused
debug2: channel 3: zombie
debug2: channel 3: garbage collecting
debug1: channel 3: free: direct-tcpip: listening port 1234 for localhost port 19621, connect from ::1 port 57278, nchannels 4
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 9 setting TCP_NODELAY
debug1: channel 3: new [direct-tcpip]

diertani commented 10 years ago

And here are the messages leading up to the breaking of the pipeline:

127.0.0.1 - - [02/Sep/2014:10:10:02] "POST /doSave HTTP/1.1" 200 21 "http://localhost:1234/diertani" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"
debug2: channel 3: rcvd eof
debug2: channel 3: output open -> drain
debug2: channel 3: obuf empty
debug2: channel 3: close_write
debug2: channel 3: output drain -> closed
debug2: channel 3: read<=0 rfd 9 len 0
debug2: channel 3: read failed
debug2: channel 3: close_read
debug2: channel 3: input open -> drain
debug2: channel 3: ibuf empty
debug2: channel 3: send eof
debug2: channel 3: input drain -> closed
debug2: channel 3: send close
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 10 setting TCP_NODELAY
debug1: channel 4: new [direct-tcpip]
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 11 setting TCP_NODELAY
debug1: channel 5: new [direct-tcpip]
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 12 setting TCP_NODELAY
debug1: channel 6: new [direct-tcpip]
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 13 setting TCP_NODELAY
debug1: channel 7: new [direct-tcpip]
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 14 setting TCP_NODELAY
debug1: channel 8: new [direct-tcpip]
debug1: Connection to port 1234 forwarding to localhost port 19621 requested.
debug2: fd 15 setting TCP_NODELAY
debug1: channel 9: new [direct-tcpip]
Write failed: Broken pipe

diertani commented 10 years ago

And this is what happens when I close Annotald as God intended (before I close the tab, which I have to do by hand now):

Exit message received
Reformatting trees
...and rewriting indices sequentially
Please be patient, this may take some time
Done. :)
debug2: channel 4: rcvd eof
debug2: channel 4: output open -> drain
debug2: channel 4: obuf empty
debug2: channel 4: close_write
debug2: channel 4: output drain -> closed
debug2: channel 5: rcvd eof
debug2: channel 5: output open -> drain
debug2: channel 5: obuf empty
debug2: channel 5: close_write
debug2: channel 5: output drain -> closed
babel /home/migration/other/MIDENG/PPCMBE/psd/stage-2/DIERTANI> debug1: Connection to port 1234 forwarding to localhost port 15235 requested.
debug2: fd 9 setting TCP_NODELAY
debug1: channel 3: new [direct-tcpip]
debug2: channel 4: read<=0 rfd 10 len 0
debug2: channel 4: read failed
debug2: channel 4: close_read
debug2: channel 4: input open -> drain
debug2: channel 5: read<=0 rfd 11 len 0
debug2: channel 5: read failed
debug2: channel 5: close_read
debug2: channel 5: input open -> drain
debug2: channel 4: ibuf empty
debug2: channel 4: send eof
debug2: channel 4: input drain -> closed
debug2: channel 5: ibuf empty
debug2: channel 5: send eof
debug2: channel 5: input drain -> closed
debug2: channel 4: send close
debug2: channel 5: send close
debug2: channel 5: rcvd close
debug2: channel 4: rcvd close
channel 3: open failed: connect failed: Connection refused
debug2: channel 3: zombie
debug2: channel 3: garbage collecting
debug1: channel 3: free: direct-tcpip: listening port 1234 for localhost port 15235, connect from ::1 port 61318, nchannels 6
debug2: channel 4: is dead
debug2: channel 4: garbage collecting
debug1: channel 4: free: direct-tcpip: listening port 1234 for localhost port 15235, connect from ::1 port 61316, nchannels 5
debug2: channel 5: is dead
debug2: channel 5: garbage collecting
debug1: channel 5: free: direct-tcpip: listening port 1234 for localhost port 15235, connect from ::1 port 61317, nchannels 4

beatrice57 commented 10 years ago

starting today, i’ve been having a weird problem, unprecedented in my experience - which is that when i quickly click two links in safari one right after the other, i get the following error message.


Proxy Error

The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /~beatrice/restricted/250/f14/sol-e1.1.html.

Reason: Error reading from remote server


when i click less quickly, the expected second link opens as expected. i wonder if this issue might be related to ariel’s connectivity problems.

let me know what you-all think, and if you think the two issues might be related, i’ll get in touch with manager@babel.

aecay commented 10 years ago

OK, these logs have reminded me of something that I think I knew at one point but forgot. At least some of these errors are due to the Python process crashing. That's the "segmentation fault" about 1/3 of the way down Ariel's first posted set of logs. I don't know what's causing this, I've never experienced this error on my computer, and it's difficult to debug remotely.

One possibility is that the Python version installed there is over a year out of date (2.7.5 from May 15, 2013; most current is 2.7.8 from July 1, 2014). I will ask Tony about getting it upgraded. Another thing you can do is execute the command limit coredumpsize unlimited before running Annotald. This should result in a message like Segmentation fault (core dumped) when the Python error next occurs, and a file with a name beginning core being created in your working directory. If/when this happens, let me know the path to the file in this issue, and I'll take a look at it. (The file will contain some information that I may be able to use to pinpoint the source of the crash.)
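The `limit` command is a csh/tcsh builtin; the same setting can also be changed from inside a Python process with the standard `resource` module (Unix only). The sketch below is an assumption-laden illustration and not part of Annotald: the function name `enable_core_dumps` is hypothetical, and it can only raise the limit as far as the hard limit allows, which may be lower than unlimited.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    Child processes started afterwards inherit the new limit.
    Returns the (soft, hard) pair now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = enable_core_dumps()
print("core limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
```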

Your second message is a different instance of an error than the first one, correct? That does not look like a python crash, but rather indeed a network error. Can you remind me of what triggers the error? If I recall, you try to save and it does not go through?

Can you also send me (via email, not on here) the output of running the command traceroute babel.ling.upenn.edu in the terminal (on your machine, not in an ssh session with babel)? Ideally one run when the connection is clear and another as soon after receiving an error as possible.

Beatrice, I don't think your network problems are related to this.
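The distinction drawn above between a Python crash and a network error can be sketched as a simple check on the captured terminal output. This is a rough illustration, not a diagnostic tool: the function name `classify_disconnect` is hypothetical, the marker strings are taken from the two logs posted in this thread, and treating the "Connection refused" lines as a consequence of the crash is an assumption.

```python
def classify_disconnect(log_text):
    """Guess the kind of failure shown in a captured terminal log.

    Returns 'python_crash', 'network', or 'unknown'.
    """
    if "Segmentation fault" in log_text:
        # The server process died; later "Connection refused" lines
        # are assumed to be a consequence of that.
        return "python_crash"
    if "Write failed: Broken pipe" in log_text:
        return "network"
    return "unknown"


# Short excerpts modelled on the first and second logs posted above.
first_log = "debug2: channel 4: output drain -> closed\nSegmentation fault\n"
second_log = "debug1: channel 9: new [direct-tcpip]\nWrite failed: Broken pipe\n"
print(classify_disconnect(first_log), classify_disconnect(second_log))
```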

diertani commented 10 years ago

The Segmentation Fault kind of error is overwhelmingly more common than the other -- in fact, until it happened again yesterday, I couldn't have told you the last time it had. So all the problems I was having last week were apparently Python problems, and the name of this thread is a misnomer and red herring. Sorry about that!

It's really not clear to me what triggers the connectivity error when it happens. I typically notice it when I go to save and it doesn't go through (this is also how I notice when the Python error has occurred), but that's not necessarily when it happens.

I apparently don't have the traceroute command in my cygwin; is there a way to install it?

diertani commented 10 years ago

Path name for the core dump file:

/home/migration/other/MIDENG/PPCMBE/psd/stage-2/DIERTANI/core.9010

diertani commented 10 years ago

There is a brand-new untruncated core dump file now, same directory.

aecay commented 10 years ago

Can you chmod it please?

diertani commented 10 years ago

Oops. Yes; done.

aecay commented 10 years ago

OK, thanks. I have a copy of the file; you may delete it now for quota purposes.

diertani commented 10 years ago

Woot. Thanks!

aecay commented 10 years ago

I'm going to close this issue up, now that a possible fix is in place. Please open another one if you get more segmentation faults or connection errors.