Closed GoogleCodeExporter closed 9 years ago
First question:
What driver exactly?
Ideas on reproducing
********************
12:03:53 < cinap_lenrek> was there high traffic when it happens? or was the
card idle for a long time?
12:05:31 < EthanG_> not idle either time
12:06:01 < EthanG_> high-ish traffic the second time; there was a vnc
connection in use shortly before it happened
12:06:40 < EthanG_> small vnc screen (600x600) playing a game that redrew its
whole window every time the mouse went down or up
12:07:03 < cinap_lenrek> try to stress it
12:07:21 < EthanG_> aye
12:07:42 < cinap_lenrek> you can also read /mnt/term/dev/zero over cpu
connection or something like that
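One way to generate sustained traffic, in the spirit of reading /mnt/term/dev/zero over a cpu connection, is to read a large amount of data from a network-backed file and throw it away. The sketch below is only an illustration: the helper name `drain` and its parameters are made up here, and it assumes a Python interpreter is available on the machine doing the reading.

```python
import time


def drain(path, total_bytes, chunk_size=65536):
    """Read up to total_bytes from path and discard the data.

    Returns (bytes_read, elapsed_seconds). Stops early at end of file.
    The name and parameters are hypothetical, not part of any Plan 9 tool.
    """
    bytes_read = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while bytes_read < total_bytes:
            data = f.read(min(chunk_size, total_bytes - bytes_read))
            if not data:
                break  # end of file (or nothing more to read)
            bytes_read += len(data)
    return bytes_read, time.monotonic() - start
```

Calling `drain("/mnt/term/dev/zero", 1 << 30)` from the cpu side would pull about 1 GiB across the link; if the interface locks up mid-transfer, the call simply stops making progress, which gives a rough time for when the failure happened.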
Driver modding
**************
11:57:50 < cinap_lenrek> igbeinterrupt() doesn't print anything
11:58:40 < EthanG_> should i put something in there or would it go off with
every packet?
11:59:35 < cinap_lenrek> EthanG_: first, get the spec, then check what the bits
in the interrupt status register mean
12:02:06 < cinap_lenrek> also, find/google/bribe/steal the hardware spec
When it happens
***************
11:54:16 < cinap_lenrek> run snoopy
11:54:26 < cinap_lenrek> and wireshark or whatever on another machine
11:54:32 < cinap_lenrek> then ping around
11:55:05 < cinap_lenrek> this way, you can figure out if it still works in some
direction
11:55:24 < cinap_lenrek> maybe it just fails to receive packets, but is still
able to send them
11:55:37 < cinap_lenrek> sometimes it can receive, but sending packets is fucked
11:56:23 < cinap_lenrek> check for any messages on the console
11:56:33 < EthanG_> aye
11:56:37 < cinap_lenrek> maybe it did a print when it hit some error condition
11:56:59 < EthanG_> I don't think it did last time.
12:00:19 < cinap_lenrek> EthanG_: even without modifying the code, you can cat
the status files of the ethernet device and check if interrupt counters still
increase when sending/receiving packets
12:00:50 < cinap_lenrek> EthanG_: and do that basic snoopy/tcpdump check
12:01:10 < cinap_lenrek> that should get us some better symptoms than "it stops
working randomly"
12:01:24 < EthanG_> yeah
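The status-file check suggested above (cat the stats twice and see which counters still move) can be automated. This is a minimal sketch, assuming the stats file consists of "name: value" lines with one integer per counter. That layout, the counter names in the usage note, and the helper names `parse_ifstats` and `counter_deltas` are assumptions for illustration, not taken from the driver.

```python
def parse_ifstats(text):
    """Parse 'name: value' lines into a dict of integer counters.

    Assumed format, not verified against the driver. Lines without a
    colon, or whose value is not a single unsigned integer (addresses,
    multi-field lines), are skipped.
    """
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        fields = value.split()
        if len(fields) == 1 and fields[0].isdigit():
            counters[name.strip()] = int(fields[0])
    return counters


def counter_deltas(before, after):
    """Return {name: change} for counters present in both snapshots
    whose value differs between them."""
    a = parse_ifstats(before)
    b = parse_ifstats(after)
    return {k: b[k] - a[k] for k in a if k in b and b[k] != a[k]}
```

Feed it the text of two successive reads of /net/ether0/ifstats taken after the failure. If receive counters keep climbing while transmit counters stay flat (or the other way round), that narrows down which direction is stuck; if nothing changes at all, including interrupt counts, the card or driver has probably stopped servicing the interface entirely.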
Possible fixes
**************
12:02:46 < cinap_lenrek> if we're unable to fix it, we might just reset the
card if it happens
12:02:51 < cinap_lenrek> that often gets stuff working again
Stray thoughts
**************
12:08:22 < cinap_lenrek> maybe it's not even the network card
12:08:26 < EthanG_> aye, aye
12:08:31 < cinap_lenrek> but some other shit is locked up in the ipstack
12:08:50 < EthanG_> yeah could be
12:09:20 < cinap_lenrek> maybe you can add a 2nd network card?
12:09:39 < EthanG_> It would have to be usb
12:10:05 < cinap_lenrek> fun :)
12:10:14 < EthanG_> no thanks :)
Original comment by tereniao...@gmail.com
on 30 Dec 2011 at 1:43
Happened again. Always happens when I want to relax.
Telnet received a line soon after the failure, but not 7-8 minutes later. It
was failing to send before the one received line came through.
From /net/ether0/ifstats, good packets received increased by 9 between the
first 2 cats after failure, as did broadcast packets received. 1 or 2 telnet
packets were expected in this same timeframe; I guess they didn't arrive.
Possibly related to the ethernet lead slipping out of its socket. It always
reconnects when the lead is pushed back in, but this time and the last, this
problem has occurred shortly after reconnecting.
Original comment by tereniao...@gmail.com
on 31 Dec 2011 at 10:03
Note, completely disconnecting and reconnecting the ethernet lead does not fix
this.
More next time it happens.
I was wrong about it happening after a certain amount of data or time.
Original comment by tereniao...@gmail.com
on 31 Dec 2011 at 10:06
Closing this. It's either been fixed en passant or was hardware trouble which
I'm no longer triggering (loose socket).
Original comment by tereniao...@gmail.com
on 17 Oct 2012 at 10:03
Original issue reported on code.google.com by
tereniao...@gmail.com
on 30 Dec 2011 at 12:00