donatengit / AppleIGB

Internet connexion lost after a few minutes #2

Open Leborgne23 opened 2 years ago

Leborgne23 commented 2 years ago

Hi, just to thank you for your work and let you know what works and what does not in my case:

donatengit commented 2 years ago

Hi @Cryptiiiic Thanks for testing.

There were no mentions of igb in dmesg log or any errors afaik

There should be some, even with the release version, at least when you run ifconfig en1 down/up (dmesg should be checked right afterwards). For example, with the command sudo ifconfig en1 down; sudo ifconfig en1 up; sleep 5; sudo dmesg | grep -i igb I get long output like this:

[191761.667427]: igb: AppleIGB::stopTxQueue()
[191761.667438]: igb: setCarrier(0) ===>
.... 
[191768.372439]: igb: hw->fc.current_mode = 3
[191768.373006]: igb: Flow Control = FULL.
[191768.373010]: igb: 1000 Mbs, igb: Full Duplex
[191768.373015]: igb: hw->fc.current_mode = 3
[191768.373017]: igb: checkLinkStatus() ===> link=1, carrier=0, linkUp=0
[191768.373020]: igb: setLinkUp() ===>
[191768.373303]: igb: OK Link register status: 0x0000796d
[191768.374294]: igb: 1000 Mbs, igb: Full Duplex
[191768.374299]: igb: setCarrier(1) ===>
[191768.374336]: igb: setCarrier() <===
[191768.392966]: igb: output: Dropping packet on disabled device
[191768.392976]: igb: output: Dropping packet on disabled device

Could you please make sure the timing is right when checking dmesg? I think it would be really beneficial to also check it right after what seems to be a disconnect.
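To make the timing easier to check, here is a minimal Python sketch that pulls the carrier transitions and their timestamps out of saved dmesg output. It assumes the line format shown in the log above; the function names igb_events and carrier_changes are made up for this example and are not part of the driver.

```python
import re

# Matches lines such as "[191761.667438]: igb: setCarrier(0) ===>".
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]: igb: (.*)$")


def igb_events(dmesg_text):
    """Return (timestamp, message) pairs for every igb line in dmesg output."""
    events = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def carrier_changes(events):
    """Keep only the setCarrier(N) entry markers, as (timestamp, state)."""
    changes = []
    for ts, msg in events:
        m = re.match(r"setCarrier\((\d)\) ===>", msg)
        if m:
            changes.append((ts, int(m.group(1))))
    return changes


# A few lines copied from the log above.
sample = """\
[191761.667427]: igb: AppleIGB::stopTxQueue()
[191761.667438]: igb: setCarrier(0) ===>
[191768.373017]: igb: checkLinkStatus() ===> link=1, carrier=0, linkUp=0
[191768.374299]: igb: setCarrier(1) ===>
[191768.374336]: igb: setCarrier() <===
"""

changes = carrier_changes(igb_events(sample))
for ts, state in changes:
    print(f"{ts:.6f} carrier -> {state}")
```

If the carrier never goes to 0 around the time of a drop, that would suggest the driver still believes the link is up while traffic stalls.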

Found another log
[15063.048857]: uipc_accept: peer disconnected unp_gencnt 30140
Sandbox apply: mdworker_shared[3649]
Sandbox apply: mdworker_shared[3648]
Sandbox apply: mdworker_shared[3647]
compat_ifmu_ulist: en0 copyin() error 14c

Not sure this is related; it also happens to people on real Macs.

Separately, please make sure you are using the latest version. It would also make sense to try different modes for speed, duplex, flow control and EEE (Settings -> Network -> Ethernet -> Advanced -> Hardware tab).

donatengit commented 2 years ago

Hi @llyonard

Thanks a lot for your efforts and dedication.

Anyway im always open to help if you need more test for this project

I'm afraid until I have X570 to test or there is someone with X570 with minimal development skills there is not much you could help with.

Cryptiiiic commented 2 years ago

@donatengit yes, I do see compat_ifmu_ulist: en0 copyin() error 14 and compat_ifmu_ulist: en1 copyin() error 14 during the drop. And here is the igb log; nothing is logged during the drop, only on interface up/down:

[14698.156655]: igb: AppleIGB::stopTxQueue()
[14698.156742]: igb: setCarrier(0) ===>
[14698.156770]: igb: setCarrier() <===
[14698.156772]: igb: AppleIGB::stopTxQueue()
[14698.167229]: igb: Masking off all interrupts
[14698.182929]: igb: NVM word 0x03 is not mapped.
[14698.182966]: igb: Read INVM Word 0x0a = 402f
[14698.183310]: igb: Requested word 0x04 not found in OTP
[14698.183314]: igb: Initializing the IEEE VLAN
[14698.183639]: igb: Programming MAC Address into RAR[0]
[14698.183645]: igb: Clearing RAR[1-15]
[14698.183728]: igb: Zeroing the MTA
[14698.183755]: igb: Zeroing the UTA
[14698.183848]: igb: After fix-ups FlowControl is now = 3
[14698.184739]: igb: Reconfiguring auto-neg advertisement params
[14698.185040]: igb: autoneg_advertised 20
[14698.185044]: igb: Advertise 1000mb Full duplex
[14698.185194]: igb: Auto-Neg Advertising c01
[14698.185342]: igb: Restarting Auto-Neg
[14698.185928]: igb: No link register status 0x00007949 (try 1/10)
[14698.186237]: igb: No link register status 0x00007949 (try 2/10)
[14698.186546]: igb: No link register status 0x00007949 (try 3/10)
[14698.186855]: igb: No link register status 0x00007949 (try 4/10)
[14698.187164]: igb: No link register status 0x00007949 (try 5/10)
[14698.187472]: igb: No link register status 0x00007949 (try 6/10)
[14698.187781]: igb: No link register status 0x00007949 (try 7/10)
[14698.188089]: igb: No link register status 0x00007949 (try 8/10)
[14698.188397]: igb: No link register status 0x00007949 (try 9/10)
[14698.188706]: igb: No link register status 0x00007949 (try 10/10)
[14698.188722]: igb: Unable to establish link!!!
[14698.188723]: igb: Initializing the Flow Control address, type and timer regs
[14698.190627]: igb: No link register status 0x00007949 (try 1/1)
[14698.190629]: igb: Phy info is only valid if link is up
[14698.196711]: igb: disable() <===
[14712.566065]: igb: setCarrier(0) ===>
[14712.566068]: igb: setCarrier() <===
[14712.566070]: igb: intelSetupAdvForMedium(index 7, type 5242928) ===>
[14712.566072]: igb: intelSetupAdvForMedium() <===
[14712.566074]: igb: igb_open() ===>
[14712.566075]: igb: setCarrier(0) ===>
[14712.566077]: igb: setCarrier() <===
[14712.680742]: igb: MNG configuration cycle has not completed.
[14712.681045]: igb: After fix-ups FlowControl is now = 3
[14712.681945]: igb: Reconfiguring auto-neg advertisement params
[14712.682243]: igb: autoneg_advertised 20
[14712.682245]: igb: Advertise 1000mb Full duplex
[14712.682395]: igb: Auto-Neg Advertising c01
[14712.682544]: igb: Restarting Auto-Neg
[14712.683139]: igb: No link register status 0x00007949 (try 1/10)
[14712.683453]: igb: No link register status 0x00007949 (try 2/10)
[14712.683766]: igb: No link register status 0x00007949 (try 3/10)
[14712.684079]: igb: No link register status 0x00007949 (try 4/10)
[14712.684392]: igb: No link register status 0x00007949 (try 5/10)
[14712.684705]: igb: No link register status 0x00007949 (try 6/10)
[14712.685018]: igb: No link register status 0x00007949 (try 7/10)
[14712.685331]: igb: No link register status 0x00007949 (try 8/10)
[14712.685644]: igb: No link register status 0x00007949 (try 9/10)
[14712.685957]: igb: No link register status 0x00007949 (try 10/10)
[14712.685973]: igb: Unable to establish link!!!
[14712.685975]: igb: Initializing the Flow Control address, type and timer regs
[14712.685978]: igb: Powered up link.
[14712.703182]: igb: igb_open() <===
[14712.713980]: igb: setCarrier(1) ===>
[14712.714020]: igb: setCarrier() <===
[14712.714021]: igb: enable() <===
[14715.641432]: igb: OK Link register status: 0x0000796d
[14715.641594]: igb: hw->fc.current_mode = 3
[14715.642188]: igb: Flow Control = FULL.
[14715.642192]: igb: 1000 Mbs, igb: Full Duplex
[14715.642196]: igb: hw->fc.current_mode = 3
[14715.642198]: igb: checkLinkStatus() ===> link=1, carrier=1, linkUp=1
[14715.642200]: igb: Force link down due to IGB_FLAG_NEED_LINK_UPDATE
[14715.642203]: igb: setLinkDown() ===>
[14715.642206]: igb: setCarrier(0) ===>
[14715.642244]: igb: setCarrier() <===
[14715.642245]: igb: AppleIGB::stopTxQueue()
[14715.652453]: igb: Masking off all interrupts
[14715.668158]: igb: NVM word 0x03 is not mapped.
[14715.668197]: igb: Read INVM Word 0x0a = 402f
[14715.668497]: igb: Requested word 0x04 not found in OTP
[14715.668501]: igb: Initializing the IEEE VLAN
[14715.668831]: igb: Programming MAC Address into RAR[0]
[14715.668838]: igb: Clearing RAR[1-15]
[14715.668918]: igb: Zeroing the MTA
[14715.668955]: igb: Zeroing the UTA
[14715.669028]: igb: After fix-ups FlowControl is now = 3
[14715.669936]: igb: Reconfiguring auto-neg advertisement params
[14715.670241]: igb: autoneg_advertised 20
[14715.670244]: igb: Advertise 1000mb Full duplex
[14715.670393]: igb: Auto-Neg Advertising c01
[14715.670541]: igb: Restarting Auto-Neg
[14715.671138]: igb: No link register status 0x00007949 (try 1/10)
[14715.671448]: igb: No link register status 0x00007949 (try 2/10)
[14715.671759]: igb: No link register status 0x00007949 (try 3/10)
[14715.672071]: igb: No link register status 0x00007949 (try 4/10)
[14715.672381]: igb: No link register status 0x00007949 (try 5/10)
[14715.672695]: igb: No link register status 0x00007949 (try 6/10)
[14715.673007]: igb: No link register status 0x00007949 (try 7/10)
[14715.673315]: igb: No link register status 0x00007949 (try 8/10)
[14715.673627]: igb: No link register status 0x00007949 (try 9/10)
[14715.673936]: igb: No link register status 0x00007949 (try 10/10)
[14715.673952]: igb: Unable to establish link!!!
[14715.673952]: igb: Initializing the Flow Control address, type and timer regs
[14715.674439]: igb: No link register status 0x00007949 (try 1/1)
[14715.674441]: igb: Phy info is only valid if link is up
[14715.678798]: igb: Link down on en0
[14715.678800]: igb: setLinkDown() <===
[14715.678801]: igb: checkLinkStatus() <===
[14719.191877]: igb: OK Link register status: 0x0000796d
[14719.192036]: igb: hw->fc.current_mode = 3
[14719.192623]: igb: Flow Control = FULL.
[14719.192626]: igb: 1000 Mbs, igb: Full Duplex
[14719.192630]: igb: hw->fc.current_mode = 3
[14719.192631]: igb: checkLinkStatus() ===> link=1, carrier=0, linkUp=0
[14719.192633]: igb: setLinkUp() ===>
[14719.192926]: igb: OK Link register status: 0x0000796d
[14719.193944]: igb: 1000 Mbs, igb: Full Duplex
[14719.193948]: igb: setCarrier(1) ===>
[14719.193977]: igb: setCarrier() <===
[14719.209571]: igb: output: Dropping packet on disabled device
[14719.209574]: igb: output: Dropping packet on disabled device
[14719.209580]: igb: [LU]: Link Up on en0 (i211 Copper), 1-Gigabit, Full-duplex, Rx/Tx flow-control
[14719.209584]: igb: [LU]: CTRL=0x581c0241
[14719.209587]: igb: [LU]: CTRL_EXT=0x101000c0
[14719.209590]: igb: [LU]: STATUS=0x00280383
[14719.209594]: igb: [LU]: RCTL=0x04448032
[14719.209597]: igb: [LU]: PSRCTL=0x00000000
[14719.209604]: igb: [LU]: FCRTL=0x80004170
[14719.209607]: igb: [LU]: FCRTH=0x00004180
[14719.209610]: igb: [LU]: RDLEN(0)=0x00004000
[14719.209613]: igb: [LU]: RDTR=0x00000000
[14719.209616]: igb: [LU]: RADV=0x00000000
[14719.209619]: igb: [LU]: RXCSUM=0x00002f00
[14719.209622]: igb: [LU]: RFCTL=0x00010000
[14719.209626]: igb: [LU]: RXDCTL(0)=0x02040808
[14719.209629]: igb: [LU]: RAL(0)=0x90fe4b24
[14719.209632]: igb: [LU]: RAH(0)=0x80048d67
[14719.209634]: igb: [LU]: MRQC=0x00370002
[14719.209637]: igb: [LU]: TARC(0)=0x00000000
[14719.209640]: igb: [LU]: TARC(1)=0x00000000
[14719.209643]: igb: [LU]: TCTL=0xa50400fa
[14719.209647]: igb: [LU]: TXDCTL(0)=0x02100108
[14719.209650]: igb: [LU]: TXDCTL(1)=0x00000000
[14719.209653]: igb: [LU]: EEE Active 0
[14719.209654]: igb: setLinkUp() <===
[14719.209655]: igb: checkLinkStatus() <===
[14719.670453]: tcp_timers: tcp_output() returned 0 with retransmission timer disabled for 59236 > 443 in state 4, reset timer to 483
tcp_timers: tcp_output() returned 0 with retransmission timer disabled for 59237 > 443 in state 4, reset timer to 526
Sandbox apply: netbiosd[73339] <bytes>
uipc_accept: peer disconnected unp_gencnt 159208

I'm afraid until I have X570 to test or there is someone with X570 with minimal development skills there is not much you could help with.

I have x570 and I'm a bit of developer myself.

donatengit commented 2 years ago

Hey @Cryptiiiic

@donatengit yes, I do see compat_ifmu_ulist: en0 copyin() error 14 and compat_ifmu_ulist: en1 copyin() error 14 during the drop

Does it appear during normal network operation? Since such errors also appear for users of native Macs, I'm still not quite sure this is related, but one guy from Reddit was able to narrow it down. Is there anything that could interfere with the connection: a VPN, other network devices (like an iPhone connected through USB), maybe some Network/PCI/Energy saving/Secure boot settings in the BIOS, or some advanced X570 features that a driver is assumed to use? What kind of network load did you have during that period? It is unlikely, but AMD Power Gadget could help to confirm that there is no overheating.
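As a side note on the message itself: the 14 in copyin() error 14 is presumably a standard errno value (an assumption, since the kernel source was not checked here). A quick way to look up what that number means is Python's errno table:

```python
import errno
import os

# Assumption: the number printed after "copyin() error" is a plain errno value.
code = 14
name = errno.errorcode[code]

# Print the symbolic name together with the platform's description of it.
print(name, "-", os.strerror(code))
```

errno 14 is EFAULT, i.e. a bad address was passed when copying data in from user space, which by itself says nothing about the NIC driver.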

....
[14698.196711]: igb: disable() <===
...
[14712.685978]: igb: Powered up link.
[14712.703182]: igb: igb_open() <===
[14712.714021]: igb: enable() <===
[14715.641432]: igb: OK Link register status: 0x0000796d
[14715.641594]: igb: hw->fc.current_mode = 3
[14715.642188]: igb: Flow Control = FULL.
...
[14715.642200]: igb: Force link down due to IGB_FLAG_NEED_LINK_UPDATE
....
[14715.669028]: igb: After fix-ups FlowControl is now = 3
[14715.678801]: igb: checkLinkStatus() <===
[14719.191877]: igb: OK Link register status: 0x0000796d
...
[14719.209571]: igb: output: Dropping packet on disabled device
[14719.209574]: igb: output: Dropping packet on disabled device
[14719.209580]: igb: [LU]: Link Up on en0 (i211 Copper), 1-Gigabit, Full-duplex, Rx/Tx flow-control

Re: log. You were resetting the device via Settings -> Network or ifconfig en0 down/up and selecting the NIC mode manually, correct? How stable would your connection be if you disabled flow control and tried lower speeds (if your peer/ISP is OK with that, of course)?

Anyway, I can't check the register values (e.g. igb: [LU]: CTRL=0x581c0241) at the moment, but other than another round of reset because something changed IGB_FLAG_NEED_LINK_UPDATE, nothing looks suspicious: the link was established successfully.
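For the two "Link register status" values in the log, a rough Python sketch of how they could be compared bit by bit. It assumes the logged value is the standard PHY status register (register 1 in IEEE 802.3 Clause 22); that assumption is not confirmed against the driver source, and the helper changed_bits is made up for this example.

```python
# Low bits of the IEEE 802.3 Clause 22 status register (register 1).
# Assumption: the driver's "Link register status" value is that register.
STATUS_BITS = {
    0: "extended capability",
    1: "jabber detect",
    2: "link status",
    3: "auto-negotiation ability",
    4: "remote fault",
    5: "auto-negotiation complete",
}


def changed_bits(before, after):
    """List (bit, name, new_value) for every bit that differs between two reads."""
    diff = before ^ after
    result = []
    for bit in range(16):
        if (diff >> bit) & 1:
            name = STATUS_BITS.get(bit, "bit %d" % bit)
            result.append((bit, name, bool((after >> bit) & 1)))
    return result


# Values taken from the log: "No link" reads vs. the "OK Link" read.
delta = changed_bits(0x7949, 0x796D)
for bit, name, value in delta:
    print(f"bit {bit} ({name}) -> {int(value)}")
```

Under that assumption, the only difference between the failing and the successful reads is that link status and auto-negotiation complete become set, which matches a link that simply needed a few seconds to negotiate.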

I have x570 and I'm a bit of developer myself.

Great! So the main goal is to catch the period when the connection is 'lost' (or, if I understand the X570 situation correctly, when packets begin to drop/stall silently) and the corresponding reason, hoping this is not NIC or vendor specific. It is still not clear whether the problem is in the core of the driver itself (tx/rx rings, interrupts, ...), in the layer communicating with the OS, or due to some advanced X570 features that assume different driver behaviour.
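Since the driver logs nothing during a drop, one way to pin down the 'lost' period is to probe connectivity from outside the driver and record when it fails. Below is a minimal Python sketch of the bookkeeping only; how the samples are collected (for example a periodic ping) is left out, and the function name outage_windows and the sample data are hypothetical.

```python
def outage_windows(samples):
    """Turn (timestamp, ok) probe results, sorted by time, into outage windows.

    Each window is (first_failed_timestamp, first_ok_timestamp_after_it).
    The end is None if the probes were still failing at the last sample.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            windows.append((start, ts))  # connectivity came back
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Hypothetical probe results: one sample every 5 seconds.
samples = [(0, True), (5, True), (10, False), (15, False), (20, True), (25, False)]
windows = outage_windows(samples)
print(windows)
```

The resulting windows could then be lined up with the dmesg output, keeping in mind that the dmesg timestamps shown above do not look like wall-clock times and would need converting.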

I propose to start by building the DEBUG version of the driver with Xcode and then getting familiar with the code structure: all high-level Ethernet controller management is concentrated in the class AppleIGB (extending IOEthernetController), while lower-level code is spread across the igb_ files and functions (the majority of it is cross-platform). Then, depending on your observations and hypotheses, you could add additional hooks or debug output. Unfortunately, the driver doesn't implement advanced remote debugging like IntelMausi and is not a user-space driver based on DriverKit, so there is no super-easy way to debug it, but this way I was able to fix all issues found with my i211 @ B450.

I don't think it's the network queue running out of capacity, as you would get a special log message in the DEBUG version, but I'm not 100% sure.

The driver is based on Intel's IGB 5.7.2. You could cherry-pick small relevant patches from the Linux port and/or Intel's source code; last time I checked, nothing caught my eye.

P.S. I would be happy to assist you further. Please let me know if you'd like to move to some messenger for a quicker turnaround, e.g. Discord.

donatengit commented 2 years ago

Hey @Cryptiiiic,

Any news?

Meanwhile, it is very unlikely to help, but please try a version based on Intel's 5.11.4. If it doesn't help, there is not much I can do without both the X570 hardware and free time to debug. I'll describe the extended project status soon and update here.

donatengit commented 2 years ago

Guys, in case anyone on X570 has some time, please provide logs from the version with some function tracing (N.B. it generates tons of logs).

Cryptiiiic commented 1 year ago

@donatengit Same issues with 5.11.4. I then switched to the normal debug build (non-5.11); here are those logs. Let me know if I didn't get proper logs. igb_boot.txt igb_crash.txt igb_crash2.txt igb_crash3.txt igb_crash4.txt

Cryptiiiic commented 1 year ago

@donatengit I found a working driver on the forums. I finally haven't dropped once while using it. Unfortunately, I can't seem to find the source code. https://www.macos86.it/topic/6029-appleigb-and-intelmausi-integration/?tab=comments#comment-137062

Edit: Found the source, it's integrated into a mausi fork. https://github.com/mbarbierato/IntelMausi/tree/Intgegration

henkiewie commented 1 year ago

Edit: Found source, its integrated into mausi fork. https://github.com/mbarbierato/IntelMausi/tree/Intgegration

The answer is at the end of the post you are referring to (still having problems with i210). I upgraded to Monterey with the new driver with no problems. I hope they will fix it.

llyonard commented 1 year ago

Can confirm, the link was posted on the AMD Discord forum. I have installed it since and I have had 0 disconnections on an X570 Gigabyte Aorus Elite.

donatengit commented 1 year ago

@Cryptiiiic @henkiewie @llyonard

Thanks a lot for your involvement and contribution, that's all amazing news! Looking at the code, I'm a bit surprised, to be honest, that this was all that had been necessary (all this time) to make the whole I211 family work on macOS via the well-tested IntelMausi codebase (that's probably not all, but I don't have time to compile/check it myself). I hope that @mbarbierato will be able to provide releases via GitHub, or open a pull request, so that it gets merged into IntelMausi for better community testing and support.

I'm going to update the READMEs with this fork's deprecation and the links immediately, and going to block bug reports and discussion in 2-3 weeks, in case anyone has something to say.

Guys, please spend some time creating a pull request to update the Dortania guide with these new links.

donatengit commented 1 year ago

I've updated the README and tried hard to mention every contributor. Please let me know if I missed anyone.