CySlider closed this issue 3 years ago.
Thanks for the report, we will review it. Have you tried the same with the inbox driver, i.e. the atlantic driver that ships with the kernel?
Not sure what you mean. I am working on this with someone who seems to know his way around here:
https://forum.manjaro.org/t/all-kernel-after-5-4-crash-on-me-after-suspend-sleep/36431/20
Will continue this after work in a few hours.
Sorry, now that I understand far more about this topic, I realize that I am using the in-kernel version and not this module. It seems to me that your master branch most likely already has a fix for my issue.
This is what the resume function looks like in my kernel:
	if (deep) {
		ret = aq_nic_init(nic);
		if (ret)
			goto err_exit;
	}
	if (netif_running(nic->ndev)) {
		ret = aq_nic_start(nic);
		if (ret)
			goto err_exit;
	}
versus the code in your master branch:
	if (aq_utils_obj_test(&nic->aq_hw->flags, AQ_HW_FLAG_STARTED)) {
		ret = aq_nic_init(nic);
		if (ret)
			goto err_exit;
		ret = aq_nic_start(nic);
		if (ret)
			goto err_exit;
	}
My version seems to initialize the NIC twice if the NIC feature flags are changed after the resume code runs. Most likely this is no longer an issue you need to fix.
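To make the difference between the two resume paths quoted above concrete, here is a small userspace model in C. It is a simplified sketch, not driver code: `struct nic_model`, `model_init()`, `model_start()`, `open_model()`, `resume_in_kernel()`, `resume_master()` and `count_inits()` are hypothetical names, the two booleans merely stand in for `AQ_HW_FLAG_STARTED` and `netif_running()`, and bringing the interface up afterwards is assumed to initialize the NIC once more, following pobrn's call chain `aq_ndev_open() -> aq_nic_init()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the NIC; not the driver's real
 * structures. */
struct nic_model {
    bool hw_started;  /* stands in for AQ_HW_FLAG_STARTED */
    bool if_running;  /* stands in for netif_running(nic->ndev) */
    int init_calls;   /* how often aq_nic_init() would have run */
};

static void model_init(struct nic_model *nic) { nic->init_calls++; }
static void model_start(struct nic_model *nic) { nic->hw_started = true; }

/* Resume guard from the in-kernel version: init on every deep resume,
 * start only if the interface is running. */
static void resume_in_kernel(struct nic_model *nic, bool deep)
{
    if (deep)
        model_init(nic);
    if (nic->if_running)
        model_start(nic);
}

/* Resume guard from the master branch: init and start only if the
 * hardware had been started before suspend. */
static void resume_master(struct nic_model *nic)
{
    if (nic->hw_started) {
        model_init(nic);
        model_start(nic);
    }
}

/* Bringing the interface up (aq_ndev_open() in the driver) runs
 * init and start again. */
static void open_model(struct nic_model *nic)
{
    model_init(nic);
    model_start(nic);
    nic->if_running = true;
}

/* Deep resume, followed by opening the interface if it is still down.
 * `was_up` says whether the interface was up at suspend time.
 * Returns the total number of init calls. */
int count_inits(bool use_master, bool was_up)
{
    struct nic_model nic = { was_up, was_up, 0 };

    if (use_master)
        resume_master(&nic);
    else
        resume_in_kernel(&nic, true);

    if (!nic.if_running)
        open_model(&nic);

    return nic.init_calls;
}
```

In this model, with the interface up at suspend time both paths initialize once. With the interface down, `count_inits(false, false)` returns 2 for the in-kernel path, while `count_inits(true, false)` returns 1 for the master path, because the master guard leaves initialization to the later open.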
Installing 2.4.7 via DKMS solved this issue for me. Sorry for bothering you with it.
I was asked to ask you whether you could upstream the newer version, or a fix for the current upstream version, to solve this.
Here is the summary of the issue:
pobrn: I believe the problem is that both

aq_pm_resume_restore()
  -> atl_resume_common(deep=true)
    -> aq_nic_init()

and

aq_ndev_open()
  -> aq_nic_init()

call aq_nic_init(), so the device will be initialized twice after resume. This leaves its internal data structures in an invalid state and therefore causes the NULL pointer dereference in the second call to aq_nic_init().
I believe the reason it works the first time is that, as the logs indicate, netif_running() returns true. I figure the netdev core therefore thinks the device is “running” or in some kind of started state, so it will not call aq_ndev_open() after the first resume; aq_nic_init() is called only once and everything is fine. But after the second resume, netif_running() is seemingly false, which I believe indicates that the netdev core thinks the device is in some kind of “stopped” state. It therefore calls aq_ndev_open() down the line, causing the second call to aq_nic_init() and the NULL pointer dereference.
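pobrn's two-cycle reasoning can be condensed into a tiny C model. This is a hypothetical sketch, not driver code: `inits_per_resume()`, `first_double_init()` and `observed_running` are illustrative names, and the model assumes exactly what the analysis states, namely one aq_nic_init() call from atl_resume_common(deep=true) on every resume, plus a second one via aq_ndev_open() whenever netif_running() is false afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* netif_running() after each resume, as pobrn reads the logs:
 * true after the first resume, false after the second. */
const bool observed_running[] = { true, false };

/* Hypothetical model: how many times aq_nic_init() runs in one
 * suspend/resume cycle of the in-kernel code path. */
int inits_per_resume(bool running_after_resume)
{
    /* atl_resume_common(deep=true) -> aq_nic_init() */
    int inits = 1;

    /* The netdev core later calls aq_ndev_open() -> aq_nic_init() */
    if (!running_after_resume)
        inits++;

    return inits;
}

/* Returns the 1-based index of the first resume cycle in which
 * aq_nic_init() runs twice, or 0 if that never happens. */
int first_double_init(const bool *running_after_resume, int cycles)
{
    for (int i = 0; i < cycles; i++) {
        if (inits_per_resume(running_after_resume[i]) > 1)
            return i + 1;
    }
    return 0;
}
```

Here `first_double_init(observed_running, 2)` returns 2, which matches the crash appearing only after the second resume rather than the first.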
Thanks for the confirmation, we'll schedule this fix for the in-kernel version.
I have a strange issue with all kernels past 5.4 on Manjaro with this module. A few seconds after resuming from suspend I get this error:
I tried every stable kernel after 5.4 and got the same result. 5.4, however, is rock solid.
If I run

sudo rmmod atlantic

before suspending, this error does not happen. I should add that soon after the error the whole system freezes up, and shutdown never finishes.