Closed yanko-yankulov closed 7 years ago
Did suspend & resume work with Xorg running and resuming correctly? Would certainly motivate me to try drm-next-4.7 on my t460s...
Yep,
Suspend & resume from Xorg works as well.
@yanko-yankulov Have you experienced this problem without patch ? https://github.com/FreeBSDDesktop/freebsd-base-graphics/issues/68 This one keeps me away from drm-next branch mostly.
@abishai that's whole point of the issue - he needed to add the patch in order for it to work.
Maybe there are several issues with resume, not only pipe A faults. Probably, it's time to test my laptop again.
@abishai have you tried the patch? Without it suspend/resume doesn't work for anyone on skylake.
Without the patch I got a psychadelic display with all the colours of the rainbow but no actual image (after the first resume, that is). With it everything is ok. I have not experienced any other major issue and everything seems to work just fine. I guess I just might have been lucky and there are other hardware/configuration issues that can cause trouble. I dont have different hardware at the moment to test it.
However a patch like that seems to be necessary as part of the low-level suspend/resume code is happening in the late/early callbacks. The comments in the code explain that the split is needed to solve races with the HDMI audio output device so the quick hack I did in the patch seems as a goog enough initial solution.
So give it a try - it really 'just worked' for me.
Just one more thing - I havent checked the dmesg without the patch but the description of #68 seems to be exactly what I observed. (Althoug I believe my lines have been horisontal. Didn't really pay much attention) My bad that I havent noticed it before opening this issue.
FYI, I've merged this in to drm-next and I can now resume on my Skylake (Gen4 X1). However, I'm getting a blank screen and shortly after resume the hangcheck timer causes a lock inversion panic. The cores I'm getting are usually corrupted so it's hard to see what locks the thread is holding.
I've fixed the panic (the third one so far testing suspend/resume!), so I can still work remotely on the laptop after resume. However, the display still doesn't come back. I'm also getting a panic on shutdown
I think I have some time today to put my laptop in jeopardy. Should I use drm-next for testing?
You should try it yes, but I don't think suspend resume really works there yet, otherwise I don't think it has any regressions with respect to drm-next-4.7
My laptop died during llvm compilation:) That's all my long tongue about laptop in jeopardy, but power supply is really burned out. I hope Dell can replace it fast enough.
A quick update - just tried fresh drm-next. Suspend/Resume works, but the screen is blank - the same as @mattmacy said. So there seems to be something missing in the drm-next branch. I would like to poke the code a bit to see if something comes out, but not quite sure when I will get to it :).
Anecdotally: post 5 and others under Screen doesn’t turn on when waking from sleep – TrueOS Community
@grahamperrin point others here. I don't have the bandwidth to go on forums.
@mattmacy thanks for the direction; done – graphics-oriented post 11 under Resume from suspend (wake from sleep) and from hibernation.
(I hesitated because I wasn't sure about the scope of this issue.)
I am a computer novice using TrueOS ( FreeBSD 12.0 ) I have Intel i7. Suspend feature locks up computer. I do not know how to cut and paste the patch shown above.
@mattmacy A patch for drm-next. It took me like three times going through the diffs to finally see it. And I had to start inserting printfs to realize that they will not get called, but yeah its working :).
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 2e30cff0c57..267501f0a31 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -474,6 +474,8 @@ static struct pci_driver i915_pci_driver = {
.probe = i915_pci_probe,
.remove = i915_pci_remove,
.driver.pm = &i915_pm_ops,
+ .suspend = pci_default_suspend,
+ .resume = pci_default_resume
};
static int __init i915_init(void)
After applying @yanko-yankulov's driver methods patch, suspend and resume seem to work somewhat! Suspend/resume from the console seems pretty solid with i915kms
loaded, but I do get a panic a few seconds after a suspend/resume cycle from X. One of the kasserts I saw was _page_insert_after: page already inserted
with the following backtrace:
vm_page_insert_after
remap_io_mapping
i915_gem_fault
linux_cdev_pager_populate
vm_fault_hold
vm_fault
trap_pfault
trap
calltrap
The other was vm_reserv_populate: reserv %p is already promoted
with the following (partially similar) backtrace:
vm_reserv_populate
vm_reserv_alloc_page
vm_page_alloc
vm_fault_hold
vm_fault
trap_pfault
trap
calltrap
This is also using the latest fixes in markjdb/freebsd-base-graphics... are these two things likely to be related?
Finally, is the advice about driver selection at https://github.com/FreeBSDDesktop/freebsd-base-graphics/wiki/Testing-And-Debugging-Tips still current, or should I be using the modesetting driver instead? Perhaps that would explain the difference between the console and X behaviours?
A further update: after merging 3ad2821 from markjdb/freebsd-base-graphics, I no longer have these crashes. In fact, the system is stable enough that I might enable hw.acpi.lid_switch_state
!
Agreed.
I played a bit more and I hit the same crashes. Running glxgears after the first resume causes them immediately. I have just not ran anything gl-dependant before. Applying Mark Johnston's fixes solves the issue.
Could you please explain to a computer novice ( me ) how to attempt fix. I really do not understand any terminal commands.
On Feb 3, 2017 4:23 PM, "yanko-yankulov" notifications@github.com wrote:
Agreed.
I played a bit more and I hit the same crashes. Running glxgears after the first resume causes them immediately. I have just not ran anything gl-dependant before. Applying Mark Johnston's fixes solves the issue.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/FreeBSDDesktop/freebsd-base-graphics/issues/111#issuecomment-277378962, or mute the thread https://github.com/notifications/unsubscribe-auth/AYUZ5RB2YTpo8RsDqTrdKsR9PYARJXc9ks5rY6jngaJpZM4LoX2j .
@yanko-yankulov I made a similar change that did not require a change to the driver itself. https://github.com/FreeBSDDesktop/freebsd-base-graphics/commit/e1b3fdbffece21b54ca5834a68f09a7231781323
Greate, thanks. I will give it a try.
I am currently chasing an EAGAIN followed by EFAULT in GEM_EXECBUFFER2 witch leads to "(EE) intel(0): Failed to submit rendering commands (Bad address), disabling acceleration." in Xorg log, but will test the patch (hopefully) later today.
I had that early on too. I don't recall how I fixed it though
@dlocklear01 hi, I suggest waiting a few days. We expect a fix to be made readily available to testers of the UNSTABLE repository for TrueOS Desktop.
@mattmacy The patch works perfectly. Thanks.
Regarding the crashes that @trombonehero reported, the commit that solves them seems to be https://github.com/markjdb/freebsd-base-graphics/commit/b5d2ebc23a45b6ebbcacf7b6857e6be84e9430f3 with the addition of https://github.com/markjdb/freebsd-base-graphics/commit/3ad2821545d358d24d895b17ada19f6f79d7c64b to allow compilation.
About the EAGAINs and EFAULTs that I mentioned earlier I am almost 100% sure are related https://github.com/markjdb/freebsd-base-graphics/commit/681ae75d83eab79b489c5bc7f5c9cf4a847a2439 that I cherry-picked with the rest of the Mark Johnston's fixes. I still have to verify it, but the EAGAINs seems perfectly fine, and the faults originate in get_user_pages_remote which is the main affected function in the commit.
And I was absolutely, absolutely sure, that I retested with clean drm-next before delving into the code... (Note to self 'make buildkernel' is not quite the same as 'make kernel' ). :)
Thanks again
The aforementioned commits have been pushed to drm-next.
Hi,
@markjdb thanks for the push.
With the current drm-next (27fdac1031286aaaf152e29bf3bdd042c95595bf) I am reliably getting the mentioned EFAULTs and disabled acceleration in X.
How to reproduce - start X, launch firefox, restore previous session, click on a few tabs to reload them and voila.
With e47e330c4bc73ad3327dac454b8869d17ba710cf reverted I am not seeing the issue.
I have also added a test debug printf in get_user_pages_remote (before reverting e47e330c4bc73ad3327dac454b8869d17ba710cf ) to compare new vm_map from from the mm_struct, and the old vm_map from the task and they are different.
The path to reaching get_user_pages_remote seems straightforward.
ioctl(I915_GEM_EXECBUFFER2) with at least one of the buffers with substential size (11MB or so) and probably first time the buffer is referenced.
gem_do_execbuffer / gem_execbuffer_reserve / i915_gem_execbuffer_reserve_vma / i915_vma_pin / i915_vma_insert / i915_gem_object_get_pages / i915_gem_userptr_get_pages
the initial __get_user_pages_fast in i915_gem_userptr_get_pages fails to ref all the pages and the task gets scheduled - and from the task the call get_user_pages_remote.
I have not yet traced the root cause further. I am still trying to to wrap my mind around all that code and it would take me some time to figure it all out.
With the revert of the mentioned commit suspend/resume works perfectly so far. Thanks.
Hi,
Can you re-test with the latest drm-next branch?
--HPS
Hi,
Exact same behavior (EFAULT/acceleration off) on current head(aeeba20f7ceb3dca0e28ac62eeed14d5f9ec8eeb). And again issue is not observed with e47e330 reverted.
Which intel driver are you using?
pkg info | grep intel
--HPS
If you are using the stock one, can you try to install this one instead (no need to recompile other ports or dependencies): https://github.com/FreeBSDDesktop/freebsd-ports-graphics/tree/xserver-mesa-next/x11-drivers/xf86-video-intel
--HPS
xf86-video-intel-2.99.917.20160614 from xserv-mesa-next-udev branch
it's the same in xserver-mesa-next
dri3 is compiled but not enabled at runtime
OK, I think @markjdb needs to have a look at this.
If no body else is seeing the same issue don't worry, I will dig into it. @markjdb Do you happen to have a test case for the issue the commit fixes?
@yanko-yankulov I've reverted the commit for now - I'm currently working on fixing issues with the amdgpu driver; I'll swing back to this when I have more time.
I hit the issue with one of the tests in gem_mmap_userptr here: https://github.com/markjdb/intel-gpu-tools . The code with e47e330 reverted is definitely wrong, but it seems that I'm missing something.
Hey @markjdb, thanks for that. I looked more carefully at the patch and the Linux code and it deffinitely seems like the right thing to do. I will try to trace the issue further as it got me hooked. Will update if I find somehting.
I am blind like a bat :)
@markjdb this fixes the problem:
--- a/sys/compat/linuxkpi/common/src/linux_fs.c
+++ b/sys/compat/linuxkpi/common/src/linux_fs.c
@@ -476,7 +476,7 @@ get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
{
vm_map_t map;
- map = &((struct vmspace *)mm->vmspace)->vm_map;
+ map = &(*(struct vmspace **)mm->vmspace)->vm_map;
return (__get_user_pages_internal(map, start, nr_pages,
!!(gup_flags & FOLL_WRITE), pages));
}
To correspond to the "mm->vmspace = &td->td_proc->p_vmspace;" in linux_alloc_current.
I am wondering though, wouldn't it be better to save a pointer to the td_proc in the mm_struct instead of a pointer inside it? If this was the initial idea of course.
That makes two of us. :)
Actually, I would fix this other way - linux_alloc_current() should just assign mm->vmspace = td->td_proc->p_vmspace rather than taking the address. In any case, thank you for tracking this down!
@markjdb. Np.
I was wondering if the vmspace could get changed sometimes after the alloc_current - I doubt it, but saw several assignments in the vm code so just mentioning it.
You're right. There is a couple of cases:
FWIW just tried zzz
on my dell xps13 and drm-next today. works perfectly. kudos and thanks for your efforts!
There are open issues with resume not being able to correctly reload firmware, but that's separate from this.
Hi guys,
Suspend / Resume worked out for me on Skylake GPU almost out of the box (drm-next-4.7 branch). The only thing I had to do is add calls to the suspend_late & resume_early calls.
Simple patch below.
Thanks for all the work that made this possible