Closed yanko-yankulov closed 7 years ago
Oh and for anyone willing to try it out don't forget LD_PRELOAD=libGL.so for the moment.
After sleeping it over I realised that I am breaking the idr with patch2. We really can't access the pcpu cache from an interupt. Will post new version later today.
Updated patch 2
From cd0222895b32ea2d309c0211739054acc0cd727d Mon Sep 17 00:00:00 2001
From: Yanko Yankulov <yanko.yankulov@gmail.com>
Date: Tue, 7 Feb 2017 12:31:20 +0200
Subject: [PATCH 2/3] linux_idr: actually use preloaded ils
If we are calling in a idr_preload/idr_preload_end block, we are executing in critical section. So in_intertupt will always return true and we will not use the preallocated idr_layer structs. Change the condition to explicitly check that we are not in interrupt, and that we are in critical section. If we are not in a critcial section we were not called from idr_preload/idr_preload_end block, so the preloaded elements aren't ours.
---
sys/compat/linuxkpi/common/src/linux_idr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sys/compat/linuxkpi/common/src/linux_idr.c b/sys/compat/linuxkpi/common/src/linux_idr.c
index 032fa30e87a..9112447426d 100644
--- a/sys/compat/linuxkpi/common/src/linux_idr.c
+++ b/sys/compat/linuxkpi/common/src/linux_idr.c
@@ -327,7 +327,7 @@ idr_get(struct idr *idp)
}
if ((il = lkpi_malloc(sizeof(*il), M_IDR, M_ZERO | M_NOWAIT)) != NULL)
goto done;
- if (!in_interrupt()) {
+ if (!curthread->td_intr_nesting_level && curthread->td_critnest) {
if ((il = DPCPU_GET(idr_preload_head)) != NULL) {
DPCPU_SET(idr_preload_head, il->ary[0]);
DPCPU_SET(idr_preload_cnt, DPCPU_GET(idr_preload_cnt) - 1);
--
2.11.0
I have also restored the original ordering ( the same as in Linux ). There is a comment in the Linux code explaining why this ordering is better.
Hi,
Can you re-test with the latest drm-next branch? There has been significant changes in this area.
--HPS
Except for the warning/error:
/sys/modules/linuxkpi/../../compat/linuxkpi/common/src/linux_fs.c:495:1: error: no previous prototype for function 'linux_file_free' [-Werror,-Wmissing-prototypes]
linux_file_free(struct linux_file *filp)
Everything seems perfect for the moment. Thanks.
I will close the ticket as it seems all issues mentioned are resolved. Thanks again.
Thank you! I've just put a fix for the missing prototype in.
Thank you.
I was using WERROR="" when compiling :-)
--HPS
Hi all, @hselasky ,
I am reopening this as unfortunately the vnodes don't actually get freed. This can easily be observed for example with ./gem_ctx_thrash test from intel-gpu-tools @markjdb has ported. Running the test causes the numvnodes to constantly increase rendering the machine unusable in the end.
I believe I have the reasons figured out and I am proposing a set of patches that seem to fix this. With the patches applied the ./gem_ctx_thrash passes. Looking forward to any comments or objections :)
1) It seems we really need vrele instead of vdrop, as the vdrop will not decrease the v_usecount, and leave the vnode unusable.
2) The modification in _vdrop to handle the no mountpoint case would actually forget the vnode. The mountpoint case will move it to the per mount freelist and then batch free it later. So as we don't have have a mountpoint I have put in code to add the vnode to freelist in similar maner.
With those two patches everything is OK. However I believe it would actually be better to vgone the vnodes in question before freeing them. We can't really reuse them as they have nothing associated with them, so no point in keeping them in the freelist, and waiting for a recycle to vgone them.
With this last change the new path in the _vdrop code would not get hit, but I am not quite sure whether we want to keep the changes or not.
So the proposed patches:
From 79c13df499bd95c0382aa936fce1d847961b7eb3 Mon Sep 17 00:00:00 2001
From: Yanko Yankulov <yanko.yankulov@gmail.com>
Date: Mon, 13 Feb 2017 12:46:33 +0200
Subject: [PATCH 1/3] use vrele instead of vdrop
We need to decrement the usecount too.
---
sys/compat/linuxkpi/common/src/linux_fs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sys/compat/linuxkpi/common/src/linux_fs.c b/sys/compat/linuxkpi/common/src/linux_fs.c
index c1909edcde2..d08378bbb1b 100644
--- a/sys/compat/linuxkpi/common/src/linux_fs.c
+++ b/sys/compat/linuxkpi/common/src/linux_fs.c
@@ -347,7 +347,7 @@ shmem_file_setup(char *name, int size, int flags)
return (filp);
err_2:
- vdrop(vp);
+ vrele(vp);
err_1:
kfree(filp);
err_0:
@@ -504,7 +504,7 @@ linux_file_free(struct linux_file *filp)
vm_object_deallocate(vp->i_mapping);
vp->i_mapping = NULL;
}
- vdrop(vp);
+ vrele(vp);
} else {
_fdrop(filp->_file, curthread);
}
--
2.11.0
From 81571fce974831501a68c95f02d783dcb3cc368d Mon Sep 17 00:00:00 2001
From: Yanko Yankulov <yanko.yankulov@gmail.com>
Date: Mon, 13 Feb 2017 10:20:39 +0200
Subject: [PATCH 2/3] push vnodes to freelist
---
sys/kern/vfs_subr.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index b3df4af3658..dfa7f935919 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -2950,6 +2950,13 @@ _vdrop(struct vnode *vp, bool locked)
vnlru_return_batch_locked(mp);
mtx_unlock(&mp->mnt_listmtx);
}
+ else
+ {
+ mtx_lock(&vnode_free_list_mtx);
+ TAILQ_INSERT_TAIL(&vnode_free_list, vp, v_actfreelist);
+ freevnodes ++;
+ mtx_unlock(&vnode_free_list_mtx);
+ }
} else {
VI_UNLOCK(vp);
counter_u64_add(free_owe_inact, 1);
--
2.11.0
From 0f5f6fca92ed677835f5650033c7ad4cfcd75437 Mon Sep 17 00:00:00 2001
From: Yanko Yankulov <yanko.yankulov@gmail.com>
Date: Mon, 13 Feb 2017 13:01:03 +0200
Subject: [PATCH 3/3] add vgone
No need to trash the vnode freelist
---
sys/compat/linuxkpi/common/src/linux_fs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sys/compat/linuxkpi/common/src/linux_fs.c b/sys/compat/linuxkpi/common/src/linux_fs.c
index d08378bbb1b..df26485561d 100644
--- a/sys/compat/linuxkpi/common/src/linux_fs.c
+++ b/sys/compat/linuxkpi/common/src/linux_fs.c
@@ -347,6 +347,7 @@ shmem_file_setup(char *name, int size, int flags)
return (filp);
err_2:
+ vgone(vp);
vrele(vp);
err_1:
kfree(filp);
@@ -504,6 +505,7 @@ linux_file_free(struct linux_file *filp)
vm_object_deallocate(vp->i_mapping);
vp->i_mapping = NULL;
}
+ vgone(vp);
vrele(vp);
} else {
_fdrop(filp->_file, curthread);
--
2.11.0
Best, Yanko
Hi @yanko-yankulov ,
I pushed your patch with some minor modifications.
Can you "git pull --rebase" your drm-next and re-test?
Thank you!
--HPS
Hi @hselasky,
Thanks.
Seems OK. The test passes and the vnodes count are back to the pre-test values.
Closing it again :).
Hi,
I was playing around with different GL applications and came across eduke32. Running it on current drm-next leads to the system becoming unresponsive after a few seconds to a few minutes of play. The syslog was filling up with failed I915_GEM_CREATE with EAGAIN(-35) errors.
I was able to track it down to a problem in linux_idr (pretty sure about that) and possibly vnode leak (not sure at all).
So about the linux_idr, patch 1
This is all dead code at the moment, as the next patch will show. So this is why this has gone unnoticed.
Patch 2:
With those two patches I am able to play with no immediate adversary effects. They also touch/depend only on the idr code so I am pretty confident about them. This leaves the question opened as to why was lkpi_malloc continously failing. I have no clue on that.
Third proposed patch to be taken with many grains of salt :). Without the above patches the system was running out of vnodes very fast. The few crashdumps I was able to collect all had the eduke32 process waiting in getnewvnode & co called from shmem_file_setup. With the patches applied I am still seeing constantly increasing vfs.numvnodes albeit at much slower rate. This got me searching to where actually the vnodes get freed, and was unable to locate such a place. The fput function that will destroy linux_file struct just frees it as it has no actual file attached to it. Thus the third proposed patch:
It marks the linux_file struct as owning the vnode and frees it in the fput. With it the vnodes were under control even without the first two patches and I have not observed any regression so far. However I am totally out of my depths here - I know absolutely nothing about the FreeBSD vfs so don't believe me for this last patch. Also I am pretty curious as to what is actually happening with the vm_pager_allocate'd mapping associated with the file. I have not yet located where it would get freed, but there is quite a lot of code there. I also haven't noticed a substantial memory depletion, but maybe I haven't looked hard enough.
Another small point, that I have noticed but have not verified. On the failing path in shmem_file_setup we would call _vdrop to free the freshly allocated vnode. However the vnode has no mountpoint and _vdrop would actually try to use vp->v_mount and cause a NULL pointer derefernce. I may be wrong on this one too, as I haven't checked it specifically, but trying to use _vdrop in the if( filp->drop_vnode ) path of fput triggered it.
Looking forward to any comments and to a few hours of happiness replaying this GEM (pun intended) :)