Open trws opened 4 months ago
I don't think the sched-fluxion-resource.status
RPC is used anymore, since I believe all the functionality is handled by core now.
The caches in the ctx (mr{all,down,alloc}) are never freed at all.
The member data is updated in status_request_cb
whenever the values change or a configured amount of time has elapsed, but yeah, they aren't freed when Fluxion is stopped.
Mark mentioned the same thing, and I agree it shouldn't be (at least not often) but it shows up in my recent heaptrack leak report. Will check again, try to get more detail here.
It's not used by default anymore, but we can check the scheduler's view of resources by running e.g.
FLUX_RESOURCE_LIST_RPC=sched.resource-status flux resource list
which can be handy. I'd say don't get rid of it :-)
Yup, not planning to get rid of it, do think we need to fix the leak though, and rather wondering what's poking it.
The json_t * variables R_all, R_down and R_alloc in status_request_cb are only freed by
json_decref
on the error path. The caches in the ctx (mr{all,down,alloc}) are never freed at all. Found this thanks to a 10k leak heaptrack found on one of our drained resource tests, tiny in the test, but could get larger over time if there are a lot of requests to this RPC, or a lot of resource updates, or especially both.Plan:
__attribute__((cleanup))
so we can be safer about all of thisThis is a priority bugfix, if I can get it together today I will, and we should try to get it deployed ASAP.