amilsted opened this issue 7 years ago (status: Open)
Hi Ashley and the Julia community! Sorry to revive this issue after so many years — were there any further thoughts or updates on it?
I was considering a quarter-length project on runtime systems with Julia, and this issue looks like an interesting challenge to tackle :) Is there anyone I could approach about this topic? Alternatively, are there other runtime-system ideas within the community that I could contribute to over a couple of months (one quarter)?
Thanks and best wishes,
There are known issues with Julia's GC, but unless you already have some experience writing GC code (preferably in C), it is probably not within reach for this type of project (unless you're ready to put in a ton of time). If you're looking for projects, I would recommend asking on Slack or Discourse, both of which are more likely to get a community response.
Got it, thanks @oscardssmith!
Hello! I've been using Julia quite a bit for theoretical physics simulations. In particular, I do a lot of work with tensor networks, which are networks of tensors that must be contracted to compute a desired physical result. For tensor contractions, I use TensorOperations.jl by @Jutho.
I've noticed that some of my code triggers the garbage collector a lot. There are many allocations, a few of which are quite large, and for some reason this combination makes the GC run very often. Turning off the GC temporarily or preallocating memory where possible dramatically improves performance, but the former seems hacky and the latter is ugly. The clean version of the code, which allocates fresh arrays for intermediate results, is included in the attached notebook.
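To give a flavor of what such "clean" code looks like, here is a minimal, hypothetical sketch (not the code from the attached notebook): a single transfer-matrix step in which the `@tensor` macro from TensorOperations.jl allocates a fresh array for the result and for intermediates. The tensor names, shapes, and the function `apply_TM` are illustrative assumptions.

```julia
using TensorOperations

# Hypothetical example: apply a transfer matrix built from a rank-3
# MPS tensor A (bond × physical × bond) to a D×D matrix x.
# Each call allocates a new output array (and intermediates),
# which is what makes the "clean" style GC-heavy in a hot loop.
function apply_TM(x::Matrix{Float64}, A::Array{Float64,3})
    @tensor y[a, b] := A[a, s, a2] * x[a2, b2] * conj(A[b, s, b2])
    return y
end
```

Repeated in a tight loop, every iteration of a function like this produces new garbage, which is the pattern that triggers the GC so frequently.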
I have compared the performance of `TMMs!()` with versions that (i) pre-allocate a "work" vector which is then cut up to hold the intermediate results, (ii) allocate intermediate arrays using malloc() to avoid triggering the GC, and (iii) turn off the GC for the tensor contraction and then turn it back on (thanks @vtjnash!). Here are some BenchmarkTools results on Julia 0.6.0.

I can't help feeling Julia could do better on the pretty code ("unmodified"). Heavy optimization is often not worth the extra debugging effort for scientific code (unless it will be used heavily), and the current results make me hesitate when recommending Julia for research on tensor network algorithms, despite the lovely syntax provided by TensorOperations, and all the other reasons I think Julia is awesome.
I was hoping perhaps somebody has an idea for how to modify the GC heuristics to better cope with cases like this. The number of allocations hardly differs between the approaches above (around 150 per iteration), so it really seems like the combination of many smaller allocations and some large ones makes the GC hyperactive.
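For anyone wanting to reproduce the per-iteration allocation counts quoted above, `@timed` is one way to see both allocation counts and GC time directly (the `gcstats` field requires Julia ≥ 1.5; the `work` function below is a hypothetical stand-in for one benchmark iteration):

```julia
# Hypothetical stand-in for one iteration of the benchmark:
work() = sum(abs2, randn(100) * randn(100)')

work()  # warm up so compilation allocations are excluded
stats = @timed work()

# GC_Diff distinguishes small pool allocations from large ("big")
# allocations, which is exactly the mix described in this issue.
println("pool allocations: ", stats.gcstats.poolalloc)
println("big allocations:  ", stats.gcstats.bigalloc)
println("GC time (s):      ", stats.gctime)
```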
I have attached a Jupyter notebook with the benchmark: julia_tensor_GC_example.zip