MasonProtter / Bumper.jl

Bring Your Own Stack
MIT License

Move into Julia and/or under JuliaLang, as stdlib? #16

Closed: PallHaraldsson closed this 10 months ago

PallHaraldsson commented 10 months ago

Hi,

I believe your package has a good track record by now; it just works. Probably not many people know of it.

Should it be added to Julia, so that e.g. the compiler/optimizer can use it? It seems we could compete with Mojo that way. It deallocates as early as possible, even before variables go out of scope (which is when languages like C++ deallocate).

A first step would even be helpful on its own:

Phase 1. Just move it unchanged, which gives it more visibility (that could also be had by documenting it in Julia's docs). Julia itself wouldn't use it yet, but at any point it could, by using Bumper.jl as documented.

Phase 2. This would also be up to the Julia people, and it's the main win from merging: make already existing idiomatic Julia code, in or out of Julia itself, use Bumper.jl transparently.

I recall our discussion, but can't find it, about dynamically adding to the buffer. I see it's now task-local (would it be per thread, or is that in effect what it is?). I mentioned a problem with dynamically enlarging, so you backed away from it. I have now found a solution, but it seems redundant given the changes I see you've already implemented. I see you now allocate 1/8th of physical memory, which seems way excessive; I think the point is that you never have to enlarge. You rely on virtual memory (the RAM isn't actually used, only virtual memory is reserved, and the OS transparently backs more of it as needed). So why 1/8th? Why not even larger, all of it, or smaller? I'm guessing if you have e.g. 8 threads then you allocate all, and with 16 then 2x overcommit (which is ok, at least on Linux).

I do not believe overcommitting works on Windows however, so do you know of problems, if e.g. you have very many threads? Also, say Julia runs with 8 threads and 4 such Julia processes run at once: is that OK? I don't know about macOS, but it's likely similar. Before merging, such use would need to be confirmed OK, or else the default would need to be lowered from 1/8th...
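To make the arithmetic in the question concrete, here is a rough sketch in plain Julia. It assumes that every task which touches the default buffer reserves 1/8th of physical memory as virtual address space (the figure mentioned above); `overcommit_ratio` is a hypothetical helper for illustration, not part of Bumper.jl.

```julia
# Sketch only: assumes each task using the default buffer reserves
# total_memory ÷ 8 bytes of virtual address space.
total = Sys.total_memory()      # physical RAM in bytes
per_buffer = total ÷ 8          # assumed size of one default buffer

# Hypothetical helper: reserved virtual memory relative to physical RAM
# when `ntasks` tasks each hold their own default buffer.
overcommit_ratio(ntasks) = ntasks * per_buffer / total

println(overcommit_ratio(8))      # one process, 8 tasks
println(overcommit_ratio(16))     # one process, 16 tasks
println(overcommit_ratio(4 * 8))  # 4 processes with 8 tasks each
```

Under these assumptions the ratios come out at roughly 1, 2 and 4. Whether the OS tolerates that depends on its overcommit policy, which is exactly the open question for Windows and macOS.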

https://github.com/MasonProtter/Bumper.jl/commit/95d51c7a2e643b8574deb1ae58f63c0773fb5173

MasonProtter commented 10 months ago

Hi Pall, thanks for the kind words. That's not really how stdlibs work though. I strongly doubt any of the language maintainers would be interested in having Bumper in a stdlib or Base, and I would agree it's inappropriate for that.

> So why 1/8th? Why not even larger, all of it, or smaller? I'm guessing if you have e.g. 8 threads then you allocate all, and with 16 then 2x overcommit (which is ok, at least on Linux).

I was doing all of the physical memory, but I realized that if someone did end up doing some series of calls that ended up using that whole stack of memory, then the whole array would actually be allocated by the OS and not freed, so it'd effectively leak memory. I decided (rather arbitrarily) that one eighth was a reasonable compromise.

> I do not believe overcommitting works on Windows however, so do you know of problems, if e.g. you have very many threads?

Yeah, that could end up being a problem. I haven't really done any serious testing on Windows, and it's not something I think about very often (another reason this package is not appropriate for a stdlib). I'd definitely advise Windows users in that case to prefer explicitly created buffers rather than the default_buffer (which is what packages relying on Bumper.jl should be doing anyway).
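
For readers wondering what "explicitly created buffers" might look like, here is a minimal sketch. It assumes a Bumper.jl version that provides `AllocBuffer(n)`, `@no_escape` and `@alloc`; the function name `sum_plus_one` and the 1 MiB size are made up for illustration.

```julia
using Bumper

# Hypothetical function: sums x .+ 1 using scratch space taken from `buf`.
function sum_plus_one(x::Vector{Float64}, buf)
    @no_escape buf begin
        y = @alloc(Float64, length(x))  # bump-allocated scratch array
        y .= x .+ 1
        sum(y)                          # only a plain number leaves the block
    end
end

# Explicit, modestly sized buffer instead of the default one;
# the 1 MiB size is an arbitrary choice for this sketch.
buf = AllocBuffer(2^20)
result = sum_plus_one([1.0, 2.0, 3.0], buf)
```

Passing the buffer in as an argument lets the caller decide how much memory to reserve, which avoids depending on the size of the default buffer or on the platform's overcommit behaviour.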