Closed jcburley closed 4 years ago
Do I understand this is a fork that automatically exposes all the Go standard library to be used from within Joker?
The startup-time hit seems a bummer. Is there any way to avoid doing this at startup, and instead delay some of it until runtime, only when needed?
> Do I understand this is a fork that automatically exposes all the Go standard library to be used from within Joker?
That's the idea, but it's not yet fully realized. Roughly half (or more) of the functions and receivers are wrapped; as to whether enough functionality is exposed to do useful things, that depends on the specific useful things and whether they're sufficiently exposed!
I hope to make substantial progress within a couple of weeks after returning from vacation (later this week).
> The startup-time hit seems a bummer. Is there any way to avoid doing this at startup, and instead delay some of it until runtime, only when needed?
The current focus is to get things working and get input from stakeholders, whoever they are.
But, for my purposes, I not only need specific Go packages fully exposed, I also really want fast startup times.
So I expect I'll turn my attention to startup time within a few weeks, and am hopeful that this "hit" can be significantly reduced or possibly even eliminated (maybe by improving Joker's startup time in general, not just on my own fork).
> The startup-time hit seems a bummer. Is there any way to avoid doing this at startup, and instead delay some of it until runtime, only when needed?

> [...]I expect I'll turn my attention to startup time within a few weeks, and am hopeful that this "hit" can be significantly reduced or possibly even eliminated (maybe by improving Joker's startup time in general, not just on my own fork).
Couldn't stop thinking about different approaches to this, so did some experimentation last night and discovered Go doesn't (yet) offer some of the capabilities that I was assuming it would and that would enable certain forms of optimization of startup time. E.g. build-time initialization of constant arrays and maps is not supported! (Even C has build-time initialization of arrays, and one can presumably simulate build-time initialization of maps, which aren't a C language feature, by "compiling" their initial state down to build-time-initialized byte arrays.)
Go does support build-time initialization of strings, so it's possible that can be used as an effective (and performant) substitute for byte arrays. I might investigate that later, but I tend to think it's build-time maps that'd be the lower-hanging fruit (if Go supported them).
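A minimal sketch of the string-as-table idea mentioned above, under stated assumptions: the packed format, the `symbolTable` constant, and the `lookup` helper are all hypothetical and do not come from Joker or the gostd fork. It only illustrates that a table can ride along in a string constant and be decoded the first time it is needed.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// symbolTable is a hypothetical table packed into a string constant.
// The "name=value;name=value" format is an assumption made for this sketch.
const symbolTable = "println=1;join=2;stat=3"

var (
	decodeOnce sync.Once
	symbols    map[string]string
)

// lookup decodes the packed table on first use, then serves from the map.
func lookup(name string) (string, bool) {
	decodeOnce.Do(func() {
		symbols = make(map[string]string)
		for _, entry := range strings.Split(symbolTable, ";") {
			if k, v, ok := strings.Cut(entry, "="); ok {
				symbols[k] = v
			}
		}
	})
	v, ok := symbols[name]
	return v, ok
}

func main() {
	v, ok := lookup("join")
	fmt.Println(v, ok)
}
```

Whether decoding a string this way would actually beat ordinary map initialization is something that would have to be measured.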
So I decided to try out your idea of delaying initialization at runtime, and it seems to be quite effective at eliminating a substantial portion of the increased startup time seen in the gostd fork of Joker, though I can't seem to measure any substantial improvement using that technique in the official version:
https://github.com/candid82/joker/pull/258
That makes sense, though, because official Joker has comparatively few namespaces and "interned" symbols (other than in joker.core, which of course always gets loaded, and in the other core/ namespaces, which are not affected by the above patch).
Whereas gostd Joker has many namespaces and interned symbols, most of which, with the above patch, are not initialized at all during a typical invocation.
Further improvements are possible by bringing more initialization activity into the lazy-initialization phase. It will be interesting to see how much closer this approach can bring gostd Joker in line with official Joker, in terms of startup costs.
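A rough sketch of the lazy-initialization technique described above, assuming a registry of namespaces whose symbols are interned only on first lookup. The `Namespace` type, `Register`, and `Lookup` are hypothetical names invented for this illustration; they are not Joker's actual types, and this is not the code from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Namespace is a hypothetical stand-in for a Joker namespace.
type Namespace struct {
	Name     string
	once     sync.Once
	initFn   func(ns *Namespace) // deferred work: interning symbols
	mappings map[string]string
}

var registry = map[string]*Namespace{}

// Register records the namespace cheaply; nothing is interned yet.
func Register(name string, initFn func(ns *Namespace)) {
	registry[name] = &Namespace{Name: name, initFn: initFn}
}

// Lookup runs the namespace's initializer at most once, on first access.
func Lookup(nsName, sym string) (string, bool) {
	ns, ok := registry[nsName]
	if !ok {
		return "", false
	}
	ns.once.Do(func() {
		ns.mappings = map[string]string{}
		ns.initFn(ns)
	})
	v, ok := ns.mappings[sym]
	return v, ok
}

func main() {
	loaded := 0
	// The namespace names are illustrative only.
	Register("go.std.net", func(ns *Namespace) { loaded++; ns.mappings["Dial"] = "wrapped net.Dial" })
	Register("go.std.os", func(ns *Namespace) { loaded++; ns.mappings["Stat"] = "wrapped os.Stat" })

	v, _ := Lookup("go.std.os", "Stat")
	fmt.Println(v, "- namespaces initialized:", loaded)
}
```

In this sketch only the namespace that is actually touched pays its initialization cost, which matches the observation that a fork with many namespaces benefits far more than one with few.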
(Edited to note that the above patch/PR has already been folded into the gostd fork.)
Thanks for the suggestion!!
> Further improvements are possible by bringing more initialization activity into the lazy-initialization phase. Will be interesting to see how much closer this approach can bring gostd Joker in line with official Joker, in terms of startup costs.
I've just done this work, so as far as I know, none of the Joker-specific code does any namespace-specific initialization other than to register a namespace (which is a non-trivial operation, but shouldn't be very expensive).
My ad-hoc measuring shows anywhere from a 10% to 20% hit at this point. That's better than the 25% (or more) I was seeing before starting this optimization work.
I suspect (but do not know for sure) that the underlying (Go) packages are themselves doing some expensive initialization. And of course they wouldn't support the concept of "lazy" loading in the same way as their Joker wrappers do.
So unless some new ideas come up, I think we're at the end of the road for improving the gostd fork in terms of performance, versus official Joker.
But there might be some opportunities for improving both versions of Joker, which I might investigate sometime soon.
If Go were to add support for build-time initialization of (constant) arrays, maps, and such, it's likely this would speed up Joker noticeably, if not substantially. But it might depend on whether a new version of Go automatically optimized such initialization, or required different syntax for it. Also, the extent of improvements might depend on coding styles used in the Go library; code that is currently in func init() bodies, but could be in top-level var (or const) declarations, might have to be so converted to take advantage of improvements in build-time initialization/optimization of Go programs.
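To make the two coding styles concrete, here is a small sketch that builds the same table once in a func init() body and once as a top-level var declaration. The table names and contents are made up for illustration. With current Go toolchains both maps are populated when the program starts; the point is only that the declarative form states the whole value up front, which is the shape a build-time initializer would presumably have the easiest time recognizing.

```go
package main

import "fmt"

// Style 1: the table is built imperatively in init().
var statusTextInit map[int]string

func init() {
	statusTextInit = make(map[int]string)
	statusTextInit[200] = "OK"
	statusTextInit[404] = "Not Found"
}

// Style 2: the same table as a declarative top-level var.
var statusTextDecl = map[int]string{
	200: "OK",
	404: "Not Found",
}

func main() {
	// Both styles yield the same contents at run time.
	fmt.Println(statusTextInit[404] == statusTextDecl[404])
}
```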
(At some point I plan to make a PR for official Joker that finishes the job of deferring initialization to lazy-load time, to complement the work done for the gostd branch. I doubt much, if any, measurable performance improvement will result; but it'd be nice to be consistent.)
@jcburley hey James, I am starting to look at your fork. Sorry it's taking so long, I have very limited time for Joker these days. I noticed that you are actively working on the gotype branch, which introduced GoType. Is this the branch I should be looking at, and if so, could you please provide a brief walk-through of GoType and its relationship with GoObject?
Yes, that branch has a bunch of changes that should soon be merged into the gostd fork itself.
Have you read the "Types" section at https://github.com/jcburley/joker/blob/gostd/GOSTD.md yet? That should be a good starting point; though, as you'll see, it's rather tentative. (E.g. GoType is no longer "abstract", but a concrete type. EDIT: I just fixed that.)
I prefer to not introduce new Object types, of course; but I've run into various situations where it hasn't been sufficiently obvious what else I should do.
One of the fundamental requirements, as far as I can tell, is for (wrapped) Go types to reside in namespaces (that themselves wrap corresponding Go packages).
When I tried extending Type to do this, I ran into all sorts of trouble, and decided to try introducing a distinct object. It seems to be conceptually distinct from built-in types (which do not adhere to any namespaces) and Objects (which have values).
So far, that seems to be working well.
Thanks, that helps! The progress you've made is impressive! I'll definitely keep an eye on this fork as it evolves. I am still not sure if it's ever going to be a good fit for "canonical" Joker, but at some point it may become too useful and bring too much value to warrant the merge (not sure if this is your goal though). Some concerns I have include a couple of minor technical points and one higher level design question:
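- Gostd more than doubles the size of the executable. This is probably no big deal though, as even ~40M is not terrible by modern standards.
- It makes the startup slightly slower, although I was pleasantly surprised by how small the increase is. On my 2019 MBP it's 38ms for Joker vs 41ms for Joker-gostd. I suspect the difference will be bigger on slower machines, but probably tolerable (and perhaps even negligible).
- Gostd brings a lot of complexity, and it remains to be seen if the value it provides is worth that complexity.

As you point out in the docs, the API it exposes is not high level or idiomatic Clojure API that people normally would want to use directly. Instead, it's supposed to be wrapped by higher level API, which can be done without modifying Joker's source code. While this is true, I am not sure how this would work in practice. Where would this higher level API live? One of the value propositions of Joker is that it's a single binary with no dependencies and "batteries included". If that higher level API is built into Joker's executable, it defeats the purpose of being able to write that API without modifying Joker's source code. (At that point it's easier to just wrap native Go code directly, as done for existing standard namespaces.) If it's not built into Joker's executable, then it would have to be an external dependency, perhaps leveraging the ns-sources feature. Maybe that's OK, but I'd certainly like most of the "standard library" to be built into Joker itself. BTW, this starts to look a bit like the question "How do we enable third party libraries that have access to native Go?". Gostd is certainly an answer to that, but one other direction I'd like to explore some day is using Go's plugins.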
On the other hand, maybe having access to the vast Go standard library is valuable enough even without higher-level wrappers. I would recommend picking a few use cases and seeing what it would be like to implement them with gostd. One thing I wanted to do the other day was to send an HTTP request with a gzipped body. I ended up doing that with curl, as Joker doesn't currently have an API for gzip compression, but it'd be interesting to see it done with gostd. Maybe you already use gostd in your scripts, in which case it'd be cool to see some examples.
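For reference, the gzipped-body use case looks roughly like this in plain Go, using only standard-library packages (compress/gzip, net/http). This is not Joker or gostd code. The helper names are made up, and the local echo server exists only so the sketch can run without a real endpoint.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// gzipBytes compresses data with gzip.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip trailer
		return nil, err
	}
	return buf.Bytes(), nil
}

// postGzipped POSTs payload as a gzip-compressed body and returns the response text.
func postGzipped(url string, payload []byte) (string, error) {
	body, err := gzipBytes(payload)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Encoding", "gzip")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	return string(out), err
}

// newEchoServer is a stand-in endpoint that gunzips the request body and echoes it.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		zr, err := gzip.NewReader(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		defer zr.Close()
		io.Copy(w, zr)
	}))
}

func main() {
	srv := newEchoServer()
	defer srv.Close()
	got, err := postGzipped(srv.URL, []byte("hello from gostd"))
	fmt.Println(got, err)
}
```

A gostd script would presumably reach the same packages through their wrapped namespaces, provided the functions and receivers involved are among those already wrapped.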
Thanks!

> Thanks, that helps! The progress you've made is impressive! I'll definitely keep an eye on this fork as it evolves. I am still not sure if it's ever going to be a good fit for "canonical" Joker, but at some point it may become too useful and bring too much value to warrant the merge (not sure if this is your goal though).
I'm not sure whether you really meant "too useful...to warrant the merge" in the sense of having it be a distinct fork/product, or meant "...to not warrant the merge"?
Meanwhile I'm quite on the fence about this myself. It's getting close to being super-useful for the work (mostly research/prototyping of a new architectural model) I've been planning, but I don't think I'd need canonical Joker (versus my own fork) to incorporate this, given the experimental nature of that planned work.
> Some concerns I have include a couple of minor technical points and one higher level design question:
> - Gostd more than doubles the size of the executable. This is probably no big deal though, as even ~40M is not terrible by modern standards.
My sense is that, as long as (this fork of?) Joker offers at least 1 OoM smaller size than the corresponding Clojure/ClojureScript/etc. variants, and at least 1 OoM better startup performance, it's bearable.
I originally went down the path (of spurning Clojure+JVM for my plans) because the resulting process took up so much memory that my server was having trouble doing "normal" tasks -- and that CLJ process was a simple demo!
So I'll be quite open to ways to cut down on the size, either via changes to how gostd emits code, via improvements to Joker's memory utilization, or both.
> - It makes the startup slightly slower, although I was pleasantly surprised by how small the increase is. On my 2019 MBP it's 38ms for Joker vs 41ms for Joker-gostd. I suspect the difference will be bigger on slower machines, but probably tolerable (and perhaps even negligible).
I'm glad to see that. The lazy-loading of namespaces was an inspired idea (and not mine, IIRC), plus much easier than the optimizations I had in mind (some of which might still work out, but Go currently doesn't support some of them).
Exploring ways to further reduce startup time remains on my list of stuff to do.
> - Gostd brings a lot of complexity, and it remains to be seen if the value it provides is worth that complexity.
Yes. Besides gostd itself being poorly architected/designed (it's grown too organically, being a prototype; though I do refactoring now and then, including last night, which should get pushed to my fork in the next few days), it's yet another component that would need to be maintained for people and orgs to feel as though they could rely on it. Right now I have enough trouble "maintaining" it myself that I'd want to refactor it substantially before inflicting it on canonical-Joker devs (like yourself).
> As you point out in the docs, the API it exposes is not high level or idiomatic Clojure API that people normally would want to use directly. Instead, it's supposed to be wrapped by higher level API, which can be done without modifying Joker's source code. While this is true, I am not sure how this would work in practice. Where would this higher level API live? One of the value propositions of Joker is that it's a single binary with no dependencies and "batteries included". If that higher level API is built into Joker's executable, it defeats the purpose of being able to write that API without modifying Joker's source code. (At that point it's easier to just wrap native Go code directly, as done for existing standard namespaces.) If it's not built into Joker's executable, then it would have to be an external dependency, perhaps leveraging the ns-sources feature. Maybe that's OK, but I'd certainly like most of the "standard library" to be built into Joker itself. BTW, this starts to look a bit like the question "How do we enable third party libraries that have access to native Go?". Gostd is certainly an answer to that, but one other direction I'd like to explore some day is using Go's plugins.
The Classpath support (already in canonical Joker) should provide one reasonable approach to supporting this.
I'm also amenable to automatically detecting certain Go patterns and converting them to Clojure idioms instead of (or in addition to) the low-level stuff. I'm not sure how far we can go with this, but it's worth considering.
> On the other hand, maybe having access to the vast Go standard library is valuable enough even without higher-level wrappers. I would recommend picking a few use cases and seeing what it would be like to implement them with gostd. One thing I wanted to do the other day was to send an HTTP request with a gzipped body. I ended up doing that with curl, as Joker doesn't currently have an API for gzip compression, but it'd be interesting to see it done with gostd. Maybe you already use gostd in your scripts, in which case it'd be cool to see some examples.
I haven't started using gostd in my scripts, but one feature I've partially implemented allows specifying additional Go packages (along with their namespace prefixes, go.std. being the one for the Go src/ tree).
Supporting sites that want to build their own custom Joker executables, wrapping arbitrary Go libraries (including their own), could be quite a big win.
Based on past experience, it's hard to predict whether this'll "take" as something useful for enough people, and especially what creative use cases (if any) will come out of it.
Certainly, once I have some decent gostd-using Joker code up and running, I'll be pushing it to a suitable repo. For me, one missing element was wrapping net/smtp (which obviously would have taken vastly less time to just wrap by hand!); but I have larger plans than that, and generally I just like working on code-conversion/transformation tools.
If it seems best to make this a distinct version of Joker, there's surely plenty of common code that could be refactored into distinct Go packages (as libraries), including parsing and such; and it might be useful to give it a distinct name, one that brings out the close(r) connection between Clojure and Go that it would represent. (E.g. "Gojure".)
But that's a ways down the road.
I really appreciate your input, and look forward to pushing out new stuff for your continued review soon!
> I'm not sure whether you really meant "too useful...to warrant the merge" in the sense of having it be a distinct fork/product, or meant "...to not warrant the merge"?
Haha, sometimes I confuse myself with my English. I meant the latter, but I do agree it probably makes more sense to keep gostd a distinct version of Joker for the foreseeable future, as the design goals, use cases and potential user base may be quite different for gostd.
I've just now pushed some pretty big changes in terms of effort and code, but mostly they're about generating documentation for receivers and related refactoring. E.g. see:
https://burleyarch.com/joker/docs/amd64-darwin/go.std.net.html#_types
(I'm not thrilled with the style, but don't want to fuss with that just now.)
Much work still to do, though. E.g. add (to namespaces) types that do not have any receivers associated with them, remove the "static" constructors (Foo.) once new works for all supported types, handle receivers with abstract types but concrete private implementations of them (returned by, for example, os.Stat())....
Closing as this is not actionable anymore. It's been decided to keep the gostd fork separate from "main" Joker.
Much progress made over the past two weeks, thanks to a new workstation (powered by a Ryzen 3900x, much faster than the 8-year-old i7 that got dropped by the movers back in May).
Though there's still much left to do, I'll be on vacation for much of the next couple of weeks. So I've tried to get enough stuff working to ask for feedback on this:
https://github.com/jcburley/joker
See snapshots of some per-target namespace docs at: https://burleyarch.com/joker/docs/
The "big things" accomplished over the past two weeks include autowrapping:
- Some receivers (no docstrings are generated for these)
- Most constants
- All variables
The performance hit seems to be roughly 25% more startup time on my machines. I hope to find ways to improve startup performance here, or in Joker generally, later this year.
In the meantime, I welcome feedback as to how usable this is, what the most urgent "asks" are, and so on. (I'll be coming up with my own answers as I put this version into wider use in my own production environment.)