candid82 / joker

Small Clojure interpreter, linter and formatter.
https://joker-lang.org/
Eclipse Public License 1.0

Evaluate 'gostd' Fork #253

Closed jcburley closed 4 years ago

jcburley commented 5 years ago

Much progress made over the past two weeks, thanks to a new workstation (powered by a Ryzen 3900x, much faster than the 8-year-old i7 that got dropped by the movers back in May).

Though there's still much left to do, I'll be on vacation for much of the next couple of weeks. So I've tried to get enough stuff working to ask for feedback on this:

https://github.com/jcburley/joker

See snapshots of some per-target namespace docs at: https://burleyarch.com/joker/docs/

The "big things" accomplished the past two weeks include autowrapping:

The performance hit seems to be roughly 25% more startup time on my machines. I hope to find ways to improve startup performance here, or in Joker generally, later this year.

In the meantime, I welcome feedback on how usable this is, what the most urgent "asks" are, and so on. (I'll be coming up with my own answers as I put this version into wider use in my own (production) environment.)

didibus commented 5 years ago

Do I understand this is a fork that automatically exposes all the Go standard library to be used from within Joker?

The start time hit seems a bummer. Any way to not do this at startup, but delay some of it to runtime, only when needed?

jcburley commented 5 years ago

> Do I understand this is a fork that automatically exposes all the Go standard library to be used from within Joker?

That's the idea, but it's not yet fully realized. Roughly half (or more) of the functions and receivers are wrapped; as to whether enough functionality is exposed to do useful things, that depends on the specific useful things and whether they're sufficiently exposed!

I hope to make substantial progress within a couple of weeks after returning from vacation (later this week).

> The start time hit seems a bummer. Any way to not do this at startup, but delay some of it to runtime, only when needed?

The current focus is to get things working and get input from stakeholders, whoever they are.

But, for my purposes, I not only need specific Go packages fully exposed, I also really want fast startup times.

So I expect I'll turn my attention to startup time within a few weeks, and am hopeful that this "hit" can be significantly reduced or possibly even eliminated (maybe by improving Joker's startup time in general, not just on my own fork).

jcburley commented 5 years ago

> The start time hit seems a bummer. Any way to not do this at startup, but delay some of it to runtime, only when needed?

> [...] I expect I'll turn my attention to startup time within a few weeks, and am hopeful that this "hit" can be significantly reduced or possibly even eliminated (maybe by improving Joker's startup time in general, not just on my own fork).

Couldn't stop thinking about different approaches to this, so did some experimentation last night and discovered Go doesn't (yet) offer some of the capabilities that I was assuming it would and that would enable certain forms of optimization of startup time. E.g. build-time initialization of constant arrays and maps is not supported! (Even C has build-time initialization of arrays, and one can presumably simulate build-time initialization of maps, which aren't a C language feature, by "compiling" their initial state down to build-time-initialized byte arrays.)

Go does support build-time initialization of strings, so it's possible that can be used as an effective (and performant) substitute for byte arrays. I might investigate that later, but I tend to think it's build-time maps that'd be the lower-hanging fruit (if Go supported them).
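
To make the string idea concrete, here is a minimal sketch in plain Go. It is not code from Joker or the gostd fork: the table contents, the `key=value` encoding, and the names `symbolTable` and `lookup` are all assumptions made for illustration. The table lives in a string constant and is decoded into a map only on first lookup; whether this actually beats an ordinary package-level map literal would have to be measured.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// symbolTable is a hypothetical lookup table encoded as a string constant,
// one "key=value" pair per line. Nothing is parsed at program startup.
const symbolTable = "Dial=net.Dial\nListen=net.Listen\nLookupHost=net.LookupHost"

var (
	decodeOnce sync.Once
	symbols    map[string]string
)

// lookup decodes symbolTable into a map on the first call only,
// then resolves name against that map.
func lookup(name string) (string, bool) {
	decodeOnce.Do(func() {
		symbols = make(map[string]string)
		for _, line := range strings.Split(symbolTable, "\n") {
			parts := strings.SplitN(line, "=", 2)
			if len(parts) == 2 {
				symbols[parts[0]] = parts[1]
			}
		}
	})
	v, ok := symbols[name]
	return v, ok
}

func main() {
	if v, ok := lookup("Listen"); ok {
		fmt.Println("Listen ->", v)
	}
	_, ok := lookup("NoSuchSymbol")
	fmt.Println("NoSuchSymbol found:", ok)
}
```

The trade-off is that the decoding cost moves from startup to the first lookup, so invocations that never touch the table pay nothing for it.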

So I decided to try out your idea of delaying initialization at runtime, and it seems to be quite effective at reducing a substantial portion of the increased startup time seen in the gostd fork of Joker, though I can't seem to measure any substantial improvement using that technique in the official version:

https://github.com/candid82/joker/pull/258

That makes sense, though, because there are comparatively few namespaces and "interned" symbols (other than in joker.core, which of course always gets loaded; and in the other core/ namespaces that are not affected by the above patch) in official Joker.

Whereas, gostd Joker has many namespaces and interned symbols -- most of which, with the above patch, are not initialized at all during a typical invocation.

Further improvements are possible by bringing more initialization activity into the lazy-initialization phase. Will be interesting to see how much closer this approach can bring gostd Joker in line with official Joker, in terms of startup costs.
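
The general technique can be sketched as follows. This is a simplified illustration, not the code from the PR above: the `Namespace` type, `Register`, `Resolve`, and `InitializedCount` are hypothetical names, and a real namespace holds far more than a string map. Registration is cheap and happens for every namespace, while the expensive step of populating symbols runs at most once, on first use.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Namespace is a hypothetical stand-in for a wrapped namespace.
type Namespace struct {
	Name     string
	once     sync.Once
	initFn   func() map[string]string // builds the symbol table when needed
	mappings map[string]string
}

// registry holds every registered namespace, keyed by name.
var registry = map[string]*Namespace{}

// initCount tracks how many namespaces have actually been populated.
var initCount int32

// Register records a namespace without running its initializer.
func Register(name string, initFn func() map[string]string) {
	registry[name] = &Namespace{Name: name, initFn: initFn}
}

// Resolve populates the namespace on first access, then looks up sym.
func Resolve(ns, sym string) (string, bool) {
	n, ok := registry[ns]
	if !ok {
		return "", false
	}
	n.once.Do(func() {
		n.mappings = n.initFn()
		atomic.AddInt32(&initCount, 1)
	})
	v, ok := n.mappings[sym]
	return v, ok
}

// InitializedCount reports how many namespaces have been populated so far.
func InitializedCount() int {
	return int(atomic.LoadInt32(&initCount))
}

func main() {
	Register("go.std.net", func() map[string]string {
		return map[string]string{"Dial": "wrapper for net.Dial"}
	})
	Register("go.std.os", func() map[string]string {
		return map[string]string{"Stat": "wrapper for os.Stat"}
	})

	fmt.Println("populated before any lookup:", InitializedCount())
	v, _ := Resolve("go.std.os", "Stat")
	fmt.Println(v)
	fmt.Println("populated after one lookup:", InitializedCount())
}
```

A script that only touches one namespace pays for one initializer, which is why the benefit grows with the number of registered namespaces.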

(Edited to note that the above patch/PR has already been folded into the gostd fork.)

Thanks for the suggestion!!

jcburley commented 5 years ago

> Further improvements are possible by bringing more initialization activity into the lazy-initialization phase. Will be interesting to see how much closer this approach can bring gostd Joker in line with official Joker, in terms of startup costs.

I've just done this work, so as far as I know, none of the Joker-specific code does any namespace-specific initialization other than to register a namespace (which is a non-trivial operation, but shouldn't be very expensive).

My ad-hoc measuring shows anywhere from a 10% to 20% hit at this point. That's better than the 25% (or more) I was seeing before starting this optimization work.

I suspect (but do not know for sure) that the underlying (Go) packages are themselves doing some expensive initialization. And of course they wouldn't support the concept of "lazy" loading in the same way as their Joker wrappers do.

So unless some new ideas come up, I think we're at the end of the road for improving the gostd fork in terms of performance, versus official Joker.

But there might be some opportunities for improving both versions of Joker, which I might investigate sometime soon.

If Go were to add support for build-time initialization of (constant) arrays, maps, and such, it's likely this would speed up Joker noticeably, if not substantially. But it might depend on whether a new version of Go automatically optimized such initialization, or required different syntax for it. Also, the extent of the improvements might depend on the coding styles used in the Go library; code that is currently in func init() bodies, but could be in top-level var (or const) declarations, might have to be so converted to take advantage of improvements in build-time initialization/optimization of Go programs.

jcburley commented 5 years ago

(At some point I plan to make a PR for official Joker that finishes the job of deferring initialization to lazy-load time, to complement that work done for the gostd branch. I doubt much, if any, measurable performance improvement will result; but it'd be nice to be consistent.)

candid82 commented 4 years ago

@jcburley hey James, I am starting to look at your fork. Sorry it's taking so long, I have very limited time for Joker these days. I noticed that you are actively working on the gotype branch, which introduced GoType. Is this the branch I should be looking at and, if so, could you please provide a brief walk-through of GoType and its relationship with GoObject?

jcburley commented 4 years ago

Yes, that branch has a bunch of changes that should soon be merged into the gostd fork itself.

Have you read the "Types" section at https://github.com/jcburley/joker/blob/gostd/GOSTD.md yet? That should be a good starting-point; though, as you'll see, it's rather tentative. (E.g. GoType is no longer "abstract", but a concrete type. EDIT: I just fixed that.)

I prefer to not introduce new Object types, of course; but I've run into various situations where it hasn't been sufficiently obvious what else I should do.

One of the fundamental requirements, as far as I can tell, is for (wrapped) Go types to reside in namespaces (that themselves wrap corresponding Go packages).

When I tried extending Type to do this, I ran into all sorts of trouble, and decided to try introducing a distinct object. It seems to be conceptually distinct from built-in types (which do not adhere to any namespaces) and Objects (which have values).

So far, that seems to be working well.

candid82 commented 4 years ago

Thanks, that helps! The progress you've made is impressive! I'll definitely keep an eye on this fork as it evolves. I am still not sure if it's ever going to be a good fit for "canonical" Joker, but at some point it may become too useful and bring too much value to warrant the merge (not sure if this is your goal though). Some concerns I have include a couple of minor technical points and one higher level design question:

  1. Gostd more than doubles the size of the executable. This is probably no big deal though, as even ~40M is not terrible by modern standards.
  2. It makes the startup slightly slower, although I was pleasantly surprised by how small the increase is. On my 2019 MBP it's 38ms for Joker vs 41ms for Joker-gostd. I suspect the difference will be bigger on slower machines, but probably tolerable (and perhaps even negligible).
  3. Gostd brings a lot of complexity, and it remains to be seen if the value it provides is worth that complexity. As you point out in the docs, the API it exposes is not high level or idiomatic Clojure API that people normally would want to use directly. Instead, it's supposed to be wrapped by higher level API, which can be done without modifying Joker's source code. While this is true, I am not sure how this would work in practice. Where would this higher level API live? One of the value propositions of Joker is that it's a single binary with no dependencies and "batteries included". If that higher level API is built into Joker's executable, it defeats the purpose of being able to write that API without modifying Joker's source code. (At that point it's easier to just wrap native Go code directly, as done for existing standard namespaces.) If it's not built into Joker's executable, then it would have to be an external dependency, perhaps leveraging ns-sources feature. Maybe that's OK, but I'd certainly like most of the "standard library" to be built into Joker itself. BTW, this starts to look a bit like the question "How do we enable third party libraries that have access to native Go?". Gostd is certainly an answer to that, but one other direction I'd like to explore some day is using Go's plugins.

On the other hand, maybe having access to vast Go standard library is valuable enough even without higher level wrappers. I would recommend picking a few use cases and see what it would be like to implement them with gostd. One thing I wanted to do the other day was to send an HTTP request with gzipped body. I ended up doing that with curl as Joker doesn't currently have the API for gzip compression, but it'd be interesting to see it done with gostd. Maybe you already use gostd in your scripts, in which case it'd be cool to see some examples.

Thanks!

jcburley commented 4 years ago

> Thanks, that helps! The progress you've made is impressive! I'll definitely keep an eye on this fork as it evolves. I am still not sure if it's ever going to be a good fit for "canonical" Joker, but at some point it may become too useful and bring too much value to warrant the merge (not sure if this is your goal though).

I'm not sure whether you really meant "too useful...to warrant the merge" in the sense of having it be a distinct fork/product, or meant "...to not warrant the merge"?

Meanwhile I'm quite on the fence about this myself. It's getting close to being super-useful for the work (mostly research/prototyping of a new architectural model) I've been planning, but I don't think I'd need canonical Joker (versus my own fork) to incorporate this, given the experimental nature of that planned work.

> Some concerns I have include a couple of minor technical points and one higher level design question:
>
> 1. Gostd more than doubles the size of the executable. This is probably no big deal though, as even ~40M is not terrible by modern standards.

My sense is that, as long as (this fork of?) Joker is at least one order of magnitude (OoM) smaller than the corresponding Clojure/ClojureScript/etc. variants, and starts up at least one OoM faster, it's bearable.

I originally went down the path (of spurning Clojure+JVM for my plans) because the resulting process took up so much memory that my server was having trouble doing "normal" tasks -- and that CLJ process was a simple demo!

So I'll be quite open to ways to cut down on the size, either via changes to how gostd emits code, via improvements to Joker's memory utilization, or both.

> 2. It makes the startup slightly slower, although I was pleasantly surprised by how small the increase is. On my 2019 MBP it's 38ms for Joker vs 41ms for Joker-gostd. I suspect the difference will be bigger on slower machines, but probably tolerable (and perhaps even negligible).

I'm glad to see that. The lazy-loading of namespaces was an inspired idea (and not mine, IIRC), plus much easier than the optimizations I had in mind (some of which might still work out, but Go currently doesn't support some of them).

Exploring ways to further reduce startup time remains on my list of stuff to do.

> 3. Gostd brings a lot of complexity, and it remains to be seen if the value it provides is worth that complexity.

Yes. Besides gostd itself being poorly architected/designed (it's grown too organically, being a prototype; though I do refactoring now and then, including last night, which should get pushed to my fork in the next few days), it's yet another component that would need to be maintained for people and orgs to feel as though they could rely on it. Right now I have enough trouble "maintaining" it myself that I'd want to refactor it substantially before inflicting it on canonical-Joker devs (like yourself).

> As you point out in the docs, the API it exposes is not high level or idiomatic Clojure API that people normally would want to use directly. Instead, it's supposed to be wrapped by higher level API, which can be done without modifying Joker's source code. While this is true, I am not sure how this would work in practice. Where would this higher level API live? One of the value propositions of Joker is that it's a single binary with no dependencies and "batteries included". If that higher level API is built into Joker's executable, it defeats the purpose of being able to write that API without modifying Joker's source code. (At that point it's easier to just wrap native Go code directly, as done for existing standard namespaces.) If it's not built into Joker's executable, then it would have to be an external dependency, perhaps leveraging ns-sources feature. Maybe that's OK, but I'd certainly like most of the "standard library" to be built into Joker itself. BTW, this starts to look a bit like the question "How do we enable third party libraries that have access to native Go?". Gostd is certainly an answer to that, but one other direction I'd like to explore some day is using Go's plugins.

The Classpath support (already in canonical Joker) should provide one reasonable approach to supporting this.

I'm also amenable to automatically detecting certain Go patterns and converting them to Clojure idioms instead of (or in addition to) the low-level stuff. I'm not sure how far we can go with this, but it's worth considering.

> On the other hand, maybe having access to vast Go standard library is valuable enough even without higher level wrappers. I would recommend picking a few use cases and see what it would be like to implement them with gostd. One thing I wanted to do the other day was to send an HTTP request with gzipped body. I ended up doing that with curl as Joker doesn't currently have the API for gzip compression, but it'd be interesting to see it done with gostd. Maybe you already use gostd in your scripts, in which case it'd be cool to see some examples.
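
For reference, the gzipped-request use case looks roughly like this in plain Go, using only compress/gzip and net/http from the standard library. This is not gostd or Joker code, just a sketch of the underlying calls a gostd-based script would end up wrapping. The helper names (`gzipBytes`, `gunzipBytes`, `postGzipped`, `echoServer`) are made up for illustration, and the local echo server exists only so the example runs without an external endpoint.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// gzipBytes compresses data and returns the gzip stream as bytes.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes reverses gzipBytes.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// postGzipped POSTs body to url with a gzip-compressed payload
// and returns the response body as a string.
func postGzipped(url string, body []byte) (string, error) {
	compressed, err := gzipBytes(body)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(compressed))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Encoding", "gzip")
	req.Header.Set("Content-Type", "text/plain")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	return string(out), err
}

// echoServer starts a local test server that decompresses the
// request body and echoes the plain text back.
func echoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Content-Encoding") != "gzip" {
			http.Error(w, "expected gzip body", http.StatusBadRequest)
			return
		}
		raw, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		plain, err := gunzipBytes(raw)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Write(plain)
	}))
}

func main() {
	srv := echoServer()
	defer srv.Close()

	reply, err := postGzipped(srv.URL, []byte("hello from a gzipped request"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```

How much of this would have to be written by hand in a Joker script depends on which of these functions and receivers the fork has wrapped so far.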

I haven't started using gostd in my scripts, but one feature I've partially implemented allows specifying additional Go packages (along with their namespace prefixes, go.std. being the one for the Go src/ tree).
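
As a rough illustration of the prefix idea, the mapping from a Go import path to a namespace name could look like the sketch below. The rule shown (prefix plus the import path with slashes turned into dots) is an assumption inferred from names such as go.std.net in the generated docs, not a description of the fork's actual code, and `namespaceFor` and the `com.example.` prefix are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceFor maps a Go import path to a namespace name under prefix.
// Assumption: path separators become dots and nothing else changes.
func namespaceFor(prefix, importPath string) string {
	return prefix + strings.ReplaceAll(importPath, "/", ".")
}

func main() {
	fmt.Println(namespaceFor("go.std.", "net"))
	fmt.Println(namespaceFor("go.std.", "compress/gzip"))
	// A hypothetical prefix for a site's own packages.
	fmt.Println(namespaceFor("com.example.", "mylib/util"))
}
```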

Supporting sites that want to build their own custom Joker executables, wrapping arbitrary Go libraries (including their own), could be quite a big win.

Based on past experience, it's hard to predict whether this'll "take" as something useful for enough people, and especially what creative use cases (if any) will come out of it.

Certainly, once I have some decent gostd-using Joker code up and running, I'll be pushing it to a suitable repo. For me, one missing element was wrapping net/smtp (which obviously would have taken vastly less time to just wrap by hand!); but I have larger plans than that, and generally I just like working on code-conversion/transformation tools.

If it seems best to make this a distinct version of Joker, there's surely plenty of common code that could be refactored into distinct Go packages (as libraries), including parsing and such; and it might be useful to give it a distinct name, one that brings out the close(r) connection between Clojure and Go that it would represent. (E.g. "Gojure".)

But that's a ways down the road.

I really appreciate your input, and look forward to pushing out new stuff for your continued review soon!

candid82 commented 4 years ago

> I'm not sure whether you really meant "too useful...to warrant the merge" in the sense of having it be a distinct fork/product, or meant "...to not warrant the merge"?

Haha, sometimes I confuse myself with my English. I meant the latter, but I do agree it probably makes more sense to keep gostd a distinct version of Joker for the foreseeable future, as the design goals, use cases and potential user base may be quite different for gostd.

jcburley commented 4 years ago

I've just now pushed some pretty big changes in terms of effort and code, but mostly they're about generating documentation for receivers and related refactoring. E.g. see:

https://burleyarch.com/joker/docs/amd64-darwin/go.std.net.html#_types

(I'm not thrilled with the style, but don't want to fuss with that just now.)

Much work still to do, though. E.g. add (to namespaces) types that do not have any receivers associated with them, remove the "static" constructors (Foo.) once new works for all supported types, handle receivers with abstract types but concrete private implementations of them (returned by, for example, os.Stat())....

candid82 commented 4 years ago

Closing as this is not actionable anymore. It's been decided to keep gostd fork separate from "main" Joker.