Steps for supporting a language..

ylluminate commented 7 years ago

Apologies in advance as I'm just not sure where a general WebAssembly dev chat might be taking place.

What are the "proper" steps for bringing a language over to WebAssembly presently, especially a language built on LLVM? I haven't been able to find any kind of documentation or clear-cut discussion that revolves around this so far.

I've been looking at possibly using Opal as a base, however since Crystal is based on LLVM, I think it would be more correct / sane to bring it directly over to WebAssembly. Could someone please paint a broad stroked picture (add details if you wish) of what the "proper" way to tackle such an effort would be at the moment?

kripken commented 7 years ago

I don't think there is a single "proper" route, this stuff is still being figured out :) Some options:

In general, if your language uses LLVM, you can use the LLVM wasm backend directly to emit wasm for you. Things are still a little unstable there - LLVM+lld are moving to use wasm as object files, see here but that's not 100% done yet. You will need to make various decisions about what to include for your runtime, standard library, create your own JS code to load the wasm, etc. Rust and other languages are in the process of figuring all that out, it's interesting to see how their experiments will go.

Another option is to use emscripten (which can wrap around the LLVM wasm backend). That will link in libc and other runtime things for you automatically, generate JS for you, provide WebGL and other web API wrappers, etc. But it's optimized for C and C++, and other languages might want different things (e.g., maybe you don't need the malloc it will automatically provide). We are working to improve that, but C and C++ have been our focus in the past. For a language with needs very similar to C and C++, this could be a good option.

Finally, some languages are emitting wasm directly, like Go, F*, AssemblyScript, Cheerp, etc. You can use various techniques there: write your own wasm emitting code (Go, Cheerp), use a library like the wasm spec interpreter (F*) or binaryen (AssemblyScript) etc.

ylluminate commented 7 years ago

Thanks so much for the thoughts @kripken. emscripten might be the most sane path initially. Crystal does use a GC, but I don't think supporting this would be important at all going to js / browser since there's no direct allocation in code. I've been surprised to see how fast Crystal is going with it now being past Rust on TIOBE and jumping so insanely quickly over such a short period of time. There's been a lot of talk about isomorphism and Opal has been on the table, but folks keep thinking that going directly over via WebAsm would be the better route.

I guess there could be some fear that emscripten might be a little too heavy, but frankly I could see some remarkable benefits more or less everywhere from what you've mentioned there re: emscripten.

binji commented 7 years ago

Apologies in advance as I'm just not sure where a general WebAssembly dev chat might be taking place.

We chat on irc some. It's mostly active weekdays, during US workday hours. It would be cool to have more folks on at other times too, though!

ylluminate commented 7 years ago

That's great to know @binji! I'm sure an IRC-Gitter bridge (Crystal uses this) would also be quite useful since a lot of other projects do this as well and it just adds to the conversation.

ylluminate commented 7 years ago

So @kripken I guess the next extension to this would be regarding what output needs to be produced by the crystal compiler in order for emscripten to receive it? I'm not seeing, so far, how llvm being the common point produces the common byte code or output that'll be consumable.

The main argument I'm getting right now from actual crystal devs is effectively "since emscripten + asm.js don't provide garbage collection, we cannot do anything yet." I was under the impression that GC would not be an issue until we get to actual webasm output since the JS VM will be handling GC. Handling memory allocation/deallocation for C/C++ would be necessary since it's part of the language itself and it doesn't take GC into consideration, but a language like Crystal should be able to take advantage of the current emscripten implementation so as to not require any manual memory handling... Am I missing something here?

kripken commented 7 years ago

There are several separate issues with GC. In the future, wasm will support GC objects directly, so it will integrate with the JS GC.

But meanwhile, you can ship your own GC implementation as part of your language runtime. Unity does that for C#, for example, and that's what happens when you run the compiled Lua or Python VMs, etc. A Boehm-style GC should just work, the only tricky thing being roots on the stack.

So if you already have your own GC in your runtime, all you need to do to try this is to create an LLVM bitcode file (containing your compiled program + GC and other runtime) and just pass that to emcc. It'll compile the bitcode to wasm+JS.

ylluminate commented 7 years ago

Huh, very interesting. Let me rephrase, and tell me if this captures your statement generally: emscripten's goal isn't a direct JS conversion like Opal does it and so we do need to look at GC, but that's (perhaps easily) doable.

RX14 commented 7 years ago

Crystal currently uses bdwgc, and has lots of roots on the stack, is there any kind of possible solution?

kripken commented 7 years ago

@ylluminate: yes, emscripten (and the asm.js and wasm backends in LLVM) consider C-like languages, not languages with native GC types (at least for now). But, the GC runtimes for those languages are typically written in C, so you can compile those and use them. That's very different than Opal and GWT and others that compile to JS with GC objects, yes.

@RX14: For stack roots, one option is to only collect in between browser frames - they need to be short anyhow, as it's an event/callback model. And if you call the GC in between frames, when there is nothing on the stack, then you are safe.
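A minimal sketch of the "collect between frames" idea, written on the JS side in TypeScript. The export names `runFrame` and `gcCollect` are assumptions made for illustration, not part of emscripten or bdwgc; the only point is that the collector is invoked after the compiled code has returned, so its call stack holds no roots.

```typescript
// Hypothetical shape of the compiled module's exports; the names are
// illustrative, not an emscripten or bdwgc API.
interface RuntimeExports {
  runFrame(): void;  // one short unit of application work
  gcCollect(): void; // a full collection, e.g. a thin wrapper around the runtime's GC
}

// A scheduler runs the callback later, on a fresh call stack.
type Scheduler = (callback: () => void) => void;

function startLoop(
  runtime: RuntimeExports,
  schedule: Scheduler,
  framesPerCollection: number,
): void {
  let frame = 0;
  const tick = (): void => {
    runtime.runFrame();
    frame += 1;
    // runFrame() has returned, so no compiled code is on the call stack;
    // the collector only has to scan globals and the heap.
    if (frame % framesPerCollection === 0) {
      runtime.gcCollect();
    }
    schedule(tick);
  };
  schedule(tick);
}
```

In a browser the scheduler could be something like `cb => requestAnimationFrame(() => cb())`. How often to collect is a tuning choice this sketch leaves open.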

Alternatively, if that's not an option (like a very long computation running in a web worker), a compiler would need to mark the stack manually. In C++ you could use an RAII class for pointers instead of raw pointers, where it writes to memory, etc. Or I've seen cases where the code was compiler-generated anyhow, so they just made the compiler emit writes to memory for stack values. It's probably possible to modify LLVM to do this, I think I heard of someone experimenting with that.

RX14 commented 7 years ago

@kripken the thing is we don't ever have nothing on the stack. The language is like Go - built around green threads (fibers) with "blocking" calls. We always provide the illusion of a stack (with the illusion of it containing GC roots), even when waiting for events.

Porting crystal is pointless if you just end up with a stdlib which is impossible to port.

RX14 commented 7 years ago

Actually, Go seems to be working on wasm, and bypasses this problem by managing its own stack in linear memory. What a pain.

kripken commented 7 years ago

I see. Yes, if you always have things on the stack, then you need to manage the stack in a special way. One option is what Go is doing, to manage their entire call stack in linear memory. That's going to add overhead.

Alternatively, if you have a way to just manage pointers on the stack, that could be efficient, like the RAII option mentioned before.
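As a rough illustration of managing only the pointers, here is a toy model in TypeScript (not C++ RAII and not real compiler output). A `Uint32Array` stands in for a region of linear memory, and every name below is hypothetical. Each pointer that must stay alive is written to a shadow stack in that memory, where a scanning collector can find it, and is popped when the scope ends.

```typescript
// Toy model: a region of linear memory (32-bit words) used as a shadow
// stack of GC roots. All names here are hypothetical.
class ShadowStack {
  private slots: Uint32Array;
  private top = 0; // index of the next free slot

  constructor(slots: Uint32Array) {
    this.slots = slots;
  }

  // Write a pointer into linear memory so a scanning collector can see it.
  push(pointer: number): void {
    if (this.top >= this.slots.length) throw new Error("shadow stack overflow");
    this.slots[this.top] = pointer;
    this.top += 1;
  }

  // Drop the most recent `count` roots when a scope ends.
  pop(count: number): void {
    if (count > this.top) throw new Error("shadow stack underflow");
    this.top -= count;
  }

  // The live part of the stack; this is what the collector scans.
  liveRoots(): Uint32Array {
    return this.slots.subarray(0, this.top);
  }
}

// Conservative marking: any root word equal to the address of an allocated
// object keeps that object alive. A real collector would go on to trace
// from each marked object through the heap.
function markFromRoots(stack: ShadowStack, allocated: number[]): number[] {
  const roots = stack.liveRoots();
  const marked: number[] = [];
  for (let i = 0; i < roots.length; i++) {
    const word = roots[i];
    if (allocated.indexOf(word) !== -1 && marked.indexOf(word) === -1) {
      marked.push(word);
    }
  }
  return marked;
}

// Stand-in for what an RAII wrapper or compiler-emitted code would do:
// root the pointer, run the body, and unroot it even if the body throws.
function withRoot<T>(stack: ShadowStack, pointer: number, body: () => T): T {
  stack.push(pointer);
  try {
    return body();
  } finally {
    stack.pop(1);
  }
}
```

The cost is one extra store per rooted pointer, which is the trade-off against moving the whole call stack into linear memory.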

RX14 commented 7 years ago

@kripken is there (or is there likely to be) any sort of way to switch stacks in wasm? If not, then that's probably why Go is going the slow and painful route.

kripken commented 7 years ago

There has been some theoretical discussion of stuff like that (switching stacks, inspecting the stack, etc.), I think motivated by Go for example. But I don't think there's been anything recent. Which I guess is why Go is doing things the way it is. I would guess that if Go doesn't end up with good enough performance, that could motivate adding support for this.

binji commented 7 years ago

It's come up that the exception handling proposal can probably be extended to support stack switching. There's no concrete proposal, though.

ylluminate commented 7 years ago

Where would be the place to talk with or perhaps even collaborate with Go folks on this issue?

This all makes me wonder if the Opal approach may not be the more sane and productive one at this point in time...

RX14 commented 7 years ago

Indeed, while wasm is awesome, it'll take one or two years I think before it matures to be able to truly support Crystal as-designed. People can either hack it in before then or wait or skip wasm and make a crystal codegen backend for JS. Not that the latter won't likely be similarly hacky around concurrency and GC.

binji commented 7 years ago

Where would be the place to talk with or perhaps even collaborate with Go folks on this issue?

They mostly seem to be here: https://github.com/golang/go/issues/18892

And we have regular WebAssembly Community Group meetings. If you have a proposal idea, it may be worthwhile to bring it up there.

kripken commented 7 years ago

It's probably possible to modify LLVM to do this [make sure values on the stack are in linear memory, so they can be scanned by a Boehm-style GC]

We could do this in binaryen too, I realized. Basically a pass that ensures all i32s are spilled to a linear memory location. Then Boehm GCing would just work.

Let me know if there's interest in such a pass, should be easy to write.

RX14 commented 7 years ago

Basically a pass that ensures all i32s are spilled to a linear memory location. Then Boehm GCing would just work.

Is wasm 32-bit?

kripken commented 7 years ago

Yes, current wasm is 32-bit. There are plans for a future wasm64.

ylluminate commented 7 years ago

So @kripken since you're able to do this it certainly sounds like a winning proposition that would make this pretty straightforward, right @RX14?

RX14 commented 7 years ago

It's certainly not something I'm going to have time to work on - I'd rather work on Windows support. If @kripken thinks he can get a GC with a simple malloc/realloc/push_stack interface similar to bdwgc working on wasm that's fantastic, and we'll probably use it in the future when we port to wasm. But wasm is hardly a priority for me personally.

And no, nothing is exactly straightforward about porting a self-hosted language to a new platform :)

ylluminate commented 7 years ago

What would be involved Crystal-side once @kripken is able to implement it?

RX14 commented 7 years ago

I don't know, I'm not too familiar with wasm and its limitations. The only way to find out what they are is probably to try to port.

ylluminate commented 6 years ago

Have you gotten any further in this (thought) process @kripken?

kripken commented 6 years ago

What process do you mean? Reading the end of this issue, looks like I proposed writing that stack spilling pass, but it sounded like there wasn't interest from @RX14 to use it?

ylluminate commented 6 years ago

I think that there may be others who would be willing to work through this even if @RX14 doesn't have the time right now since he's polishing off the native Windows support. I have horrendous demands on my time presently, but I know of others that are also interested. I know others such as @t-richards have raised the importance of this and may have interest in helping, and some of the framework developers such as @paulcsmith of the Lucky framework have some serious interest in this. We may be able to rally through this if we can get over this hump! 😸

RX14 commented 6 years ago

The first thing to get working is bdwgc on a simple test C program. Once that's working then it's just going through the same process as Windows is now.

kripken commented 6 years ago

@ylluminate: ok, let me know if someone's interested to work on this. I'd be happy to help out on the binaryen/emscripten side, with that pass or other stuff.

ylluminate commented 6 years ago

So @kripken I guess we need to set up some kind of test initially. What would be the process of getting bdwgc to run via emscripten/binaryen? I think I'm going to at least dabble in this in order to get it kickstarted... This is a pretty important feature that's just lacking bandwidth, so I don't want that to hold it up.

@RX14 so Crystal uses vanilla bdwgc, right?

RX14 commented 6 years ago

@ylluminate yes vanilla bdwgc

ylluminate commented 6 years ago

@kripken what's your take on this: https://github.com/ivmai/bdwgc/issues/163

kripken commented 6 years ago

@ylluminate looks like @juj states the core stack-walking issue there, which is what I think we can experiment with working around using a binaryen pass to manually spill the stack.

I'm not sure about the runtime errors that are mentioned there. Those are from several months ago, perhaps it's worth trying again now.

ylluminate commented 6 years ago

@kripken I've got several guys committed to this now. We'll push through this and make it work if you can get us the binaryen modification.

kripken commented 6 years ago

@ylluminate Ok, great. I wrote that pass in #1339, should be good enough to experiment with. Let me know how it goes and if I can help.

ylluminate commented 6 years ago

@kripken there's been some worry here about DOM access in WASM. Obviously from a lot of chatter I'm seeing, DOM WASM access is still quite some time out (a lot of pink unicorn icons floating around it apparently)...

But as far as I can figure, one of the advantages of using Emscripten is that it is still, for the time being, going to JS / asm.js and the DOM should still be accessible. Is this correct and if so how would this DOM access be accomplished?

There's a good deal of push to make a transpiler a la Opal, but given that Crystal is built atop LLVM and you've done what you've done, that would be a very intensive investment compared to just getting it running atop Emscripten with existent stdlibs, etc.

If we can access the DOM conveniently then it's just a matter of holding off on actual TRUE WASM output for web application centric apps until we figure out a way to access the DOM with WASM (ie, if a game is being developed in Crystal and it doesn't need the DOM directly then this could fill temporary isolated DOM needs)...

Does that make sense or am I spouting nonsense?

kripken commented 6 years ago

Good questions. In the end it really depends on what the Crystal community wants from Crystal-on-the-Web.

First, DOM access isn't important in many cases, for example, most Rust-on-the-Web use cases I've seen. People use Rust there so they can write computational code in a language they prefer over JS, and it runs faster. Then they call out to JS and do DOM stuff there if they need that, but often they just create a library from that Rust and then call it from JS to do computation, so the DOM never comes up. I haven't used Crystal much, but from what I see of the language, it should work very well that way.

If you do want DOM access as a core feature, emscripten does give you a few ways to make it easy. You can use EM_ASM blocks (would need work on the Crystal side) or JS libraries (would just work) to make it simple to call into JS. Using those and the emscripten runtime support, you can build a DOM access library in Crystal (Emscripten does this for C++ in the embind and WebIDL binder features, for example). The main problem there is you can't collect cycles between the two worlds yet, but otherwise it can work very well.

In the farther future, wasm may be able to access the DOM directly. That will likely take years to be designed and rolled out to browsers. Meanwhile, if you implement a DOM library today as in the previous paragraph, you could use the new DOM capabilities as an optimization when they do arrive - they would solve cycles, and would make things faster.

ylluminate commented 6 years ago

The main impetus presently for Crystal WASM is to use it for web app frameworks such as Lucky, Amber, Kemal, and bringing Hyperloop over, thus seamless DOM is critical.

Could you clarify your statement, as I'm not grokking it quite yet?

The main problem there is you can't collect cycles between the two worlds yet, but otherwise it can work very well.

It would have other fantastic applications, but, like I said, we've got to get to a seamless experience for web development first and foremost.

RangerMauve commented 6 years ago

In Rust-land there's a crate called stdweb which is adding interop with browser APIs, and yew which is adding a react-like interface with the DOM for WASM applications in Rust.

kripken commented 6 years ago

The main problem there is you can't collect cycles between the two worlds yet, but otherwise it can work very well.

Overall the issue is JS has a GC, and wasm can have a GC if you use Boehm, but no single GC can see it all. You can only connect the two manually. So cycles of a wasm object using a JS object and that JS object using that wasm object can't be cleaned up (without manually breaking the cycle).

In more detail, wasm doesn't have direct access to the JS object world, so if you want wasm to do something to a JS object, you need something like this:

In JS, a map of integer IDs to JS objects.
In wasm, you use the integer IDs (since it can handle integers ok)
When you want wasm to be able to refer to an object, you create an ID for it and put it in that JS map.
When wasm calls JS, it gives the integer to JS. JS looks up the object in the JS map, and can then manipulate it.

So you must manually "bind" an object to an integer ID, and the problem is that you must also "unbind" manually as well: as long as the object is in that map, it won't be JS GC'd (unless the entire map can be GC'd).
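The scheme above could be sketched on the JS side roughly as follows, in TypeScript. The class, method, and import names are hypothetical; emscripten and other toolchains have their own equivalents, and this is only meant to show where the manual bind/unbind obligation comes from.

```typescript
// JS-side table mapping integer IDs to JS objects. Names are hypothetical.
class HandleTable {
  private objects = new Map<number, object>();
  private nextId = 1; // 0 is kept free to act as a "null" handle

  // Bind: hand wasm an integer it can store and pass around.
  bind(obj: object): number {
    const id = this.nextId;
    this.nextId += 1;
    this.objects.set(id, obj);
    return id;
  }

  // Turn an integer received from wasm back into the JS object.
  lookup(id: number): object {
    const obj = this.objects.get(id);
    if (obj === undefined) throw new Error("unknown handle " + id);
    return obj;
  }

  // Unbind: until this runs, the map keeps the object reachable,
  // so the JS GC cannot reclaim it.
  unbind(id: number): boolean {
    return this.objects.delete(id);
  }

  get size(): number {
    return this.objects.size;
  }
}

// Imports that compiled code could call, passing handles as plain integers.
function makeImports(table: HandleTable) {
  return {
    env: {
      set_counter(handle: number, value: number): void {
        const target = table.lookup(handle) as { count: number };
        target.count = value;
      },
      release(handle: number): void {
        table.unbind(handle);
      },
    },
  };
}
```

Here `release` is the manual unbind step: if compiled code never calls it for a handle, the bound object stays in the map for as long as the table itself is alive.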

You can do GC using Boehm in the compiled code, and maybe the finalizer for an object can call JS to unbind the integer ID for it. But this does leave the issue of cycles through both compiled code and JS, there is a separate GC for each, and a manual interface for connecting them. That's going to be a problem until wasm gets GC support, probably.

Btw, do those frameworks need to directly call into the DOM? In Ember for example, they are adding Rust/wasm just for the core VM component there, which doesn't need direct DOM access, as I understand it.

ylluminate commented 6 years ago

So given the discouragement from the missing direct DOM access and a tremendous amount of resistance due to that (and some other life issues), I thought I'd step back to this issue to see if anything has changed.

Has anyone worked up a clean or direct method of handling DOM interaction via Emscripten?

Has anything else changed that would affect our above discussions?

kripken commented 6 years ago

There is a reference types proposal for wasm now, with initial experimental impls in VMs - enough to start experimenting with some DOM interaction, but still far from full GC. See a discussion about using that in AssemblyScript here: https://github.com/AssemblyScript/assemblyscript/issues/89

erlend-sh commented 6 years ago

You can see an example workaround for DOM access here:

https://github.com/DenisKolodin/yew#virtual-dom-independent-loops-fine-updates https://github.com/rust-lang-nursery/rust-wasm#the-dom-gc-integration-and-more

WebAssembly / binaryen

Steps for supporting a language.. #1312