Confirm-Solutions / imprint

Designing a web interface. #135

Open tbenthompson opened 1 year ago

tbenthompson commented 1 year ago

Scope for the web interface

The goal here is to have an EXTREMELY easy getting started experience.

There are really two different things that I'd like to see here:

  1. A demo that holds the user's hand and shows them around an example imprint run.
  2. A basic Imprint interface that works for 1D and 2D problems. I don't think we want to support the full capabilities of imprint because of the complexity and computational requirements.

Then, a user who is convinced by those web tools and wants the full power would go spend the time to install Imprint (#99) and do whatever they want! Or hire Confirm Solutions =)

In-browser vs traditional web app

I'm advocating for the fully in-browser option. I think a traditional backend will be a lot more work than might initially be expected. Reasons:

Other thoughts:

Running imprint in a browser

I see two broad options here:

  1. Figure out how to run our Python code in the browser.
  2. Re-write Imprint for the browser.

I would lean towards the first option because it means much less ongoing maintenance. As far as initial effort goes, rewriting for the browser actually would not be very much work: we'd probably need to duplicate less than 1k lines of code, and testing/verification would be easy since we already have a working version. But I still lean pretty strongly towards figuring out how to run something very similar to our existing Python code in the browser, because that's going to be much less work to maintain over time. If it means we need to redesign portions of Imprint, that's okay!
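As a concrete illustration of option 1, here's a minimal sketch assuming Pyodide (or JupyterLite) as the in-browser Python runtime. The `imprint` wheel here is hypothetical; the real obstacle would be compiled dependencies like jaxlib, which as far as I know have no Pyodide builds, and that's where the "redesign portions of Imprint" caveat would bite.

```python
# Runs inside a Pyodide interpreter embedded in a web page (e.g. JupyterLite).
# Pyodide supports top-level await in its async execution mode.
import micropip

# Pyodide ships prebuilt scientific packages (numpy, scipy, ...) and can
# install pure-Python wheels from PyPI at runtime.
await micropip.install("numpy")
await micropip.install("imprint")  # hypothetical: assumes a pure-Python wheel exists

import imprint  # everything from here on runs in the user's browser tab
```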

Some useful links:

kentcr commented 1 year ago

I can't say I'm following the reasoning here. Even an ultra-optimized rewrite for the browser (who knows, some Fortran compiled to WebAssembly with Web Workers for parallelization) is going to be limited by any browser performance penalties and by whatever hardware a user tries it with, which could be a smartphone or tablet. Even with something more powerful, users generally don't like things that make their system slow to a crawl, which might serve as a signal of low quality, especially with the new norm of having brisk access to extremely powerful AI systems for free.

While I can imagine some work being required to scale up a backend, I don't see how that becomes easier in-browser, and I'd expect the limited capabilities of any free web interface to keep compute expenses in check. I also don't see how in-browser does much for privacy, since a typical user cannot easily establish that private data will not be transmitted through a web interface without cutting Internet access. (It's not a great situation for installed software either, but there are options like setting up a firewall rule or a non-networked namespace that don't require a code review.)

I also lean towards using cheap hardware rather than expensive wetware (developer time), unless it's clear the hardware will ultimately cost more; but I'm not going to stop others from having fun optimizing. All that said, I don't see a reason not to leave the in-browser door open by exclusively following paths that can produce LLVM IR.

If the backend route were chosen, would the most basic problems, the ones that don't take >=10 core-minutes, still be worthwhile as an initial demo? In other words, would it be worthless to throw up the current implementation with a limited web interface as a first step?

tbenthompson commented 1 year ago

Nice, thanks for the thoughts! Interesting points, and I'm happy with the conclusion, especially if we can have an auto-scaling backend.

Have you ever run into Modal Labs? They're doing some super cool stuff and are one of the only examples of a serverless/pay-as-you-go compute service that has GPUs available for fast scaling. We've used their stuff for GPU-enabled continuous integration and it's been very pleasant and shockingly easy to use. Makes me super super happy.

> Even with something more powerful, users generally don't like things that make their system slow to a crawl, which might serve as a signal of low quality, especially with the new norm of having brisk access to extremely powerful AI systems for free.

That's fair. For "real" designs, we're never going to achieve the snappiness of something like a GPT just because the runtime computational requirements are much larger for what we're doing. At inference-time, the difficulty with a GPT model is not speed but rather the problem that the model parameters don't fit on a single GPU. For us, the difficulty is speed.

Excited!!

kentcr commented 1 year ago

My incomplete understanding is that the code is susceptible to parallelization, so if distributed across enough PUs, a snappy interface becomes possible. I expect a lot of calculations are cached (though I haven't found that in samples/batches yet), in which case it may still be possible to take a system time hit in exchange for a real time speedup.

I've seen but not used Modal. My main concern would be avoiding vendor lock-in, along with ensuring it still works decently on conventional hardware, though I can imagine both being minor issues.

tbenthompson commented 1 year ago

> so if distributed across enough PUs, a snappy interface becomes possible.

I think this will depend on a quantification of snappy and the size of the problem. To a first approximation, the cost increases by about 100x for each added dimension. Simple optimized 1D demos will run in a few seconds on a single GPU. Simple optimized 2D demos might take a minute or so when parallelized over 5 GPUs. 3D will take tens of minutes to hours even on 10-50 GPUs.
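To spell out that arithmetic, here's a back-of-the-envelope sketch; the base cost and the assumption of near-perfect scaling across GPUs are illustrative, not measurements:

```python
# Rough scaling model: cost grows ~100x per added dimension, and wall time
# divides by the number of GPUs (assuming close-to-perfect parallel scaling).
BASE_SECONDS_1D_1GPU = 3   # "a few seconds" for a simple optimized 1D demo
PER_DIM_FACTOR = 100       # approximate cost growth per added dimension

def estimated_wall_time(n_dims: int, n_gpus: int) -> float:
    work = BASE_SECONDS_1D_1GPU * PER_DIM_FACTOR ** (n_dims - 1)
    return work / n_gpus

print(estimated_wall_time(1, 1))    # ~3 s on a single GPU
print(estimated_wall_time(2, 5))    # ~60 s: "a minute or so" on 5 GPUs
print(estimated_wall_time(3, 50))   # ~600 s: tens of minutes even on 50 GPUs
```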

> I expect a lot of calculations are cached (though I haven't found that in samples/batches yet), in which case it may still be possible to take a system time hit in exchange for a real time speedup.

I'm always on the lookout for caching potential in these problems! Potential for caching that I've run into so far in this problem space:

> I've seen but not used Modal. My main concern would be avoiding vendor lock-in, along with ensuring it still works decently on conventional hardware, though I can imagine both being minor issues.

Yeah, I think those will both be minor issues in this case. The conventional hardware issue is mostly covered by running JAX with or without CUDA. The Modal interface can be very thin, basically just using it as a way to rapidly grab and release a lot of GPU-enabled servers.
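To make the "thin" idea concrete, here's a minimal sketch assuming Modal's decorator-based App/function API; the function body is plain JAX, which targets CUDA when a GPU-enabled jaxlib is installed and falls back to CPU otherwise. The names (`simulate_tile`, the image contents, the GPU type) are illustrative, not existing imprint code.

```python
import modal

app = modal.App("imprint-web-demo")
image = modal.Image.debian_slim().pip_install("jax[cuda12]")

@app.function(image=image, gpu="A10G")
def simulate_tile(seed: int) -> float:
    # Ordinary JAX inside: the same code runs on CPU locally and on GPU here.
    import jax
    import jax.numpy as jnp

    key = jax.random.PRNGKey(seed)
    samples = jax.random.normal(key, (100_000,))
    return float(jnp.mean(samples > 2.0))  # stand-in for a per-tile simulation

@app.local_entrypoint()
def main():
    # Modal is used only to grab and release a batch of GPU workers.
    results = list(simulate_tile.map(range(32)))
    print(results[:5])
```

If I'm reading Modal's docs right, the same body can also run locally via `simulate_tile.local(...)`, so the Modal layer stays a thin dispatch choice rather than a rewrite.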

Excited!!

Thoughts on how to start building something?

kentcr commented 1 year ago

I'm no cloud guru and am thus far struggling to find clear numbers on maximum scaling, but Amazon apparently has clusters with >100k Xeon Platinum cores, so absent shared-memory issues I'd expect snappiness is still viable.

I'm not seeing info on Modal's capacity here, but I can imagine that being a limiting factor. It at least doesn't seem crazy to start simple and shop around as compute requirements increase, though. There's global competition at this point, but I'm not clear on how these privacy concerns would affect cloud processing. Much of this seems like a decision involving businessy rather than technical variables.

I don't see that any of that prevents progress on a web interface, though. There are lots of options, but the main considerations that come to my mind are:

Or if we want to go the less talk, more action route, we can just ask ChatGPT what it's good at and have it make us a quick example!