econ-ark / OverARK

Project management and administration for econARK

Speed up loading of REMARKs using JupyterHub or alternative #5

Open shaunagm opened 5 years ago

shaunagm commented 5 years ago

Update: Google Colab is looking very promising. Next steps:

If we decide on Colab, we will then:

Currently, mybinder takes several minutes to load our remarks and other notebooks. We'd like anyone loading a notebook from the econARK site to be able to access it quickly, definitely in under 30 seconds and ideally faster. Not sure whether we need special hosting, caching, etc to make this happen. Relatedly, our current setup causes issues with dependency management (see issue #12).

Use cases:

The above use cases have different limitations. A lot of platforms I've investigated have access control such that it would be hard to provide this service to anonymous internet people but easy enough for anyone we approve of and who can spare a few minutes of initial setup. It's possible we could set up a two tiered system where community members (and the students/workshop attendees of community members) can launch notebooks quickly and internet strangers continue to use the mybinder system. (Although this doesn't address the dependency management aspect, just the performance aspect.)

Potential approaches:

1) find a hosting platform that allows us to do this (trying this first, since it's the easier option; see list below)
2) set up our own hosting system, perhaps on AWS (OTS did this for us briefly last year, for a specific event)

People to consult:

Hosting Platforms & Notes

shaunagm commented 5 years ago

@llorracc, @mnwhite - I'm going to check out Google Colab sometime in the next week or so, using one or two of the existing notebooks as tests. Are they all about the same in terms of computational power needed? If not, can you suggest a notebook that's on the computation-heavy end, but not an outlier?

llorracc commented 5 years ago

Shauna,

The BufferStockTheory.ipynb REMARK notebook is a good test case. The objective here, for workshops etc, is a bit different from the other case where we wanted computational power. What I'm hoping to find for ordinary workshops and notebooks used therein is a way to speed up the initial launching of the notebook. Especially in cases where the notebook has a lot of setup stuff. The BufferStockTheory.ipynb notebook checks to see whether "latex" has been installed on the VM and if so changes some configuration stuff. Unfortunately, it can take 2-4 minutes for the remote VM to FINALLY "go live" because of all of the prep software installation that has to be done. BufferStockTheory provides a good example of a case where latex is preferred (and used if the tools necessary are available).


shaunagm commented 5 years ago

I'm liking Google CoLab so far. Loading a notebook stored in github was as simple as using the customized url which takes the format:

colab.research.google.com/github/$our_organization_name/$repository_name/blob/master/$relative_path_to_notebook.ipynb

It loaded fairly fast, about 10 seconds by my count, and it looks like the Latex is all there.
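The URL pattern above can be captured in a small helper. This is only a sketch: the function name `colab_url` and its parameters are made up for illustration, and it assumes the notebook lives on the `master` branch unless told otherwise.

```python
from urllib.parse import quote


def colab_url(organization, repository, notebook_path, branch="master"):
    """Build a Colab link for a notebook stored in a GitHub repository.

    Hypothetical helper: it simply fills in the URL pattern quoted above.
    """
    # Drop any leading slash and percent-encode characters such as spaces;
    # quote() leaves "/" untouched by default.
    path = quote(notebook_path.lstrip("/"))
    return (
        "https://colab.research.google.com/github/"
        f"{organization}/{repository}/blob/{branch}/{path}"
    )


print(colab_url("shaunagm", "REMARK",
                "REMARKs/BufferStockTheory/BufferStockTheory.ipynb"))
```

Something like this could generate the website links in bulk if we switch them from mybinder to Colab.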

The hosted runtime options seem fairly limited - only Python 2.7 and 3.6 are options, and they come with certain libraries pre-installed, which means we don't have as much flexibility in choosing the environment we want the notebooks to run in. I think that means we're stuck with a little cell at the top of all our notebooks that looks something like:

!pip install econ-ark
!pip install matplotlib==1.2  # made up example for if we need to use a different version

But the notebooks currently have set-up cells anyway, so. Anyway, here's a list of issues I encountered and their solutions:

  1. TkAgg error
ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

This is due to trying to load get_ipython().run_line_magic('matplotlib', 'auto') instead of get_ipython().run_line_magic('matplotlib', 'inline') (inline is the only matplotlib backend supported on CoLab) (no I don't know what that means). This should be easily fixable by rejiggering that if/else statement to catch the CoLab environment in the if, but for now I just replace 'auto' with 'inline' manually.

  2. Import HARK error

As expected I then got told that HARK didn't exist, and added !pip install econ-ark to the top of the first cell.

  3. Latex display issue

In cell 25 the latex on lines 22-26 (I think, annoyingly it doesn't give line numbers) is causing an error. I deleted the four lines and it ran fine but without the necessary output. I don't know what's going on well enough to fix it. Maybe the Latex didn't load as well as I thought? Same/similar issue with the last code cell.
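For the TkAgg error above, the if/else could be rejiggered along these lines. This is a sketch, not the notebook's actual code: the function names are hypothetical, and it assumes that the `google.colab` module is already loaded in `sys.modules` when running on CoLab, which is a commonly used check but not something verified in this thread.

```python
import sys


def in_colab(modules=None):
    """Return True if the Colab support module appears to be loaded.

    Assumption: Colab kernels preload the `google.colab` package.
    """
    modules = sys.modules if modules is None else modules
    return "google.colab" in modules


def matplotlib_backend_choice(modules=None):
    """Pick the argument for the %matplotlib magic: 'inline' on Colab, else 'auto'."""
    return "inline" if in_colab(modules) else "auto"


# Inside a notebook cell this would be used roughly as:
#   get_ipython().run_line_magic('matplotlib', matplotlib_backend_choice())
print(matplotlib_backend_choice())
```

That would let the same cell run unchanged on CoLab and elsewhere, instead of swapping 'auto' for 'inline' by hand.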

Anyway, that was surprisingly straightforward, but I'm also not familiar enough with the notebook to know if there's content errors or missing pieces. Chris & Matt, you should definitely take a look!

DrDrij commented 5 years ago

Great summary @shaunagm! It's exactly where QuantEcon is also at - including a cell with pip install commands to get the requirements. I think it's a good solution: the requirements are transparent, and it means the notebooks can run standalone.

llorracc commented 5 years ago

If Shauna Gordon-McKeon, Andrij Stachurski, and the QuantEcon team are all converging on Google Colab, I feel that it must be right!

What is the next step we need to take to start using CoLab?


shaunagm commented 5 years ago

We chatted about this during our meeting, but I'm going to test out how fast Colab is when loading Latex (Chris, do you know what package that is?) and then, if that works, the next steps are a) making sure existing notebooks all work in colab; b) changing how we refer to the notebooks on the website to point to colab instead of mybinder and c) re-organizing the remarks repo to remove the mybinder stuff since it will no longer be necessary.

shaunagm commented 5 years ago

Update: the issue with overline appears to be a minor syntax error. There's a cell in the original notebook that uses the syntax overline c instead of overline{c}. Once I fixed that, and replaced underline with underbar, I ceased to get errors, although I can't verify that the output is what's desired beyond saying "yup sure does include some underlines and overlines".

I'm finding the underline issue deeply confusing, because isn't underline in basic latex? Why would we need to import anything?

I tried adding !pip install jupyterlab_latex and it doesn't change initial load time at all, since it's not executed until the cell is run. When you do run the cell, it adds another 3-4 seconds, which is not great but not terrible. Importing jupyterlab_latex did not solve the underline issue though.

mnwhite commented 5 years ago

Does basic LaTeX have underline in math mode? That might be the issue.


shaunagm commented 5 years ago

I've got no idea - I've barely ever used LaTeX, and I'm finding the documentation hard to parse. Unfortunately Colaboratory's documentation is not great either (documentation & user support has always been a weak point for Google) so I'm not sure what's even running in the notebook. How do you know if you're in math mode? How can I check what LaTeX extensions would have underline implemented?

Anyway, here's a version of the notebook with the fixes described above:
https://colab.research.google.com/github/shaunagm/REMARK/blob/master/REMARKs/BufferStockTheory/BufferStockTheory.ipynb#scrollTo=cB71h4tn1dC0

mnwhite commented 5 years ago

Math mode is anything between dollar signs like $y = 5x -3$ or in environments like {equation} or {eqnarray}.

Underline in math mode (if not standard) should be in the package amsmath. It's one of the packages I include in my document template (along with amsfonts), so I've lost track of what's in it.
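To make the text mode / math mode distinction concrete, here is a minimal standalone document. It is a sketch for a regular LaTeX installation only; as far as I know `\underline` and `\overline` are defined by the LaTeX kernel and compile in math mode without loading amsmath, but whether the renderer used inside a given notebook environment accepts them is a separate question.

```latex
\documentclass{article}
\begin{document}
Text mode: consumption is denoted c.

Inline math mode: $\overline{c}$ and $\underline{c}$.

Display math mode:
\begin{equation}
  \underline{c} \leq c \leq \overline{c}
\end{equation}
\end{document}
```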

 -- mnw


shaunagm commented 5 years ago

Okay, I think I was misunderstanding what math mode even is. Yeah, that may be the issue. If swapping underline for underbar doesn't work for aesthetic reasons, I can explore the issue further, but for now I'll hold off.

llorracc commented 5 years ago

Actually, underbar seems to do something completely different from underline: It just prints an underbar character as regular text (rather than underneath the targeted thing). Like Matt, I always import the amsmath environments and so don't really even have a good idea what is in them, but I think underline is, so if we want these figures to work I guess we need latex. It does seem like there should be some way to force install it at the beginning of execution -- people are used to there being a little delay while things start up but will be more discombobulated by a 5 second delay when a simple plot is requested.

Shauna, I don't know how familiar you are with matplotlib. If you compare the output obtained by mybinder and that by colab, you will see that the colab figures are missing important stuff -- like the axes! If you are highly familiar with matplotlib and know exactly how to fix this, then please do so. Otherwise, I will ask my student to try to construct the figures in such a way that they look nice in Jupyter or CoLab or ipython or whatever.

One other thing: The notebooks rely heavily on some Jupyter nbextensions, in particular the "codefolding" extension -- again, compare the mybinder version with the CoLab version to see how useful codefolding is (in hiding the code until the person wants to expose it). I'm guessing that getting codefolding working on CoLab just requires putting more of the config stuff in the /binder directory (either requirements.txt or maybe postBuild) into the beginning of the Jupyter notebook.

PS. It's unfortunate that it seems that a notebook written for CoLab won't work with mybinder and vice versa, because the former requires the !pip install stuff to be at the beginning of the notebook (which is a better place for it) and mybinder requires it to be in a special folder. The CoLab approach is better, but we've already configured a bunch of things to work with mybinder, and obviously one would prefer to be able to choose on the fly whether to view a given notebook in mybinder, CoLab, or some other tool (google "Six easy ways to run your jupyter notebook in the cloud" from "Data School" for an overview of the options, which seem to be growing by the minute).


shaunagm commented 5 years ago

I'm not super familiar with matplotlib. I'm sure I could fix the display issues given enough time but it may be more efficient to have a student work through it.

I'm going to try to generate a list of libraries and extensions we need. I suppose I could just use everything in the binder requirements.txt file but I think that will include some extra stuff. But your "jupyter_contrib_nbextensions" is in there, so it's a good start.

re: configuring for both mybinder and colab - we should be able to do that, the question is whether we're okay with the potential added complexity. For instance, we could have a line in the notebook which checks whether something's already installed and only installs it if it isn't.
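A rough sketch of that "only install it if it isn't already there" check, in plain Python rather than `!pip` magics. The helper names are hypothetical; the only package pairing taken from this thread is that the `HARK` module comes from the `econ-ark` pip package.

```python
import importlib.util
import subprocess
import sys


def missing_packages(requirements):
    """Return pip names whose top-level import name cannot be found.

    `requirements` maps import name -> pip package name, e.g. {"HARK": "econ-ark"}.
    """
    return [
        pip_name
        for import_name, pip_name in requirements.items()
        if importlib.util.find_spec(import_name) is None
    ]


def ensure_installed(requirements, run=subprocess.check_call):
    """Install only the missing packages; return the list that was requested."""
    to_install = missing_packages(requirements)
    if to_install:
        # Use the running interpreter's pip so the kernel sees the packages.
        run([sys.executable, "-m", "pip", "install", *to_install])
    return to_install


# Example call for the first cell of a notebook (not executed here):
#   ensure_installed({"HARK": "econ-ark"})
```

On mybinder, where the binder requirements are already installed, this should be a near no-op; on CoLab it would do the installs. The added complexity is one extra helper cell per notebook.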

shaunagm commented 5 years ago

From Chris's email, it seems one of the biggest barriers with regard to CoLab is installing latex. I'm going to take a look and see if there's a way to install a much smaller subset of the library.

shaunagm commented 5 years ago

Update from the weekly meeting: Chris has tried the approach from this StackOverflow answer in this notebook (on colab) but it's not currently working - I'm going to try to debug.

There also appears to be an issue where notebooks aren't running if you aren't logged in to a google account, which I need to look into.

llorracc commented 5 years ago

There is now a revision of BufferStockTheory.ipynb that works on MyBinder, CoLab, and on my Mac using the local jupyter notebook server that comes with Anaconda.

The notebook requires LaTeX tools from the American Mathematical Society's amsmath package in order for matplotlib to be able to render all the figures, and the solution to this problem is painful: The first (code) cell installs all needed dependencies from scratch, which can be very slow if LaTeX is not installed (e.g., for either MyBinder or CoLab's default environments).

The notebook starts by testing whether LaTeX is installed on the machine. If not, it tests whether the machine is ubuntu or not. MyBinder and CoLab both default to ubuntu; if you're in ubuntu, it installs the full version of LaTeX:

!apt-get install texlive dvipng texlive-xetex texlive-latex-extra

In CoLab this seems to take about 2-3 minutes, which is painful but not intolerable. MyBinder can take 10 minutes or more, if it works at all (it seems to fail altogether about half the time). This kind of defeats the purpose of "live" notebooks. (Even when MyBinder says it "found built image" it might say "Launching server ... Launch attempt 1 failed, retrying ...")
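The decision logic described above (is LaTeX installed, and if not, is this ubuntu?) might look roughly like the following. This is a sketch, not the code in the notebook: the function names and the returned labels are made up, the ubuntu check assumes the standard `/etc/os-release` file, and what to do on a non-ubuntu machine without LaTeX is an assumption, since the thread does not say.

```python
import shutil


def latex_available(which=shutil.which):
    """True if a `latex` executable is on the PATH."""
    return which("latex") is not None


def is_ubuntu(os_release_path="/etc/os-release"):
    """True if the os-release file identifies the system as ubuntu."""
    try:
        with open(os_release_path) as f:
            for line in f:
                key, _, value = line.strip().partition("=")
                if key == "ID":
                    return value.strip('"').lower() == "ubuntu"
    except OSError:
        pass  # file missing or unreadable, e.g. on macOS or Windows
    return False


def latex_setup_plan(has_latex, on_ubuntu):
    """Decide what the first notebook cell should do (labels are hypothetical)."""
    if has_latex:
        return "use-latex"          # nothing to install
    if on_ubuntu:
        return "install-latex"      # slow path: run the apt-get install
    return "fallback-no-latex"      # assumption: render figures without LaTeX


print(latex_setup_plan(latex_available(), is_ubuntu()))
```

Keeping the decision in a pure function makes it easy to check each branch without actually triggering the multi-minute install.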

It appears that MyBinder allows you to use prespecified Docker images instead of their default setup, but it is not clear to me whether that would be any faster. And at this point it looks like CoLab doesn't let you use your own docker images?

If there is not now a way to "pre-cook" or "pre-cache" a VM image to speed up loading, I'm guessing that's not an accident. At some point MyBinder needs a revenue model, and I'd totally be willing to pay something to reduce the loading time from 5 minutes to 30 seconds. I just wish they'd roll out their pricing scheme and let me pay them for this!

PS. The installations are not necessary (and therefore a waste of time) if the libraries are already available. But the overriding goal was to have a single notebook that works everywhere, and so the installation stuff all has to be in that first cell since CoLab does not have a mechanism like MyBinder for prespecifying requirements.

shaunagm commented 5 years ago

@llorracc can you add a link to the "revision of BufferStockTheory.ipynb that works on MyBinder, CoLab, and on my Mac using the local jupyter notebook server that comes with Anaconda"?

llorracc commented 5 years ago

There WAS a link -- it just didn't work! That's what I get for trying to construct the link myself. I've now fixed it -- just click on BufferStockTheory.ipynb above

shaunagm commented 5 years ago

I've heard from a couple folks here at PyCon, including @mriduls (one of our sprinters), that many projects use MathJax to load subsets of Latex fast. Here's some info on configuration options. Still need to research this

llorracc commented 5 years ago

Hmmm, I had thought of MathJax as more of a rendering engine (it draws the characters on your bitmap) than a tool that can read packages and interpret things like \underline. But underline is part of the amsmath package, and the configuration link you sent seems to be reading in something called amsmath.js which presumably is a javascript version of the amsmath package. So maybe it will work (if matplotlib plays nice with MathJax ...)

shaunagm commented 5 years ago

@llorracc, what do you need done by the 24th? Specifically, which notebooks do you need to be ready, and what are the ideal, and maximum acceptable, times for them to load?

I know you said you wanted Bufferstock Theory ready - any others?

llorracc commented 5 years ago

The best test case is the BufferStockTheory remark, partly because it has a boolean that determines whether to use the "full version" of LaTeX (which is the huge 2.3 GB thing) or the built-in slimmed down version. It would be easy to compare the two versions and use that to test the degree of speedup.

The degree of speedup needed is actually a more complicated question than it might seem, because of the different ways that the different tools work. For MyBinder, the whole virtual machine is built according to specs in the /binder directory before anything is displayed. For Google CoLab, you have to include the packages you need in a "pip install" command in the first cell, so the notebook displays immediately but can't be used until it finishes building.

Let's wait until I get back to the US early next week to focus on this; it shouldn't take too long with both of us focused on it. Also, Andrij has been tasked with essentially the same mission by QuantEcon and so I want to piggyback on what they do. That changes my view that we should make a "big push" to come up with a long-term solution; instead, our objective should be to have a quick-and-dirty solution before Jun 24, and then work with Andrij on the longer term solution.


llorracc commented 5 years ago

I've just cc'd you on a message to a startup that seems to have been created to solve several of the exact problems we have been struggling with (including this one). Will keep you in the loop if I get a response. Maybe our "trial" will be sufficient for my needs at present.

  • Chris Carroll