Supporting both local and binder environments

shankari commented 4 years ago

I'm spending a bunch of time setting up CI so that we know that everything works and people don't get weird installation errors.

The notebooks can currently be run in two different situations:

locally by checking out the repo and setting up the emissioneval conda environment, and
through binder, which requires no additional setup

One challenge is keeping the two environments synchronized. Ideally, to avoid weird behavior, we would attach versions to all components of the environment and make them identical. But that is hard to do when we use repo2docker since they install conda automatically into their own docker container and they can change their version and their channels without iteracting with us.

It seems like there are four main ways in which we can deal with this:

remove support for local install = use only repo2docker. [Publish the resulting docker image](e.g. https://github.com/jupyter/repo2docker/blob/9a7dae9b44d25564e0e1382c3e9b08fc1016d310/docs/source/howto/deploy.rst) on dockerhub, so people can either run it locally using docker, or remotely using binder
remove support for binder Tell people that they have to run it locally
have CI for both

Obviously (3) is most flexible, but is also the most maintenance work for us, and I would like to push strongly for either (1) or (2).

Feel free to try out the approaches right now.

@jesbu1 @lefterav @jf87 What do other datasets do? And what do you think we should do?

shankari commented 4 years ago

As a concrete example of the differences, repo2docker uses miniforge instead of miniconda and uses the conda-forge repo. https://github.com/jupyter/repo2docker/tree/master/repo2docker/buildpacks/conda

But conda-forge can make the installation super-slow. https://github.com/conda/conda/issues/7239

I intentionally included only the main channel for the manual install.

shankari commented 4 years ago

The CI is enabled in the enable_basic_ci branch. I'm still getting the CI for repo2docker to work perfectly. Tomorrow, I will merge those changes, and get basic testing on the notebooks to work again.

shankari commented 4 years ago

Merged CI for both manual install and repo2docker https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/40

Can people please try out the scripts and give me feedback?

jesbu1 commented 4 years ago

There are some fixes I forgot that I made a PR for here: https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/38

jesbu1 commented 4 years ago

setup.sh worked perfectly on Ubuntu 18.10

shankari commented 4 years ago

Well, the CI indicates that it would. But you had to download the code and run the setup. The requested feedback was on whether we should support manual install or running in binder.

shankari commented 4 years ago

Also, I am actively working on CI that includes tests. Just need to get the repo2docker stuff to work and then I can merge. https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/42

shankari commented 4 years ago

Right now, my goal for the day is to ensure that all the timeline*.ipynb work properly and are intelligible, since I assume that people will start with explorations. I still need to expand the ground truth spec to handle small reroutes for the Berkeley case.

You can help most with the "intelligible" part since everything is obvious to me 😄

jesbu1 commented 4 years ago

I'll make changes to that PR regarding making stuff intelligible today...

To be honest I think it's easiest to run locally. Most people will have macs/linux so the setup.sh will take care of that easily. And Windows people just need to manually install the conda environment.

Binder is great but annoying to work with over longer periods of time since it will randomly die in my experience.

jesbu1 commented 4 years ago

I think comments on what some of the basic functions are doing would be helpful. For example in evaluate_power_vs_classification.ipynb in the cells after "The Views" and before "Add in other entries to the dataframe to allow us to plot better", it's hard to see what some of the cells are doing. Comments for cells with large new functions would help a lot without having to go through the code, and comments for cells that contain lines like ems.fill_sensed_section_ranges(pv_la) would also be helpful.

shankari commented 4 years ago

@jesbu1 as I said, I think that we should initially focus on timeline*.ipynb since I assume people with start with interactive explorations. Any thoughts on those?

jesbu1 commented 4 years ago

I'm still having troubles with the maps displaying on timeline_car_scooter_brex_san_jose.ipynb, can you try to reproduce?

shankari commented 4 years ago

by "map displaying" you mean that there is no python error, but the map tiles are not loading?

jesbu1 commented 4 years ago

One thing is this tags like this: HAHFDC v/s MAHFDC:HAHFDC_0' are unexplained. I know they refer to the accuracy, but perhaps having an example in the Data_exploration_template and then explaining what they mean there would make it easier to parse the other timelines.

Knowing exactly what they mean would also make parsing the graphs easier at the bottom of the timelines.

jesbu1 commented 4 years ago

by "map displaying" you mean that there is no python error, but the map tiles are not loading?

Yes, the map tiles are not loading in that notebook for me, they work fine on others.

jesbu1 commented 4 years ago

A couple of other problems: in trajectory_evaluation.ipynb, code cell 21, there's an array length mismatch error. I have isolated it to the function call: get_spatial_errors(pv_la)

jesbu1 commented 4 years ago

In trajectory_evaluation_spatio_temporal.ipynb there's a Keyerror in code cell 16. st_errors_df.role has an entry, MAMFDC that is not a key in r2q_map.

These are the only 2 code problems I found

shankari commented 4 years ago

In trajectory_evaluation_spatio_temporal.ipynb there's a Keyerror in code cell 16. st_errors_df.role has an entry, MAMFDC that is not a key in r2q_map.

This is because I collected new data after I made the notebooks.

These are the only 2 code problems I found

Great! However, I think that it would be better for us to focus on a small set of notebooks and make sure that they are clear. Do you think we should try to work on all notebooks as the same time? I am afraid we don't have enough time to document them well enough.

jesbu1 commented 4 years ago

Then let's just make sure the timeline ones are documented well enough. We could talk about how to do this in tomorrow's video call.

shankari commented 4 years ago

What should we do with the notebooks that are not well documented? Move them out somewhere? Or just highlight that people should start with the timelines? OR ???

jesbu1 commented 4 years ago

Oh good question.... maybe just highlight that people start with the timelines, and say that we're working on documenting the others well?

shankari commented 4 years ago

FYI, CI for the timeline.ipynb is working and on the README page. EU folks, if you have time today to look at the DataGenerator and the other `timeline` and:

clarify things that are not completely clear, and
come up with a list of behavior that you can't figure out even after trying that would be great

shankari commented 4 years ago

One thing is this tags like this: HAHFDC v/s MAHFDC:HAHFDC_0' are unexplained. I know they refer to the accuracy, but perhaps having an example in the Data_exploration_template and then explaining what they mean there would make it easier to parse the other timelines.

Can somebody else (@jesbu1 @jf87 @lefterav) handle this? I have to fix the reroute spec and it is hard for me to document this since I understand it anyway.

jesbu1 commented 4 years ago

Can you explain here what the tags mean?

shankari commented 4 years ago

can you see section 7.2.2.1 "Built-in, black-box sensing parameters" of my thesis?

maufadel commented 4 years ago

timeline_car_scooter_brex_san_jose.ipynb works fine for me [I've installed the environment using conda]

jf87 commented 4 years ago

I tried Binder and also works fine, only problem is that it takes very long to setup (several minutes).

shankari commented 4 years ago

so the python code just creates the maps. After that, the map tiles are loaded by the browser and our code has no involvement with it. I suspect @jesbu1 may be running into rate limits on the tile loading, since we are using a free tile provider.

shankari commented 4 years ago

I tried Binder and also works fine, only problem is that it takes very long to setup (several minutes).

Yeah, I put this in the documentation. Because binder uses conda-forge, it takes twice the time as a regular conda install. You can see this from the CI.

jesbu1 commented 4 years ago

@shankari added explanations for accuracy/frequency in the Data exploration template in the PR here https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/38

jesbu1 commented 4 years ago

Updated my earlier PR to have the text in the san jose timeline, and added a PR here to update the website quickstart: https://github.com/MobilityNet/mobilitynet.github.io/pull/10

shankari commented 4 years ago

For now, we are supporting both since there is CI for both. May revisit that decision in a few months.

MobilityNet / mobilitynet.github.io

Supporting both local and binder environments #3