Closed: shankari closed this issue 4 years ago
As a concrete example of the differences, repo2docker uses miniforge instead of miniconda, and uses the conda-forge repo:
https://github.com/jupyter/repo2docker/tree/master/repo2docker/buildpacks/conda
But conda-forge can make the installation super-slow:
https://github.com/conda/conda/issues/7239
I intentionally included only the main channel for the manual install.
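For reference, staying on the main channel just means not listing conda-forge in the environment file. A minimal sketch (the actual environment.yml isn't shown in this thread, so the package list below is purely illustrative):

```yaml
# Illustrative sketch only: the real environment file for this repo will list
# different packages. The point is the channels section, which pins the
# manual install to the default channel and avoids conda-forge.
name: emissioneval
channels:
  - defaults
dependencies:
  - python=3.7
  - pandas
  - jupyter
```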
The CI is enabled in the enable_basic_ci branch. I'm still getting the CI for repo2docker to work perfectly. Tomorrow, I will merge those changes and get basic testing on the notebooks to work again.
Merged CI for both manual install and repo2docker https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/40
Can people please try out the scripts and give me feedback?
There are some fixes I forgot that I made a PR for here: https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/38
setup.sh worked perfectly on Ubuntu 18.10.
Well, the CI indicates that it would. But you had to download the code and run the setup. The requested feedback was on whether we should support manual install or running in binder.
Also, I am actively working on CI that includes tests. Just need to get the repo2docker stuff to work and then I can merge. https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/42
Right now, my goal for the day is to ensure that all the timeline*.ipynb work properly and are intelligible, since I assume that people will start with explorations. I still need to expand the ground truth spec to handle small reroutes for the Berkeley case.
You can help most with the "intelligible" part since everything is obvious to me 😄
I'll make changes to that PR regarding making stuff intelligible today...
To be honest, I think it's easiest to run locally. Most people will have Macs/Linux, so setup.sh will take care of that easily, and Windows people just need to manually install the conda environment.
Binder is great but annoying to work with over longer periods of time since it will randomly die in my experience.
I think comments on what some of the basic functions are doing would be helpful. For example, in evaluate_power_vs_classification.ipynb, in the cells after "The Views" and before "Add in other entries to the dataframe to allow us to plot better", it's hard to see what some of the cells are doing. Comments for cells with large new functions would help a lot without having to go through the code, and comments for cells that contain lines like ems.fill_sensed_section_ranges(pv_la) would also be helpful.
@jesbu1 as I said, I think that we should initially focus on timeline*.ipynb since I assume people will start with interactive explorations. Any thoughts on those?
I'm still having trouble with the maps displaying on timeline_car_scooter_brex_san_jose.ipynb; can you try to reproduce?
by "map displaying" you mean that there is no python error, but the map tiles are not loading?
One thing: tags like 'HAHFDC v/s MAHFDC:HAHFDC_0' are unexplained. I know they refer to the accuracy, but perhaps having an example in the Data_exploration_template and then explaining what they mean there would make it easier to parse the other timelines. Knowing exactly what they mean would also make parsing the graphs at the bottom of the timelines easier.
> by "map displaying" you mean that there is no python error, but the map tiles are not loading?
Yes, the map tiles are not loading in that notebook for me, they work fine on others.
A couple of other problems: in trajectory_evaluation.ipynb, code cell 21, there's an array length mismatch error. I have isolated it to the function call get_spatial_errors(pv_la). In trajectory_evaluation_spatio_temporal.ipynb, there's a KeyError in code cell 16: st_errors_df.role has an entry, MAMFDC, that is not a key in r2q_map.
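A KeyError like this, where newly collected data introduces a role tag the notebook has never seen, can be made non-fatal with a defensive lookup. A sketch of the pattern, assuming r2q_map is a plain dict from role tag to some quality label; the map contents and the "unknown" fallback here are illustrative assumptions, not the notebook's actual values:

```python
# Defensive-lookup sketch; the map contents and the "unknown" fallback are
# illustrative assumptions, not the notebook's actual values.
r2q_map = {"HAHFDC": "high", "MAHFDC": "medium"}

def role_to_quality(role, mapping=r2q_map, default="unknown"):
    """Map a role tag to its quality label, warning on unmapped tags
    (e.g. tags from data collected after the notebook was written)."""
    if role not in mapping:
        print(f"warning: unmapped role {role!r}, falling back to {default!r}")
    return mapping.get(role, default)

role_to_quality("MAMFDC")  # warns and returns "unknown" instead of raising
```

Alternatively, filtering st_errors_df down to the roles that existed when the notebook was written would keep the existing plots unchanged while the new tags are being documented.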
These are the only 2 code problems I found
> In trajectory_evaluation_spatio_temporal.ipynb there's a KeyError in code cell 16. st_errors_df.role has an entry, MAMFDC, that is not a key in r2q_map.
This is because I collected new data after I made the notebooks.
> These are the only 2 code problems I found
Great! However, I think it would be better for us to focus on a small set of notebooks and make sure that they are clear. Do you think we should try to work on all notebooks at the same time? I am afraid we don't have enough time to document them well enough.
Then let's just make sure the timeline ones are documented well enough. We could talk about how to do this in tomorrow's video call.
What should we do with the notebooks that are not well documented? Move them out somewhere? Or just highlight that people should start with the timelines? OR ???
Oh good question.... maybe just highlight that people start with the timelines, and say that we're working on documenting the others well?
FYI, CI for the timeline.ipynb is working and on the README page.
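The thread doesn't show the CI config itself, so purely as a hypothetical illustration: the core of a notebook-testing CI step is usually just executing the notebook headlessly and failing on any error, e.g. with nbconvert (the step name and notebook path below are assumptions):

```yaml
# Hypothetical CI step (the repo's actual CI config is not shown here):
# execute the notebook headlessly; any raised exception fails the build.
- name: Execute timeline notebook
  run: |
    jupyter nbconvert --to notebook --execute \
      --ExecutePreprocessor.timeout=600 timeline.ipynb
```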
EU folks, if you have time today, can you look at the DataGenerator and the other `timeline` notebooks, and address this:
> One thing: tags like 'HAHFDC v/s MAHFDC:HAHFDC_0' are unexplained. I know they refer to the accuracy, but perhaps having an example in the Data_exploration_template and then explaining what they mean there would make it easier to parse the other timelines.
Can somebody else (@jesbu1 @jf87 @lefterav) handle this? I have to fix the reroute spec, and it is hard for me to document this since it is all obvious to me anyway.
Can you explain here what the tags mean?
can you see section 7.2.2.1 "Built-in, black-box sensing parameters" of my thesis?
timeline_car_scooter_brex_san_jose.ipynb works fine for me [I've installed the environment using conda]
I tried Binder and it also works fine; the only problem is that it takes very long to set up (several minutes).
So the Python code just creates the maps; after that, the map tiles are loaded by the browser, and our code has no involvement with it. I suspect @jesbu1 may be running into rate limits on the tile loading, since we are using a free tile provider.
> I tried Binder and also works fine, only problem is that it takes very long to setup (several minutes).
Yeah, I put this in the documentation. Because Binder uses conda-forge, it takes twice as long as a regular conda install. You can see this from the CI.
@shankari added explanations for accuracy/frequency in the Data exploration template in the PR here https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/38
Updated my earlier PR to have the text in the san jose timeline, and added a PR here to update the website quickstart: https://github.com/MobilityNet/mobilitynet.github.io/pull/10
For now, we are supporting both since there is CI for both. May revisit that decision in a few months.
I'm spending a bunch of time setting up CI so that we know that everything works and people don't get weird installation errors.
The notebooks can currently be run in two different situations:

- locally, in the emissioneval conda environment, and
- on Binder, via repo2docker.

One challenge is keeping the two environments synchronized. Ideally, to avoid weird behavior, we would attach versions to all components of the environment and make them identical. But that is hard to do when we use repo2docker, since they install conda automatically into their own docker container, and they can change their version and their channels without interacting with us.
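One way to catch drift between the two environments is to diff their resolved package pins periodically. A minimal sketch, assuming both environments can be dumped to simple name=version lines (e.g. via `conda list --export`); the function names and sample contents below are made up for illustration:

```python
# Sketch: report packages whose pinned versions differ between the manual
# conda environment and the repo2docker one. The name=version line format
# and the sample contents are illustrative assumptions.

def parse_pins(text):
    """Parse lines like 'numpy=1.18.1' (or '- numpy=1.18.1') into {name: version}."""
    pins = {}
    for line in text.splitlines():
        line = line.strip().lstrip("- ")
        if "=" in line:
            name, _, version = line.partition("=")
            pins[name.strip()] = version.strip()
    return pins

def env_drift(manual, binder):
    """Return {package: (manual_version, binder_version)} where they differ."""
    a, b = parse_pins(manual), parse_pins(binder)
    return {pkg: (a.get(pkg), b.get(pkg))
            for pkg in sorted(set(a) | set(b))
            if a.get(pkg) != b.get(pkg)}

manual_env = "numpy=1.18.1\npandas=1.0.3\nfolium=0.10.1"
binder_env = "numpy=1.18.4\npandas=1.0.3\nfolium=0.10.1"
print(env_drift(manual_env, binder_env))  # {'numpy': ('1.18.1', '1.18.4')}
```

Running a check like this in CI would surface version drift before it shows up as a confusing notebook failure.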
It seems like there are four main ways in which we can deal with this:
Obviously, (3) is the most flexible, but it is also the most maintenance work for us, and I would like to push strongly for either (1) or (2).
Feel free to try out the approaches right now.
@jesbu1 @lefterav @jf87 What do other datasets do? And what do you think we should do?