MobilityNet / mobilitynet.github.io

BSD 3-Clause "New" or "Revised" License
trajectory_evaluation.ipynb breaks #19

Open singhish opened 3 years ago

singhish commented 3 years ago

When I reach the cell:

spatial_errors_df = pd.DataFrame()
spatial_errors_df = pd.concat([spatial_errors_df, get_spatial_errors(pv_la)], axis="index")
spatial_errors_df = pd.concat([spatial_errors_df, get_spatial_errors(pv_sj)], axis="index")
spatial_errors_df = pd.concat([spatial_errors_df, get_spatial_errors(pv_ucb)], axis="index")

I get the following output:

No ground truth route for suburb_city_driving_weekend walk_start, must be polygon, skipping...
Processing travel leg android, ucb-sdb-android-1, accuracy_control, suburb_city_driving_weekend, suburb_city_driving_weekend
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-8958a15239da> in <module>
      1 spatial_errors_df = pd.DataFrame()
----> 2 spatial_errors_df = pd.concat([spatial_errors_df, get_spatial_errors(pv_la)], axis="index")
      3 spatial_errors_df = pd.concat([spatial_errors_df, get_spatial_errors(pv_sj)], axis="index")
      4 spatial_errors_df = pd.concat([spatial_errors_df, get_spatial_errors(pv_ucb)], axis="index")

<ipython-input-20-26623ce778ff> in get_spatial_errors(pv)
     34                         meter_dist = filtered_us_gpdf.geometry.distance(filtered_gt_linestring)
     35                         ne = len(meter_dist)
---> 36                         curr_spatial_error_df = gpd.GeoDataFrame({"error": meter_dist,
     37                                                                   "ts": section_geo_df.ts,
     38                                                                   "geometry": section_geo_df.geometry,

~/miniconda3/envs/emissioneval/lib/python3.8/site-packages/geopandas/geodataframe.py in __init__(self, *args, **kwargs)
     59         crs = kwargs.pop("crs", None)
     60         geometry = kwargs.pop("geometry", None)
---> 61         super(GeoDataFrame, self).__init__(*args, **kwargs)
     62 
     63         # need to set this before calling self['geometry'], because

~/miniconda3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    433             )
    434         elif isinstance(data, dict):
--> 435             mgr = init_dict(data, index, columns, dtype=dtype)
    436         elif isinstance(data, ma.MaskedArray):
    437             import numpy.ma.mrecords as mrecords

~/miniconda3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    252             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    253         ]
--> 254     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    255 
    256 

~/miniconda3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     62     # figure out the index, if necessary
     63     if index is None:
---> 64         index = extract_index(arrays)
     65     else:
     66         index = ensure_index(index)

~/miniconda3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/internals/construction.py in extract_index(data)
    376                         f"length {len(index)}"
    377                     )
--> 378                     raise ValueError(msg)
    379             else:
    380                 index = ibase.default_index(lengths[0])

ValueError: array length 180 does not match index length 259

Any ideas?

shankari commented 3 years ago

It is pretty clear that the error is because of the different lengths that we are trying to combine.

---> 36                         curr_spatial_error_df = gpd.GeoDataFrame({"error": meter_dist,
     37                                                                   "ts": section_geo_df.ts,
     38                                                                   "geometry": section_geo_df.geometry,

One of those vectors has length 180 and the other has length 259
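
For reference, here is a minimal sketch that triggers the same kind of ValueError using plain pandas (no geopandas). The data is made up: the names section_ts and meter_dist, and the slice bounds, are stand-ins chosen only to produce 259 and 180 rows. It assumes the constructor sees both Series and raw arrays, as the np.repeat columns in get_spatial_errors suggest. pandas aligns Series on the union of their indexes, then checks the raw arrays against that union, which is where the two lengths collide.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 259 unfiltered points, 180 of which survive filtering.
section_ts = pd.Series(np.arange(259, dtype=float))  # plays the role of section_geo_df.ts
meter_dist = section_ts.iloc[40:220] * 0.5            # plays the role of the filtered distances
ne = len(meter_dist)                                  # 180

try:
    # The two Series alone would be aligned on the union index (259 rows);
    # the raw array of length ne (180) cannot be aligned, so pandas raises.
    pd.DataFrame({"error": meter_dist,
                  "ts": section_ts,
                  "phone_os": np.repeat("android", ne)})
    message = None
except ValueError as e:
    message = str(e)

print(message)
```

In this sketch the constructor only fails when filtering actually drops points; if the filtered and unfiltered indexes are identical, the lengths agree and nothing is raised.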

shankari commented 3 years ago

I'm guessing that this is also due to the new MAMFDC entry, although I am not 100% sure why

singhish commented 3 years ago

@shankari The new MAMFDC doesn't get picked up until a few cells after. I'll try and deep dive a little further and see whether I can resolve it on my own. Will keep you posted.

singhish commented 3 years ago

@shankari After digging deeper, it looks like the data frames meter_dist and section_geo_df in the function get_spatial_errors are of different lengths. Will try to address this deeper in the morning, but will also check if the other trajectory notebooks work as well so I can grab box plots.

Any idea as to why that might be the case?

shankari commented 3 years ago

> it looks like the data frames meter_dist and section_geo_df in the function get_spatial_errors are of different lengths.

Right, that's what I said in https://github.com/MobilityNet/mobilitynet.github.io/issues/19#issuecomment-785432716

> Any idea as to why that might be the case?

I thought this may be because of MAMFDC, but it wasn't.

shankari commented 3 years ago

@singhish The actual error is because we were using the filtered values (outside the start/end polygons) for all the other columns but not for the geometry and ts columns. I am not sure how this ever worked - maybe this is part of the lost code in the reset.

At any rate, the fix is:

@@ -364,9 +364,11 @@
     "                        filtered_gt_linestring = emd.filter_ground_truth_linestring(utm_section_gt_shapes)\n",
     "                        meter_dist = filtered_us_gpdf.geometry.distance(filtered_gt_linestring)\n",
     "                        ne = len(meter_dist)\n",
+    "                        \n",
+    "                        filtered_section_geo_df = section_geo_df.loc[filtered_us_gpdf.index]\n",
     "                        curr_spatial_error_df = gpd.GeoDataFrame({\"error\": meter_dist,\n",
-    "                                                                  \"ts\": section_geo_df.ts,\n",
-    "                                                                  \"geometry\": section_geo_df.geometry,\n",
+    "                                                                  \"ts\": filtered_section_geo_df.ts,\n",
+    "                                                                  \"geometry\": filtered_section_geo_df.geometry,\n",
     "                                                                  \"phone_os\": np.repeat(phone_os, ne),\n",
     "                                                                  \"phone_label\": np.repeat(phone_label, ne),\n",
     "                                                                  \"role\": np.repeat(r[\"eval_role_base\"], ne),\n",

Re-ran it for one section and it worked, rerunning for the entire set of timelines before submitting a PR.
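
A small sketch of the idea behind the fix, using plain pandas and made-up data. The names section_df and filtered_df are hypothetical stand-ins for section_geo_df and filtered_us_gpdf, and the slice simply mimics points being dropped by the polygon filter. Selecting rows with .loc on the filtered index keeps every column on the same set of rows, so the raw np.repeat arrays match the Series length.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's frames.
section_df = pd.DataFrame({"ts": np.arange(259, dtype=float)})  # unfiltered section
filtered_df = section_df.iloc[40:220]                            # points kept by the filter
meter_dist = filtered_df.ts * 0.5                                # fake distances, same index
ne = len(meter_dist)

# Restrict the unfiltered frame to the rows that survived filtering.
filtered_section_df = section_df.loc[filtered_df.index]

errors_df = pd.DataFrame({"error": meter_dist,
                          "ts": filtered_section_df.ts,
                          "phone_os": np.repeat("android", ne)})

print(len(errors_df), errors_df.isna().any().any())
```

Note that .loc raises a KeyError if the filtered index contains labels missing from the unfiltered frame, so this relies on the filtered frame being a row subset of the original.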

shankari commented 3 years ago

A visual representation of the unfiltered vs. filtered points, for the record. The difference is most visible at the bottom left of the map, near the Los Altos Library.

[Image: Screen Shot 2021-02-24 at 3 40 00 PM]

singhish commented 3 years ago

This is helpful. Thank you!

Also, the trajectory_evaluation_spatio_temporal notebook ended up working properly, would you prefer we use the box plots from this notebook or the one relevant to this issue?

shankari commented 3 years ago

You mean, should we use trajectory_evaluation.ipynb or trajectory_evaluation_spatio_temporal.ipynb? The spatial one is easier to understand; the spatio-temporal one is more complicated.

Which do you think?

shankari commented 3 years ago

There are some additional errors in trajectory_evaluation.ipynb, in the error checks at the end. I am taking a look at those now so the entire notebook works.

shankari commented 3 years ago

I double-checked all the versions of this notebook and the code is identical. The other notebooks (e.g. trajectory_evaluation_analysis_master.ipynb) even have the outputs saved, which proves that it did work.

But the error is straightforward. So how did this ever work?

singhish commented 3 years ago

> the spatial is easier to understand

We can go with this one then!

shankari commented 3 years ago

there are some great examples of the spatial error at the end of the notebook that you can use.

singhish commented 3 years ago

Added these box plots to the presentation for now, as they give a good understanding of outliers vs what's expected in my opinion

[Image: trajectory]

I also added the map of the express_bus route over a single run

[Image: Screen Shot 2021-02-24 at 7 11 11 PM]

to give a real-world visual of what's going on. Will pick this up in the morning.

shankari commented 3 years ago

you may want to include only the car_scooter_brex part of that since the others don't have all the sensing regimes.

shankari commented 3 years ago

I have resolved one of the errors in this file. It was also due to additional data, which ended up producing two HAHFDC entries. Adding the run to the selection to narrow it further resolved the issue.

The second error is because the UCB spec has not been updated with the rerouted ground truth, so we don't retrieve the runs with quality == 4 at this point.

spatial_errors_df.query("phone_os == 'android' & (quality == 4) & section_id == 'light_rail_below_above_ground'").boxplot(column="error", by="run")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-50-f76420bd7a58> in <module>
----> 1 spatial_errors_df.query("phone_os == 'android' & (quality == 4) & section_id == 'light_rail_below_above_ground'").boxplot(column="error", by="run")

ValueError: not enough values to unpack (expected 2, got 0)
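
Since the quality == 4 runs are not retrieved, the query above presumably returns no rows, and the boxplot call is handed an empty frame. A hedged sketch of guarding against that case is below. The spatial_errors_df here is made-up data with only the columns used by the query, and printing a message instead of plotting is an illustrative choice, not what the notebook does.

```python
import pandas as pd

# Made-up stand-in for spatial_errors_df; note there are no android rows with quality == 4.
spatial_errors_df = pd.DataFrame({
    "phone_os": ["android", "android", "ios"],
    "quality": [1, 2, 4],
    "section_id": ["light_rail_below_above_ground"] * 3,
    "run": [0, 1, 0],
    "error": [3.2, 5.1, 4.0],
})

subset = spatial_errors_df.query(
    "phone_os == 'android' & (quality == 4) & "
    "section_id == 'light_rail_below_above_ground'")

if subset.empty:
    # Nothing to group by "run", so skip the plot instead of calling boxplot.
    summary = "no matching rows, skipping boxplot"
else:
    # Per-run summary of the same values the boxplot would show.
    summary = subset.groupby("run")["error"].describe()

print(summary)
```

Checking subset.empty before plotting makes it obvious when the selection matched nothing, rather than surfacing as an unpacking error inside the plotting code.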