Open singhish opened 3 years ago
It is pretty clear that the error is because of the different lengths that we are trying to combine.
---> 36 curr_spatial_error_df = gpd.GeoDataFrame({"error": meter_dist,
37 "ts": section_geo_df.ts,
38 "geometry": section_geo_df.geometry,
One of those vectors has length 180 and the other has length 259
I'm guessing that this is also due to the new MAMFDC entry, although I am not 100% sure why
@shankari The new MAMFDC doesn't get picked up until a few cells after. I'll try and deep dive a little further and see whether I can resolve it on my own. Will keep you posted.
@shankari After digging deeper, it looks like the data frames meter_dist and section_geo_df in the function get_spatial_errors are of different lengths. I will look into this further in the morning, and will also check whether the other trajectory notebooks work so I can grab box plots.
Any idea as to why that might be the case?
it looks like the data frames meter_dist and section_geo_df in the function get_spatial_errors are of different lengths.
Right, that's what I said in https://github.com/MobilityNet/mobilitynet.github.io/issues/19#issuecomment-785432716
Any idea as to why that might be the case?
I thought this may be because of MAMFDC, but it wasn't.
@singhish The actual error is because we were using the filtered values (outside the start/end polygons) for all the other columns but not for the geometry and ts columns. I am not sure how this ever worked; maybe this is part of the code that was lost in the reset.
At any rate, the fix is:
@@ -364,9 +364,11 @@
" filtered_gt_linestring = emd.filter_ground_truth_linestring(utm_section_gt_shapes)\n",
" meter_dist = filtered_us_gpdf.geometry.distance(filtered_gt_linestring)\n",
" ne = len(meter_dist)\n",
+ " \n",
+ " filtered_section_geo_df = section_geo_df.loc[filtered_us_gpdf.index]\n",
" curr_spatial_error_df = gpd.GeoDataFrame({\"error\": meter_dist,\n",
- " \"ts\": section_geo_df.ts,\n",
- " \"geometry\": section_geo_df.geometry,\n",
+ " \"ts\": filtered_section_geo_df.ts,\n",
+ " \"geometry\": filtered_section_geo_df.geometry,\n",
" \"phone_os\": np.repeat(phone_os, ne),\n",
" \"phone_label\": np.repeat(phone_label, ne),\n",
" \"role\": np.repeat(r[\"eval_role_base\"], ne),\n",
Re-ran it for one section and it worked, rerunning for the entire set of timelines before submitting a PR.
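The mismatch and the fix above can be reproduced on toy data. This is a minimal sketch, assuming plain pandas instead of geopandas and made-up values; full_df, filtered_df, and the x column are hypothetical stand-ins for section_geo_df, filtered_us_gpdf, and the geometry, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for section_geo_df: six unfiltered points.
full_df = pd.DataFrame({"ts": np.arange(6.0), "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})

# Hypothetical stand-in for filtered_us_gpdf: drop points near the start and end.
filtered_df = full_df[full_df.x.between(1.0, 4.0)]

# Stand-in for meter_dist: computed on the filtered points only.
meter_dist = filtered_df.x * 0.5
ne = len(meter_dist)

# Mixing unfiltered columns with arrays of length ne fails, because the
# Series are aligned on the union of their indices (all six rows).
try:
    pd.DataFrame({"error": meter_dist,
                  "ts": full_df.ts,
                  "phone_os": np.repeat("android", ne)})
    mismatch_message = None
except ValueError as e:
    mismatch_message = str(e)

# The fix: select the same rows from the unfiltered frame by index label.
filtered_full_df = full_df.loc[filtered_df.index]
fixed_df = pd.DataFrame({"error": meter_dist,
                         "ts": filtered_full_df.ts,
                         "phone_os": np.repeat("android", ne)})
```

Selecting with .loc on the filtered index keeps every column on the same set of rows, so the repeated scalar columns of length ne line up with the rest.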
Visual representation of the unfiltered vs. filtered points, for the record. The difference is most visible at the bottom left of the map, near the Los Altos Library.
This is helpful. Thank you!
Also, the trajectory_evaluation_spatio_temporal notebook ended up working properly. Would you prefer we use the box plots from this notebook or the one relevant to this issue?
You mean, should we use trajectory_evaluation.ipynb or trajectory_evaluation_spatio_temporal.ipynb?
the spatial is easier to understand
the spatio-temporal is more complicated
Which do you think?
There are some additional errors in trajectory_evaluation.ipynb, in the error checks at the end.
I am taking a look at those now so the entire notebook works.
I double-checked all the versions of this notebook and the code is identical. The other notebooks (e.g. trajectory_evaluation_analysis_master.ipynb) even have the outputs saved, which proves that it did work.
But the error is straightforward. So how did this ever work?
the spatial is easier to understand
We can go with this one then!
there are some great examples of the spatial error at the end of the notebook that you can use.
Added these box plots to the presentation for now, as they give a good understanding of outliers vs what's expected in my opinion
I also added the map of the express_bus route over a single run to give a real-world visual of what's going on. Will pick this up in the morning.
You may want to include only the car_scooter_brex part of that, since the others don't have all the sensing regimes.
I have resolved one of the errors in this file. It was also due to additional data, which ended up producing two HAHFDC entries. Adding the run to the selection criteria resolved the issue.
The second error is because the UCB spec has not been updated with the rerouted ground truth, so we don't retrieve the runs with quality == 4 at this point.
spatial_errors_df.query("phone_os == 'android' & (quality == 4) & section_id == 'light_rail_below_above_ground'").boxplot(column="error", by="run")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-50-f76420bd7a58> in <module>
----> 1 spatial_errors_df.query("phone_os == 'android' & (quality == 4) & section_id == 'light_rail_below_above_ground'").boxplot(column="error", by="run")
ValueError: not enough values to unpack (expected 2, got 0)
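Per the comment above, the query matches no rows because the runs with quality == 4 are not retrieved yet, and calling boxplot with by= on the resulting empty frame appears to be what raises the unpacking error. A minimal sketch of a guard, assuming a made-up miniature spatial_errors_df with only the columns used in the query (the values are illustrative, not real results):

```python
import pandas as pd

# Hypothetical miniature of spatial_errors_df; the values are made up.
spatial_errors_df = pd.DataFrame({
    "phone_os": ["android", "android", "ios"],
    "quality": [1, 2, 3],
    "section_id": ["light_rail_below_above_ground"] * 3,
    "run": [0, 1, 0],
    "error": [3.2, 5.1, 4.4],
})

selected = spatial_errors_df.query(
    "phone_os == 'android' & (quality == 4)"
    " & section_id == 'light_rail_below_above_ground'")

# Quality levels that are actually present, to make the gap obvious.
available_quality = sorted(int(q) for q in spatial_errors_df.quality.unique())

if selected.empty:
    # Nothing to plot: report what is available instead of calling boxplot.
    print("No rows for quality == 4; available quality values:", available_quality)
else:
    selected.boxplot(column="error", by="run")
```

Checking selected.empty first turns the opaque unpacking error into a message that points at the missing quality == 4 runs.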
When I reach the cell:
I get the following output:
Any ideas?