ProjectSidewalk / sidewalk-quality-analysis

An analysis of Project Sidewalk user quality based on interaction logs

Feature Selection Experiments #46

Open jonfroehlich opened 5 years ago

jonfroehlich commented 5 years ago

In https://github.com/ProjectSidewalk/sidewalk-quality-analysis/issues/18#issuecomment-519696956, @nchowder reported some initial experimental findings with recursive feature selection (neat!).

[Graph: classifier performance vs. number of input features]

In general, as the graph shows, our performance improves as we add input features (nice result!); however, I was quite surprised to see that we maxed out at only 9 input features. Haven't we brainstormed and discussed closer to 50? Once we run the feature selection algorithm with a larger input feature set, I'd also like you to report back on which features were most influential/helpful to the model.
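Running feature selection over a larger candidate set and reporting feature influence can be done with scikit-learn's `RFECV`, which is presumably what the experiments above used. The sketch below is hypothetical (synthetic data and placeholder feature names, not the repo's actual pipeline), but it shows how to recover both the optimal feature count and a per-feature ranking.

```python
# Hypothetical sketch of recursive feature elimination with cross-validation.
# The data and feature names are synthetic stand-ins, not the project's logs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the interaction-log feature matrix.
X, y = make_classification(n_samples=200, n_features=13,
                           n_informative=6, random_state=0)
feature_names = [f"feat_{i}" for i in range(13)]

selector = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=1, cv=3)
selector.fit(X, y)

print("Optimal number of features:", selector.n_features_)

# support_ is a boolean mask over the candidate features;
# ranking_ is 1 for kept features, larger for features dropped earlier.
selected = [n for n, keep in zip(feature_names, selector.support_) if keep]
print("Selected:", selected)
print("Rankings:", dict(zip(feature_names, selector.ranking_)))
```

The `ranking_` attribute is one way to answer the "which features were most influential" question: features eliminated last (lower rank) survived more rounds of pruning.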

nch0w commented 5 years ago

Here is an update on this issue. There are actually two recursive feature selection runs: one for the label classifier and one for the user accuracy classifier.

These are the results of feature selection for the label classifier:

```
Optimal number of features : 12
Mask : [ True  True  True  True  True  True False  True  True  True  True  True  True]
```

The candidate feature set was ['label_type', 'sv_image_y', 'canvas_x', 'canvas_y', 'heading', 'pitch', 'zoom', 'lat', 'lng', 'proximity_distance', 'proximity_middleness', 'CLASS_DESC', 'ZONEID'].

Matching the mask against this list, the one feature that was eliminated is 'zoom'.
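Since the mask lines up position-by-position with the candidate list, the eliminated feature can be read off directly. A minimal sketch:

```python
# Map the RFE support mask back onto the candidate feature names
# to see which features were kept and which were dropped.
features = ['label_type', 'sv_image_y', 'canvas_x', 'canvas_y', 'heading',
            'pitch', 'zoom', 'lat', 'lng', 'proximity_distance',
            'proximity_middleness', 'CLASS_DESC', 'ZONEID']
mask = [True, True, True, True, True, True, False,
        True, True, True, True, True, True]

kept = [f for f, keep in zip(features, mask) if keep]
dropped = [f for f, keep in zip(features, mask) if not keep]
print(dropped)  # ['zoom']
```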

I will update this issue with the results of feature selection on the user accuracy classifier soon. We have a lot more accuracy features than label features.