dssg / triage

General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems
Other
185 stars 60 forks source link

Sort Default Feature Group Definition #833

Closed shaycrk closed 3 years ago

shaycrk commented 3 years ago

I believe this should fix the bug Evan identified in #832

However, one question before merging: is there any good reason for the default to not simply be:

feature_group_definition['all'] = [True]

The FeatureGroupCreator already has an all subsetter that we could take advantage of rather than recreating the logic here. However, if we do that, I wonder if we want to explicitly sort the group names in creating the matrix metadata here -- the main reason I didn't add that as well is because there might be an issue with backwards compatibility changing existing matrix hashses. That might not be a huge deal if we merge/tag at the same time as #830 however.

thcrock commented 3 years ago

I agree with the points you made here; using the existing 'all' preset is probably the way to go, and sorting the names when we create the matrix metadata. Since we are already doing 830 to break old hashes, might as well do this change as well before tagging a new version.

shaycrk commented 3 years ago

Ok -- updated to use the all subsetter by default instead and sorting the feature group names when creating the matrix metadata. I think this should be all set, but we should probably coordinate merging with #830 (and probably #835) as well as bumping the version.