Make using groups other than `entity_id` in collate less error-prone

Mostly for discussion, but it seems that using anything other than `entity_id` as a group in a feature aggregation (e.g., zipcode) is currently quite error-prone. For instance, if each `entity_id` does not map to exactly one value of the other group column (say, one entity is associated with two zipcodes), you end up with multiple rows in the matrix with the same (`entity_id`, `as_of_date`) key, which causes many problems downstream.
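A toy illustration of that fan-out, in plain Python with made-up data. It assumes the zipcode-level aggregate ends up attached to entities through an entity-to-zipcode mapping; that is a simplification for illustration, not collate's actual SQL.

```python
from collections import Counter
from datetime import date

# Made-up entity-to-group mapping: entity 1 is associated with two zipcodes.
entity_zipcodes = [(1, "60601"), (1, "60602"), (2, "60601")]

# Made-up feature aggregated per zipcode for a single as_of_date.
as_of_date = date(2016, 1, 1)
events_by_zipcode = {"60601": 5, "60602": 3}

# Attaching the zipcode-level value to each entity yields one row per
# (entity_id, zipcode) pair, not one row per entity.
rows = [
    (entity_id, as_of_date, events_by_zipcode[zipcode])
    for entity_id, zipcode in entity_zipcodes
]

# Count rows per (entity_id, as_of_date) key and keep the keys seen more than once.
key_counts = Counter((entity_id, d) for entity_id, d, _ in rows)
duplicated = {key: n for key, n in key_counts.items() if n > 1}
print(duplicated)  # entity 1 has two rows for the same as_of_date
```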

Thoughts on how to improve the functionality here? Some options:

- Remove groups from the experiment config and always assume only `entity_id`?
- Allow the user to specify non-`entity_id` groups, but require them to explicitly override a validation check, to make clear this is "advanced" functionality?
- Check that matrices have no duplicate (`entity_id`, `as_of_date`) pairs and raise an early error if they do?
- Other ideas?
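A rough sketch of how the second and third options could fit together, assuming each feature aggregation carries a `groups` list as in the experiment config. The function names and the `allow_nonentity_groups` flag are hypothetical, not existing triage settings or APIs.

```python
from collections import Counter
from datetime import date
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Key = Tuple[Hashable, date]


def validate_groups(aggregation: Dict[str, Any], allow_nonentity_groups: bool = False) -> None:
    """Option 2: refuse non-entity_id groups unless the user explicitly opts in.

    `allow_nonentity_groups` is a hypothetical flag used only for illustration.
    """
    groups = aggregation.get("groups", ["entity_id"])
    extra = [g for g in groups if g != "entity_id"]
    if extra and not allow_nonentity_groups:
        raise ValueError(
            f"groups {extra} are not entity_id; opt in only if each entity "
            "maps to exactly one value of these columns"
        )


def find_duplicate_keys(keys: Iterable[Key]) -> List[Key]:
    """Return each (entity_id, as_of_date) key that occurs more than once, in first-seen order."""
    counts = Counter(keys)
    return [k for k, n in counts.items() if n > 1]


def check_unique_keys(keys: Iterable[Key], max_report: int = 5) -> None:
    """Option 3: fail early if a matrix has duplicate (entity_id, as_of_date) keys."""
    dupes = find_duplicate_keys(keys)
    if dupes:
        raise ValueError(
            f"{len(dupes)} duplicated (entity_id, as_of_date) keys, e.g. {dupes[:max_report]}"
        )


if __name__ == "__main__":
    validate_groups({"groups": ["entity_id"]})  # accepted without opting in
    try:
        validate_groups({"groups": ["entity_id", "zip_code"]})
    except ValueError as err:
        print("config rejected:", err)

    keys = [(1, date(2016, 1, 1)), (2, date(2016, 1, 1)), (1, date(2016, 1, 1))]
    try:
        check_unique_keys(keys)
    except ValueError as err:
        print("matrix rejected:", err)
```

In this sketch the opt-in only silences the config check; the matrix check still runs, so a user who opts in but has a non-1-to-1 group would still get an early error. If the matrix is held as a pandas DataFrame, `df.duplicated(subset=["entity_id", "as_of_date"]).any()` expresses the same key check.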

Make using groups other than `entity_id` in collate less error-prone #874