You retrieve a batch of event documents here and then try to optimize by matching identities. Because of the batch size restriction, the batch may not contain all events for a particular identity. A better approach would be to get a batch of identities, lock them, and then load all documents for the acquired identities. That way you have every document for a given identity at that moment.
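A minimal sketch of that identity-first flow, assuming hypothetical `EventStore`, `LockManager`, and `Event` types; these stand in for the real lightblue DAO and locking mechanism and none of the names come from the actual codebase:

```java
import java.util.List;
import java.util.stream.Collectors;

public class IdentityBatchProcessor {

    /** Hypothetical event type; stands in for the real event document. */
    interface Event { String identity(); }

    /** Hypothetical DAO over the lightblue events collection. */
    interface EventStore {
        /** Returns up to maxIdentities distinct identity keys that have pending events. */
        List<String> findIdentitiesWithPendingEvents(int maxIdentities);
        /** Returns ALL pending events for the given identities (no batch-size cap). */
        List<Event> findEventsForIdentities(List<String> identities);
    }

    /** Hypothetical lock manager; could be backed by a locking collection. */
    interface LockManager {
        boolean tryLock(String identity);
        void unlock(String identity);
    }

    private final EventStore store;
    private final LockManager locks;

    IdentityBatchProcessor(EventStore store, LockManager locks) {
        this.store = store;
        this.locks = locks;
    }

    void processBatch(int maxIdentities) {
        // 1. Retrieve a batch of identities instead of a batch of events.
        List<String> identities = store.findIdentitiesWithPendingEvents(maxIdentities);

        // 2. Lock each identity; skip any that another worker already holds.
        List<String> acquired = identities.stream()
                .filter(locks::tryLock)
                .collect(Collectors.toList());
        try {
            // 3. Load every event for the locked identities. Querying by identity
            //    guarantees the complete event set for each one at this moment,
            //    so cross-event optimization (matching/coalescing) is safe.
            List<Event> events = store.findEventsForIdentities(acquired);
            optimizeAndProcess(events);
        } finally {
            acquired.forEach(locks::unlock);
        }
    }

    private void optimizeAndProcess(List<Event> events) {
        // e.g. coalesce or supersede events per identity before emitting downstream
    }
}
```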
Considerations:
What do we initially retrieve? Can we query lightblue for events capped at a maximum number of unique identities? Does this perform better than simply retrieving a large batch? (See the sketch after this list for a client-side fallback.)
Is making an additional query, on average, worth the chance of finding more events to optimize?
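On the first consideration: if lightblue has no native way to cap a query at N unique identities, one fallback is to page through a raw event batch client-side and keep only the first N distinct identities. This is only a sketch under that assumption; `EventPager` and `Event` are hypothetical stand-ins for the real DAO types, and whether the extra paging beats one large batch is exactly the performance question raised above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IdentitySampler {

    /** Hypothetical event type; stands in for the real event document. */
    interface Event { String identity(); }

    /** Hypothetical pager over pending events, ordered by creation time. */
    interface EventPager { List<Event> nextPage(int pageSize); }

    /**
     * Scans pages of events until maxIdentities distinct identities are found
     * or the event stream is exhausted; returns the identities in scan order.
     */
    static Set<String> firstNIdentities(EventPager pager, int maxIdentities, int pageSize) {
        Set<String> identities = new LinkedHashSet<>();
        List<Event> page;
        while (identities.size() < maxIdentities
                && !(page = pager.nextPage(pageSize)).isEmpty()) {
            for (Event e : page) {
                identities.add(e.identity());
                if (identities.size() >= maxIdentities) break; // cap reached
            }
        }
        return identities;
    }
}
```

The returned identities would then feed the lock-and-load flow sketched earlier, which re-queries for the complete event set per identity.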
Per @bserder:
Considerations: