Pulsar's Global Presorting & Signature Calculation

I am currently trying to get a detailed understanding of the Pulsar implementation and would like to ask two questions.

I can't get my head around why there are three calls to the radix sort function in Pulsar's forward pass. I realise that all three "things" need to be sorted, but I cannot understand why this cannot (or should not) be done in a single radix sort. Is it more efficient this way? And if so, why?
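To make the first question concrete, here is a minimal plain-Python sketch of the single-sort alternative. It is not Pulsar's code, and the keys and array contents are hypothetical. The idea: if several arrays must be reordered by the *same* key, one stable radix sort can produce a permutation, and each array is then gathered through it. If the arrays are ordered by *different* keys, no single permutation serves all of them, and separate sorts are unavoidable.

```python
def radix_argsort(keys, bits=32, radix_bits=8):
    """Stable LSD radix sort on non-negative integer keys < 2**bits.

    Returns the permutation that sorts `keys` (an argsort), so the same
    order can be applied to any number of payload arrays afterwards.
    """
    assert all(0 <= k < (1 << bits) for k in keys), "keys must fit in `bits` bits"
    order = list(range(len(keys)))
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        # Distribute indices into buckets by the current digit; iterating
        # `order` in sequence keeps each pass stable.
        buckets = [[] for _ in range(1 << radix_bits)]
        for i in order:
            buckets[(keys[i] >> shift) & mask].append(i)
        order = [i for bucket in buckets for i in bucket]
    return order


def gather(values, order):
    """Reorder a payload array according to a precomputed permutation."""
    return [values[i] for i in order]


# Hypothetical example: three arrays that all share one sort key (e.g. a depth).
depth = [30, 10, 20]
order = radix_argsort(depth)             # one sort ...
positions = gather(["a", "b", "c"], order)  # ... then cheap gathers
radii = gather([0.3, 0.1, 0.2], order)
print(order, positions, radii)  # prints: [1, 2, 0] ['b', 'c', 'a'] [0.1, 0.2, 0.3]
```

Whether one sort plus several gathers beats several key-value sorts on a GPU depends on memory traffic and on whether the three sorts really share a key, which is exactly the part I could not work out from the source.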
The calc_signature kernel is called twice during a single forward + backward pass. Why is this calculation done twice instead of once (i.e. only during the forward pass)? Is this to make Pulsar less memory-hungry at the cost of performance, or is there something I am missing as to why this is necessary and not slower?
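The trade-off the second question assumes can be sketched generically: a backward pass either reuses an intermediate saved during the forward pass (more memory) or recomputes it (more compute). In this plain-Python sketch, `calc_signature` is a hypothetical stand-in that merely doubles its input; it does not model the real kernel, and this is not a claim about why Pulsar does it.

```python
# Hypothetical stand-in for an expensive kernel; a counter records calls.
CALLS = {"calc_signature": 0}


def calc_signature(x):
    CALLS["calc_signature"] += 1
    return [2.0 * v for v in x]


def forward(x, cache):
    """Compute out = sum(s_i ** 2) with s = calc_signature(x).

    With cache=True, s is kept for the backward pass (extra memory);
    with cache=False it is dropped and must be recomputed (extra compute).
    """
    s = calc_signature(x)
    out = sum(si * si for si in s)
    return out, (s if cache else None)


def backward(x, saved_s):
    """Gradient of out w.r.t. x: d/dx_i (2 * x_i) ** 2 = 8 * x_i = 4 * s_i."""
    s = saved_s if saved_s is not None else calc_signature(x)
    return [4.0 * si for si in s]


for cache in (True, False):
    CALLS["calc_signature"] = 0
    out, saved = forward([1.0, 2.0], cache)
    grad = backward([1.0, 2.0], saved)
    # prints "True 20.0 [8.0, 16.0] 1", then "False 20.0 [8.0, 16.0] 2"
    print(cache, out, grad, CALLS["calc_signature"])
```

Both variants give the same gradient; they differ only in whether the intermediate is stored or computed a second time.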

facebookresearch / pytorch3d