A few doubts about the model

Hi, I have a few questions about model creation.

Firstly, is there any specific reason for having more residual modules before the stacked hourglasses than in Newell's model creation? (He uses only 3 residual modules, while the hg-attention model uses 6.) Do these extra residual modules give better performance?

Secondly, if I want to create a model with fewer stacks (say, nStacks = 2), would I need to change the implementation? I ask because you use a different attention mechanism for the later stacks, as in the following lines:

    if i>4 then
        att = AttentionPartsCRF(opt.nFeats, ll2, opt.LRNKer, 3, 0)
        tmpOut = AttentionPartsCRF(opt.nFeats, att, opt.LRNKer, 3, 1)
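
To make the second question concrete, here is a minimal Python sketch (not the repository's Torch code) of which stacks would reach the `i > 4` branch. It assumes the stacks are built in a loop with `i` running from 1 to the number of stacks, which I have not confirmed. The helper name `stacks_using_later_branch` is hypothetical.

```python
def stacks_using_later_branch(n_stacks, threshold=4):
    """Return the 1-based stack indices i for which `i > threshold` holds.

    Assumption: stacks are built in a loop with i = 1 .. n_stacks,
    mirroring the `if i>4 then` check quoted above.
    """
    return [i for i in range(1, n_stacks + 1) if i > threshold]


# Compare a few stack counts to see when the later-stack branch is reached.
for n in (2, 4, 8):
    print(n, stacks_using_later_branch(n))
```

If that assumption holds, a 2-stack model would never take the `i > 4` branch, so every stack would go through whatever the other branch does. Is that the intended behaviour, or should the threshold scale with the number of stacks?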

Thank you.

bearpaw / pose-attention