In the paper, I see the sampling trick:
"we show that using the full model (all layers) in the first half of the sampling steps and only using the original layers (without
the gated Transformer layers) in the latter half can lead to generation results that accurately reflect the grounding conditions while also having high image quality."
But in the inference code, this trick does not seem to be implemented. Could you please explain why?
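
To make the question concrete, this is roughly the schedule I expected to find in the sampling loop. It is only a sketch of my reading of the quoted sentence, not code from the repository: `use_gated_layers`, `gate_schedule`, and `tau` are names I made up, and I am assuming the gated Transformer layers can be disabled by scaling their contribution to 0.

```python
from typing import List


def use_gated_layers(step_index: int, num_steps: int, tau: float = 0.5) -> bool:
    """Return True while the gated layers should stay active.

    step_index counts sampling steps from 0 (first denoising step).
    tau is the fraction of steps that use the full model (hypothetical name).
    """
    return step_index < int(tau * num_steps)


def gate_schedule(num_steps: int, tau: float = 0.5) -> List[float]:
    """Per-step scale for the gated layers: 1.0 = full model, 0.0 = original layers only."""
    return [1.0 if use_gated_layers(i, num_steps, tau) else 0.0
            for i in range(num_steps)]


if __name__ == "__main__":
    # First half of the steps uses all layers, second half only the original ones.
    print(gate_schedule(4))
```

With `tau = 0.5` and 4 steps this gives `[1.0, 1.0, 0.0, 0.0]`, i.e. the gated layers contribute only during the first half of sampling.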