Question about the details of training data generation

Thanks for your great work!

I noticed the following description in your paper:

Specifically, we sample a short sequence of 7 frames from the training set of [54] and randomly crop the frames to generate the input unstable video. We then apply another random cropping on the center frame as the ground-truth of the target stabilized frame.

Could you provide more details about the training data generation procedure? To be specific,

What are the parameters for the random cropping, such as the mean/variance of the cropping region?
What does the term "center frame" refer to? Is it the 4th frame of the 7-frame short sequence?
Why is the ground truth generated by random cropping? Shouldn't the ground truth be deterministic?
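
To make the questions concrete, here is a minimal sketch of how I currently read that paragraph. Everything in it beyond the 7-frame sequence and the two random crops is my own assumption, not something stated in the paper: the function names (random_crop, make_training_sample), the fixed crop size, the uniform sampling of the crop offset, and the choice of index 3 (the 4th frame) as the center frame. Frames are plain nested lists only to keep the sketch dependency-free.

```python
import random


def random_crop(frame, crop_h, crop_w, rng):
    """Crop a crop_h x crop_w window at a uniformly sampled offset (assumed distribution)."""
    h, w = len(frame), len(frame[0])
    top = rng.randint(0, h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in frame[top:top + crop_h]]


def make_training_sample(frames, crop_h, crop_w, rng):
    """Build (unstable_input, target) from a 7-frame sequence, as I understand the paper."""
    assert len(frames) == 7
    # Each frame gets its own independent crop, which simulates camera shake.
    unstable = [random_crop(f, crop_h, crop_w, rng) for f in frames]
    # Assumption: "center frame" means index 3 (0-based), i.e. the 4th frame.
    center_index = len(frames) // 2
    # Assumption: the target is a second, independent crop of the original center frame.
    target = random_crop(frames[center_index], crop_h, crop_w, rng)
    return unstable, target


if __name__ == "__main__":
    rng = random.Random(0)
    # Dummy 7-frame clip of 6x8 "pixels"; each pixel stores (frame, row, col).
    frames = [[[(t, y, x) for x in range(8)] for y in range(6)] for t in range(7)]
    unstable, target = make_training_sample(frames, 4, 5, rng)
    print(len(unstable), len(target), len(target[0]))
```

If this reading is right, the target crop is not aligned with any of the input crops, which is what prompts the third question above.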

I would appreciate it if you could provide more details.

alex04072000 / FuSta

Issue #19