Hi, I'm quite interested in the paper and would like to know more about the format of the data needed to train the model from scratch. Do the images need to be different views of the same person captured by different cameras, or are random 2D images with no yaw/pitch information enough? Thank you.