Hello, I'm training with the dinov2-distill code, but in distill_meta_arch.py the loop
for ms, mss in zip(get_fsdp_modules(self.student[k]), get_fsdp_modules(self.student_shadow[k]))
shows that mss is empty. I found that student_shadow is a deep copy of student, but the two network architectures differ slightly: student_shadow is not an FSDP-wrapped model. This looks like a logic error in how get_fsdp_modules is used here, and it prevents me from training the network successfully. Is that correct?
As shown in the figure above, student and student_shadow do not have the same structure.
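
To illustrate the symptom I think I'm seeing, here is a minimal sketch in plain Python (no PyTorch needed). The two lists are stand-ins for whatever get_fsdp_modules returns for student and student_shadow, and paired_strict is a hypothetical helper I wrote for this post; it is not part of dinov2. It only shows that zip stops at the shorter input, so if the second list is empty the loop body never runs and no error is raised.

```python
def paired_strict(left, right):
    """Pair two sequences, raising instead of silently truncating."""
    left, right = list(left), list(right)
    if len(left) != len(right):
        raise ValueError(f"module count mismatch: {len(left)} vs {len(right)}")
    return list(zip(left, right))


# Stand-ins (assumptions): the real values would come from
# get_fsdp_modules(self.student[k]) and get_fsdp_modules(self.student_shadow[k]).
student_modules = ["block0", "block1"]
shadow_modules = []  # nothing found because the shadow is not FSDP-wrapped

# Plain zip stops at the shorter input, so this produces no pairs at all.
silent_pairs = list(zip(student_modules, shadow_modules))

# The strict variant surfaces the mismatch instead of hiding it.
try:
    paired_strict(student_modules, shadow_modules)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If that is what happens in the real code, a length check like this before the loop would at least make the mismatch visible instead of letting the loop be skipped.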