SCIInstitute / ShapeWorks

ShapeWorks
http://sciinstitute.github.io/ShapeWorks/

Clarify runDataAugmentation() params #2072

Closed jadie1 closed 9 months ago

jadie1 commented 1 year ago

There are two ways to call runDataAugmentation(). The first is:

DataAugmentationUtils.runDataAugmentation(out_dir, img_list, 
                                          world_point_list, num_samples, 
                                          num_dim, percent_variability, 
                                          sampler_type, mixture_num)

This call generates image/particle pairs in the world coordinate system and assumes that the images in img_list are already groomed and aligned.

The second is:

DataAugmentationUtils.runDataAugmentation(out_dir, img_list, 
                                          local_point_list, num_samples, 
                                          num_dim, percent_variability, 
                                          sampler_type, mixture_num,
                                          world_point_list)

This call generates image/particle pairs in the local coordinate system and assumes that the images in img_list are the original, unaligned images. The world_point_list must be provided in this case so that PCA is performed in the world coordinate system. New samples are generated by sampling the world PCA subspace and then mapping each sample to local points using the world-to-local transform of the closest real example. In the future, we could add noise to this transform as an additional form of augmentation, but this is not currently included.
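To make the second case concrete, here is a minimal NumPy sketch of the idea, not the actual ShapeWorks implementation. The function names (sample_local_particles, fit_affine), the plain Gaussian sampler, the choice of "closest" as nearest in PCA score space, and the least-squares affine estimate of the world-to-local transform are all assumptions made for illustration.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine map taking src points to dst points (assumed transform model)."""
    homog = np.hstack([src, np.ones((src.shape[0], 1))])  # (n_particles, dim + 1)
    matrix, *_ = np.linalg.lstsq(homog, dst, rcond=None)  # (dim + 1, dim)
    return matrix


def sample_local_particles(world_pts, local_pts, num_samples,
                           percent_variability=0.95, seed=0):
    """Hypothetical helper. world_pts and local_pts have shape (n_shapes, n_particles, dim)."""
    rng = np.random.default_rng(seed)
    n, p, d = world_pts.shape
    flat = world_pts.reshape(n, -1)
    mean = flat.mean(axis=0)

    # PCA is done on the WORLD particles only.
    _, sing, vt = np.linalg.svd(flat - mean, full_matrices=False)
    var = sing ** 2 / (n - 1)
    cumulative = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(cumulative, percent_variability)) + 1, len(sing))
    scores = (flat - mean) @ vt[:k].T  # PCA scores of the real examples

    # Sample the world PCA subspace (a simple Gaussian sampler is assumed here).
    new_scores = rng.normal(size=(num_samples, k)) * np.sqrt(var[:k])
    new_world = (new_scores @ vt[:k] + mean).reshape(num_samples, p, d)

    # Map each sample to local coordinates with the transform of the closest real example.
    new_local = np.empty_like(new_world)
    nearest = np.empty(num_samples, dtype=int)
    for i in range(num_samples):
        j = int(np.argmin(np.linalg.norm(scores - new_scores[i], axis=1)))
        matrix = fit_affine(world_pts[j], local_pts[j])
        homog = np.hstack([new_world[i], np.ones((p, 1))])
        new_local[i] = homog @ matrix
        nearest[i] = j
    return new_world, new_local, nearest
```

Adding noise to the transform, as suggested above, would amount to perturbing matrix before it is applied to the sampled world points.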

Currently, the third parameter of runDataAugmentation() is named local_point_list. This is confusing because in the first case it receives the world points. We should rename it and clarify the two cases in the documentation. Alternatively, we could split the function into two: runWorldDataAugmentation() and runLocalDataAugmentation().
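As a rough sketch of the two-function alternative, the wrappers could simply delegate to the existing entry point with unambiguous parameter names. Everything below is hypothetical: _run_data_augmentation is a stand-in stub rather than the real ShapeWorks function, and the length check is an added assumption, not existing behavior.

```python
def _run_data_augmentation(out_dir, img_list, point_list, num_samples, num_dim,
                           percent_variability, sampler_type, mixture_num,
                           world_point_list=None):
    """Stand-in stub for the existing function; it only reports which mode was selected."""
    if world_point_list is None:
        return {"mode": "world", "output_points": point_list, "pca_points": point_list}
    return {"mode": "local", "output_points": point_list, "pca_points": world_point_list}


def runWorldDataAugmentation(out_dir, img_list, world_point_list, num_samples,
                             num_dim, percent_variability, sampler_type, mixture_num):
    """World case: img_list holds groomed/aligned images; output is in world coordinates."""
    return _run_data_augmentation(out_dir, img_list, world_point_list, num_samples,
                                  num_dim, percent_variability, sampler_type,
                                  mixture_num)


def runLocalDataAugmentation(out_dir, img_list, local_point_list, world_point_list,
                             num_samples, num_dim, percent_variability,
                             sampler_type, mixture_num):
    """Local case: img_list holds original images; PCA still uses the world points."""
    # Assumed sanity check: one local and one world particle file per image.
    if not (len(img_list) == len(local_point_list) == len(world_point_list)):
        raise ValueError("img_list, local_point_list and world_point_list "
                         "must have the same length")
    return _run_data_augmentation(out_dir, img_list, local_point_list, num_samples,
                                  num_dim, percent_variability, sampler_type,
                                  mixture_num, world_point_list)
```

With this split, the coordinate system is chosen by the function name rather than by whether an optional ninth argument is present.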