clovaai / generative-evaluation-prdc

Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
MIT License
240 stars 27 forks source link

using exact similarity search #5

Open psteinb opened 3 years ago

psteinb commented 3 years ago

Thank you for providing the coding and implementation for density and coverage. It is awesome to have the code ready for use in practice. I cite the paper whenever possible.

I took the liberty to research possible improvements. I found that using an exact similarity search as offered by faiss can speed up the calculation of density and coverage by a great deal.

Here are my results for num_real_samples = num_fake_samples = 1024, feature_dim = 12, nearest_k = 5:

--------------------------------------------------------------------------------------- benchmark: 4 tests ---------------------------------------------------------------------------------------
Name (time in ms)                 Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_my_coverage        10.7225 (1.0)       53.4196 (1.09)      13.9665 (1.0)       8.7931 (1.0)       11.1186 (1.0)       1.4884 (1.0)           1;4  71.5998 (1.0)          24           1
test_bench_my_density         11.9193 (1.11)      49.0908 (1.0)       16.7892 (1.20)      9.0503 (1.03)      12.8918 (1.16)      3.9669 (2.67)          2;3  59.5619 (0.83)         18           1
test_bench_prdc_coverage     316.7985 (29.55)    400.5574 (8.16)     354.7417 (25.40)    31.7475 (3.61)     355.2325 (31.95)    43.6299 (29.31)         2;0   2.8190 (0.04)          5           1
test_bench_prdc_density      365.5958 (34.10)    400.6876 (8.16)     382.3611 (27.38)    12.6120 (1.43)     380.4541 (34.22)    12.9641 (8.71)          2;0   2.6153 (0.04)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I tagged, any algorithm using faiss with test_bench_my. Using a similarity tree approach, this line

 real_nearest_neighbour_distances = compute_nearest_neighbour_distances(
        real_features, nearest_k)

in the original code is accelerated big time due to the efficient lookup of samples with the tree structure.

As such a change would drag in a dependency to faiss, I am reluctant to send a PR to this repo. Let me know what you think!

coallaoh commented 3 years ago

Thank you for your interest in the paper and the awesome suggestion!

As you worry, though, I would be careful with changes that would require heavier dependencies and significantly longer installation time (not sure how much longer though). Moreover, I think a good evaluation code should not depend too much on external libraries, for the sake of constancy.

I understand that the current naive implementation of NN search is not fully efficient, but do you think it is an important bottleneck in your (or any potential) application scenario?