Hi,
A practical answer: as a rule of thumb, for a code of size c bytes, first apply an OPQ transform (not a PCA) down to 4*c or 8*c dimensions, then encode with PQ. For c = 16 that gives:
import faiss

# x: training vectors, shape (n, 2048), dtype float32 (Faiss expects float32)
opq = faiss.OPQMatrix(2048, 16, 4 * 16)     # rotate and reduce 2048 -> 64 dims
pq = faiss.ProductQuantizer(4 * 16, 16, 8)  # 16 sub-quantizers x 8 bits = 16 bytes
opq.train(x)
xt = opq.apply_py(x)
pq.train(xt)
codes = pq.compute_codes(xt)
To decode:
x_decoded = opq.reverse_transform(pq.decode(codes))
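To sanity-check the round trip, one can look at the relative reconstruction error (a minimal sketch, with x and x_decoded as above):

import numpy as np

# relative squared reconstruction error of the OPQ + PQ round trip
rel_err = ((x - x_decoded) ** 2).sum() / (x ** 2).sum()
print("relative squared reconstruction error:", rel_err)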
Closing.
Sorry, this is not really an issue, more of a theoretical question.
Let's say I have 2048D vectors (for the sake of argument, assume the data is uint8) and I want to reduce that representation down to 16 bytes per vector. On one end of the scale, I could apply PCA (or another dimensionality reduction technique) to get 16 dimensions. On the other end, I could use PQ directly on the 2048D vectors and obtain 16-byte codes.
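For concreteness, the two extremes could look like this in Faiss (a sketch, not from the original post; it assumes x is an (n, 2048) float32 matrix, since Faiss trains on float32, so uint8 data would be converted first):

import faiss

# Extreme 1: pure dimensionality reduction, 2048 -> 16 dims
# (16 bytes only if each output dimension is then stored as one uint8)
pca = faiss.PCAMatrix(2048, 16)
pca.train(x)
x_pca = pca.apply_py(x)

# Extreme 2: PQ directly on the 2048D vectors,
# 16 sub-quantizers x 8 bits = exactly 16 bytes per vector
pq_direct = faiss.ProductQuantizer(2048, 16, 8)
pq_direct.train(x)
codes_direct = pq_direct.compute_codes(x)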
There is probably a sweet spot between those two extremes (e.g. PCA to 128D and then PQ). That sweet spot could probably be found through parameter search and testing on a validation set.
What bothers me is that I have zero intuition for what those parameters might be.
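A minimal sketch of the parameter search mentioned above (hypothetical helper, not from the original thread; it assumes float32 matrices x_train and x_val, and intermediate dimensions that are multiples of the 16 sub-quantizers):

import faiss

def reconstruction_error(d_mid, x_train, x_val, M=16, nbits=8):
    # OPQ down to d_mid dims, then PQ with M sub-quantizers of nbits bits each
    opq = faiss.OPQMatrix(x_train.shape[1], M, d_mid)
    pq = faiss.ProductQuantizer(d_mid, M, nbits)
    opq.train(x_train)
    pq.train(opq.apply_py(x_train))
    xt = opq.apply_py(x_val)
    rec = opq.reverse_transform(pq.decode(pq.compute_codes(xt)))
    return ((x_val - rec) ** 2).sum() / (x_val ** 2).sum()

# sweep candidate intermediate dimensions and compare on the validation set
for d_mid in (32, 64, 128, 256):
    print(d_mid, reconstruction_error(d_mid, x_train, x_val))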