Closed: mikemccand closed this issue 10 months ago.
I think we could simply add a resourceDescription field to AbstractKnnVectorQuery and modify toString in the implementations so that the output would look something like these examples:
resourceDescription = "publisher backstory"
TASK: cat=VectorSearch q=KnnFloatVectorQuery:vector:publisher backstory[0.02625591,...][100] s=null group=null hits=100 facets=[]
resourceDescription = ""
TASK: cat=VectorSearch q=KnnFloatVectorQuery:vector:[0.02625591,...][100] s=null group=null hits=100 facets=[]
resourceDescription = null
TASK: cat=VectorSearch q=KnnFloatVectorQuery:vector:null[0.02625591,...][100] s=null group=null hits=100 facets=[]
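A minimal, self-contained sketch of how the proposed toString format above could be produced. The class name KnnToStringSketch and the render helper are hypothetical illustrations, not Lucene's actual code, and the vector values are made up.

```java
// Hypothetical sketch (not Lucene's actual code): formats a KNN query string the way
// the comment above proposes, placing an optional human-readable description between
// the field name and the truncated target vector.
class KnnToStringSketch {

  // Plain string concatenation turns a null description into the literal text "null",
  // which is exactly what the third example above shows.
  static String render(String queryClass, String field, String description, float[] target, int k) {
    return queryClass + ":" + field + ":" + description + "[" + target[0] + ",...][" + k + "]";
  }

  public static void main(String[] args) {
    float[] target = {0.5f, -0.25f, 0.125f}; // made-up vector for illustration only
    // prints KnnFloatVectorQuery:vector:publisher backstory[0.5,...][100]
    System.out.println(render("KnnFloatVectorQuery", "vector", "publisher backstory", target, 100));
    // prints KnnFloatVectorQuery:vector:[0.5,...][100]
    System.out.println(render("KnnFloatVectorQuery", "vector", "", target, 100));
    // prints KnnFloatVectorQuery:vector:null[0.5,...][100]
    System.out.println(render("KnnFloatVectorQuery", "vector", null, target, 100));
  }
}
```

If printing "null" is undesirable, an implementation could treat a null description like the empty string.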
Would this allow us to move forward with the benchmarker fix in https://github.com/mikemccand/luceneutil/issues/226 ?
Description
Over in https://github.com/mikemccand/luceneutil/issues/226, while trying to fix a sneaky, long-standing non-determinism in the Lucene nightly benchmarks that affected the VectorSearch and some *TaxoFacets performance measures, I struggled and failed/cheated to pick which VectorSearch queries to keep for disambiguation. The tasks file has:
The benchy then computes an embedding from each of these lexical terms and creates a KnnFloatVectorQuery for each. But later, if something goes wrong, the toString of these queries just renders the first dimension's float:

I realize that from the machine's standpoint only this vector really "matters", but we humans still think in terms of words (so far, anyways, heh). Could we maybe allow an optional, opaque string that does not count towards hashCode/equals/etc. and is simply regurgitated back out in toString, to help us humans who still need to interact with the machines?

If we had this, I could have made the correct fix over in https://github.com/mikemccand/luceneutil/issues/226 to try to gain back some continuity in the vector nightly charts. Instead, I just picked the top 5 vector queries, which is most likely wrong. Also, there is precedent in Lucene for such "opaque for-human strings": the String resourceDescription passed to the base IndexInput constructor.
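To make the "opaque, not counting towards hashCode/equals" idea concrete, here is a minimal sketch under stated assumptions: DescribedKnnQuery is a hypothetical stand-in, not Lucene's AbstractKnnVectorQuery, and its toString format simply mirrors the examples proposed earlier in the thread. The description shows up in toString but is ignored by equals and hashCode, so two queries over the same vector stay equal regardless of their labels.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a KNN vector query (NOT Lucene's AbstractKnnVectorQuery).
// It carries an optional, opaque description purely for humans: the description appears
// in toString() but is ignored by equals() and hashCode().
final class DescribedKnnQuery {
  private final String field;
  private final float[] target;
  private final int k;
  private final String description; // may be null; never used for identity

  DescribedKnnQuery(String field, float[] target, int k, String description) {
    this.field = Objects.requireNonNull(field, "field");
    this.target = target.clone(); // defensive copy so identity cannot change later
    this.k = k;
    this.description = description;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DescribedKnnQuery)) return false;
    DescribedKnnQuery other = (DescribedKnnQuery) o;
    // description is intentionally not compared
    return k == other.k && field.equals(other.field) && Arrays.equals(target, other.target);
  }

  @Override
  public int hashCode() {
    // description is intentionally not hashed
    return Objects.hash(field, k, Arrays.hashCode(target));
  }

  @Override
  public String toString() {
    // Treat a missing description like an empty one rather than printing "null".
    String label = description == null ? "" : description;
    return "DescribedKnnQuery:" + field + ":" + label + "[" + target[0] + ",...][" + k + "]";
  }

  public static void main(String[] args) {
    float[] vec = {0.5f, -0.25f}; // made-up vector for illustration only
    DescribedKnnQuery a = new DescribedKnnQuery("vector", vec, 100, "publisher backstory");
    DescribedKnnQuery b = new DescribedKnnQuery("vector", vec, 100, null);
    System.out.println(a.equals(b));                  // true
    System.out.println(a.hashCode() == b.hashCode()); // true
    System.out.println(a);                            // DescribedKnnQuery:vector:publisher backstory[0.5,...][100]
    System.out.println(b);                            // DescribedKnnQuery:vector:[0.5,...][100]
  }
}
```

Keeping the label out of equals/hashCode means it cannot affect anything that relies on query identity, such as caching; it only changes what a human sees when the query is logged.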