Open GoogleCodeExporter opened 9 years ago
Yeah, so far there are few enough developers that we commit into trunk with the
option of reverting changes if necessary. The diff tools on the hosting site
make it relatively easy to see what's going on.
Original comment by dwidd...@gmail.com
on 9 Jul 2009 at 12:31
Cool, committed in r302, in case you hadn't spotted it.
Original comment by admac...@gmail.com
on 10 Jul 2009 at 2:18
I've added a test and changed the "equals" to "equalsIgnoreCase", in r303.
Original comment by dwidd...@gmail.com
on 10 Jul 2009 at 2:47
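For readers following the thread, the effect of swapping "equals" for "equalsIgnoreCase" can be pictured with a minimal sketch. The class and method names below (NegationSplitter, split) are hypothetical and are not taken from the Semantic Vectors source; the sketch only assumes that the query parser looks for a literal NOT token and treats everything after it as negative terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Semantic Vectors.
final class NegationSplitter {
    /** Holds the positive and negative terms of one query. */
    static final class SplitQuery {
        final List<String> positive = new ArrayList<>();
        final List<String> negative = new ArrayList<>();
    }

    /**
     * Terms before the NOT token are positive, terms after it are negative.
     * With ignoreCase == false only the exact token "NOT" is an operator
     * (the "equals" behaviour); with ignoreCase == true, "not" and "Not"
     * are operators as well (the "equalsIgnoreCase" behaviour).
     */
    static SplitQuery split(String[] queryTerms, boolean ignoreCase) {
        SplitQuery result = new SplitQuery();
        boolean seenNot = false;
        for (String term : queryTerms) {
            boolean isNot = ignoreCase
                    ? "NOT".equalsIgnoreCase(term)
                    : "NOT".equals(term);
            if (isNot) {
                seenNot = true;   // the operator itself is never looked up as a term
                continue;
            }
            (seenNot ? result.negative : result.positive).add(term);
        }
        return result;
    }
}
```

Under this sketch, "red not green" with ignoreCase set to false keeps "not" as an ordinary positive term, so a lookup for it would fail in the same way as a "Didn't find vector for 'not'" message suggests.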
So you're saying that it's lowercasing the string array of query terms in
advance of building the vectors? I can't see where this is happening. At least
for CompareTermsBatch, negation definitely was occurring for r300, since that
was the cause of the rounding errors that led to the patch for issue 13.
E.g., using my working copy at r302, it's clear that 'red NOT green' gives a
different vector to 'red green' and 'red not green' (which both give the same
result):
amack@tee ~$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "red green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Outputting similarity of "red green" with "blue" ...
0.149131
amack@tee ~$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "red NOT green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Numer of negative terms: 1
Numer of positive terms: 1
Outputting similarity of "red NOT green" with "blue" ...
0.059290256
amack@tee ~$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "red not green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Didn't find vector for 'not'
No vector for not
Outputting similarity of "red not green" with "blue" ...
0.149131
Original comment by admac...@gmail.com
on 10 Jul 2009 at 11:44
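To make the differing similarity scores above easier to interpret, here is a rough sketch of one common way to build a negated query vector: projecting the positive vector onto the subspace orthogonal to the negative one. Whether CompareTerms does exactly this is an assumption, since the thread does not show the implementation, and the names NegationSketch, negate and cosine are hypothetical.

```java
// Hypothetical illustration of vector negation via orthogonal projection.
// Assumption: this mirrors the general idea of "a NOT b", not necessarily
// the exact code path used by CompareTerms.
final class NegationSketch {
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Returns a unit-length copy of v, or a plain copy if v is the zero vector. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dot(v, v));
        float[] out = v.clone();
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (out[i] / norm);
        }
        return out;
    }

    /** Removes from 'positive' its component along 'negative', then normalizes. */
    static float[] negate(float[] positive, float[] negative) {
        float[] nHat = normalize(negative);
        double projection = dot(positive, nHat);
        float[] result = new float[positive.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = (float) (positive[i] - projection * nHat[i]);
        }
        return normalize(result);
    }

    /** Cosine similarity, the kind of score printed in the transcripts above. */
    static double cosine(float[] a, float[] b) {
        return dot(normalize(a), normalize(b));
    }
}
```

Under this assumption, the vector for "red NOT green" has no component along "green", so its similarity with "blue" would generally differ from that of the plain sum used for "red green".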
In fact, it looks like there's no case normalization at all. Should there be?
amack@tee ~/projects/semanticvectors-svn+batch$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "RED green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Didn't find vector for 'RED'
No vector for RED
Outputting similarity of "RED green" with "blue" ...
0.14490755
Original comment by admac...@gmail.com
on 10 Jul 2009 at 11:48
Strange ... there's a "matchcase" flag that you're supposed to switch on, e.g.,
I get results for
java pitt.search.semanticvectors.Search peter
java pitt.search.semanticvectors.Search PeTeR
java pitt.search.semanticvectors.Search --matchcase peter
but no results for
java pitt.search.semanticvectors.Search --matchcase PeTeR
Is that flag working for you?
Original comment by dwidd...@gmail.com
on 11 Jul 2009 at 12:36
Ah, OK. I was only looking at CompareTerms(Batch). That works fine for me with
the Search class, but the same flag does nothing for CompareTerms. Presumably
it would make more sense for CompareTerms to have similar case normalisation
behaviour?
Original comment by admac...@gmail.com
on 13 Jul 2009 at 2:38
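As a sketch of what "similar case normalisation behaviour" could look like, the snippet below lowercases query terms unless a match-case option is set. It assumes that the indexed terms are stored in lower case, which the thread suggests but does not confirm; the class QueryCaseNormalizer and its methods are hypothetical and are not the actual Search or CompareTerms code.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Semantic Vectors.
final class QueryCaseNormalizer {
    /** Splits a quoted query argument such as "RED green" on whitespace. */
    static String[] tokenize(String query) {
        String trimmed = query.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /**
     * Lowercases every term unless matchCase is true, in which case the
     * terms are returned unchanged (as a copy). Locale.ROOT keeps the
     * result independent of the machine's default locale.
     */
    static String[] normalize(String[] queryTerms, boolean matchCase) {
        if (matchCase) {
            return queryTerms.clone();
        }
        String[] out = new String[queryTerms.length];
        for (int i = 0; i < queryTerms.length; i++) {
            out[i] = queryTerms[i].toLowerCase(Locale.ROOT);
        }
        return out;
    }
}
```

With this kind of step in front of the vector lookup, "RED green" and "red green" would resolve to the same terms by default, while the match-case option would preserve the current exact-match behaviour. A parser that relies on an upper-case NOT operator would need to detect it before lowercasing, or compare case-insensitively.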
Original issue reported on code.google.com by
admac...@gmail.com
on 9 Jul 2009 at 7:55