Peratham / semanticvectors

Automatically exported from code.google.com/p/semanticvectors

suppressions of vector negation #16

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Just added a command line flag '-suppressnegatedqueries' which, if present,
stops queries from being parsed for 'NOT' (as Dominic said would be a good idea in
issue 13). Semantically it's an unfortunate number of negatives, but it
preserves backwards compatibility.
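
To make the flag's intended effect concrete, here is a minimal sketch of how a query might be split around 'NOT' and how a suppression flag could switch that off. The class, method, and field names are hypothetical and are not taken from the committed code; treating 'NOT' as an ordinary term when the flag is set is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not the actual Semantic Vectors implementation. */
final class NegationParseSketch {
    /** Terms before the first NOT are positive, terms after it are negative. */
    static final class ParsedQuery {
        final List<String> positive = new ArrayList<>();
        final List<String> negative = new ArrayList<>();
    }

    /**
     * Splits query terms around the first literal "NOT" token.
     * When suppressNegatedQueries is true, no token is treated as an
     * operator, so "NOT" is kept as an ordinary positive term (assumed).
     */
    static ParsedQuery parse(String[] terms, boolean suppressNegatedQueries) {
        ParsedQuery parsed = new ParsedQuery();
        boolean seenNot = false;
        for (String term : terms) {
            if (!suppressNegatedQueries && !seenNot && term.equals("NOT")) {
                seenNot = true;  // everything after this point is negated
                continue;
            }
            (seenNot ? parsed.negative : parsed.positive).add(term);
        }
        return parsed;
    }

    public static void main(String[] args) {
        String[] query = {"red", "NOT", "green"};
        ParsedQuery withNegation = parse(query, false);
        System.out.println(withNegation.positive + " / " + withNegation.negative);
        ParsedQuery suppressed = parse(query, true);
        System.out.println(suppressed.positive + " / " + suppressed.negative);
    }
}
```

Because the default (flag absent) keeps the old parsing behaviour, existing callers see no change, which is the backwards-compatibility point made above.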

It seems there's no branching policy - I should just commit the changes
into trunk, and you'll look at them there?

Original issue reported on code.google.com by admac...@gmail.com on 9 Jul 2009 at 7:55

GoogleCodeExporter commented 9 years ago
Yeah, so far there are few enough developers that we commit into trunk with the
option of reverting changes if necessary. The diff tools on the hosting site make it
relatively easy to see what's going on.

Original comment by dwidd...@gmail.com on 9 Jul 2009 at 12:31

GoogleCodeExporter commented 9 years ago
Cool, committed in r302 in case you hadn't spotted it.

Original comment by admac...@gmail.com on 10 Jul 2009 at 2:18

GoogleCodeExporter commented 9 years ago
I've added a test and changed the "equals" to "equalsIgnoreCase", in r303.

Original comment by dwidd...@gmail.com on 10 Jul 2009 at 2:47

GoogleCodeExporter commented 9 years ago
So you're saying that it's lowercasing the string array of query terms in advance of
building the vectors? I can't see where this is happening. At least for
CompareTermsBatch, negation definitely was occurring for r300, since that was the
cause of the rounding errors that led to the patch for issue 13.

E.g., using my working copy at r302, it's clear that 'red NOT green' gives a different
vector to 'red green' and 'red not green' (which both give the same result):

amack@tee ~$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "red green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Outputting similarity of "red green" with "blue" ...
0.149131
amack@tee ~$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "red NOT green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Numer of negative terms: 1
Numer of positive terms: 1
Outputting similarity of "red NOT green" with "blue" ...
0.059290256
amack@tee ~$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "red not green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Didn't find vector for 'not'
No vector for not
Outputting similarity of "red not green" with "blue" ...
0.149131
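
The different score for "red NOT green" is what one would expect if negation is applied as a vector operation before the comparison. One common way to do that, assumed here for illustration rather than read from the project's source, is to subtract from the positive vector its projection onto the negated vector, so the result is orthogonal to the negated term. The class and method names below are hypothetical.

```java
/** Hypothetical sketch of negation by orthogonal projection; not the project's code. */
final class OrthogonalNegationSketch {
    /** Plain dot product of two equal-length vectors. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Returns positive minus its projection onto negative, i.e.
     * positive - ((positive . negative) / (negative . negative)) * negative.
     * A zero negative vector leaves the positive vector unchanged.
     */
    static float[] negate(float[] positive, float[] negative) {
        float norm = dot(negative, negative);
        if (norm == 0f) {
            return positive.clone();
        }
        float scale = dot(positive, negative) / norm;
        float[] result = new float[positive.length];
        for (int i = 0; i < positive.length; i++) {
            result[i] = positive[i] - scale * negative[i];
        }
        return result;
    }

    public static void main(String[] args) {
        float[] red = {1f, 1f};    // toy vectors, not real term vectors
        float[] green = {0f, 1f};
        float[] redNotGreen = negate(red, green);
        // The negated query has no remaining component along "green".
        System.out.println(dot(redNotGreen, green));
    }
}
```

Under this assumption, a lowercase 'not' that is not recognised as an operator and has no vector of its own would simply be skipped, which matches "red not green" scoring the same as "red green" in the transcript above.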

Original comment by admac...@gmail.com on 10 Jul 2009 at 11:44

GoogleCodeExporter commented 9 years ago
In fact it looks like there's no case normalization at all. Should there be?:

amack@tee ~/projects/semanticvectors-svn+batch$ java -Xmx512M pitt.search.semanticvectors.CompareTerms \
    -luceneindexpath /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/index \
    -queryvectorfile /lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin \
    "RED green" "blue"
Opening query vector store from file:
/lt/work/amack/working/tee/gpdc-tests/lucene-indexes/iliad_all/termvectors.bin
Didn't find vector for 'RED'
No vector for RED
Outputting similarity of "RED green" with "blue" ...
0.14490755

Original comment by admac...@gmail.com on 10 Jul 2009 at 11:48

GoogleCodeExporter commented 9 years ago
Strange ... there's a "matchcase" flag that you're supposed to switch on, e.g., I get
results for
java pitt.search.semanticvectors.Search peter
java pitt.search.semanticvectors.Search PeTeR
java pitt.search.semanticvectors.Search --matchcase peter 
but no results for 
java pitt.search.semanticvectors.Search --matchcase PeTeR
Is that flag working for you?

Original comment by dwidd...@gmail.com on 11 Jul 2009 at 12:36

GoogleCodeExporter commented 9 years ago
Ah, OK. I was only looking at CompareTerms(Batch). That works fine for me with the
Search class, but the same flag does nothing for CompareTerms. Presumably it would
make more sense for CompareTerms to have similar case normalisation behaviour?
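
As a rough illustration of what similar behaviour could look like, the sketch below lowercases query terms unless a match-case option is set. It is not the Search class's actual logic; the names are hypothetical, and leaving a literal 'NOT' untouched so that it can still act as the negation operator is an assumption.

```java
import java.util.Locale;

/** Hypothetical sketch of query-term case normalisation; not the project's code. */
final class CaseNormalizationSketch {
    /**
     * Returns a copy of the terms, lowercased unless matchCase is true.
     * The literal token "NOT" is preserved so that a later, case-sensitive
     * check for the negation operator would still recognise it (assumed).
     */
    static String[] normalize(String[] terms, boolean matchCase) {
        String[] out = new String[terms.length];
        for (int i = 0; i < terms.length; i++) {
            String term = terms[i];
            boolean keepAsIs = matchCase || term.equals("NOT");
            out[i] = keepAsIs ? term : term.toLowerCase(Locale.ROOT);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] query = {"RED", "NOT", "green"};
        System.out.println(String.join(" ", normalize(query, false)));
        System.out.println(String.join(" ", normalize(query, true)));
    }
}
```

With normalisation of this kind, "RED green" would look up the same vector as "red green" instead of reporting that no vector exists for 'RED'.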

Original comment by admac...@gmail.com on 13 Jul 2009 at 2:38