It's a good idea. I'll implement the StopMatch command in the next snapshot (within a few days).
Thanks for the suggestion.
Original comment by dp.max...@gmail.com
on 9 Nov 2009 at 11:19
Sorry, it took more time than I expected at first.
Try the fresh snapshot:
http://dataparksearch.googlecode.com/files/dpsearch-4.53-13122009.tar.bz2
You can use the Match: command in a stopword file to specify a regular expression for stopwords. NB: these regexes are very primitive, but you can use any charset supported by DataparkSearch to specify them.
E.g. for your case the command is:
Match: regex ^\$##
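To illustrate what the pattern above is meant to catch, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for DataparkSearch's own matcher, which is described as much more primitive and may behave differently; the `is_stopword` helper and the sample words are hypothetical.

```python
import re

# Assumption: Python's `re` only approximates DataparkSearch's primitive regex engine.
# The pattern is the one from the comment: a literal "$##" at the start of the word.
STOP_PATTERN = re.compile(r"^\$##")

def is_stopword(word: str) -> bool:
    """Hypothetical helper: True when the word begins with the literal text '$##'."""
    return STOP_PATTERN.search(word) is not None

# Hypothetical sample words; only the first one starts with "$##".
words = ["$##abc", "abc$##", "$#abc", "plain"]
kept = [w for w in words if not is_stopword(w)]
print(kept)
```

Only words that start with `$##` are dropped; the same characters elsewhere in a word do not trigger the rule.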
Original comment by dp.max...@gmail.com
on 13 Dec 2009 at 2:33
Hi Maxime,
Got the new version and we are trying it out. You mentioned the regex expressions were very primitive, so we don't know whether they will support what we are trying to do. For example, can we use "Match: regex ^[^a-z0-9A-Z]+$" to make a stopword of any word that contains a character other than a letter or a number? If not, can we use the NoMatch keyword to accomplish the same, with the expression being "NoMatch: regex ^[a-z0-9A-Z]"? If we can get this to work, we think the dbase will shrink considerably, by upwards of 50%.
Thanks!
Original comment by Imlbr...@gmail.com
on 5 Jan 2010 at 8:09
Unfortunately, intervals aren't supported in stopword regexes, though you can use the nomatch option with them, so your commands could be:
Match: nomatch regex ^[0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]
Match: nomatch regex [0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]$
which eliminate all "words" that don't start or end with a digit or a letter.
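The effect of the two nomatch rules can be sketched in Python as follows. This is an approximation under stated assumptions: Python's `re` stands in for DataparkSearch's matcher, each nomatch rule is assumed to mark a word as a stopword on its own when the word fails to match, and the `is_stopword` helper and sample words are hypothetical.

```python
import re
import string

# Explicit character list, since intervals such as a-z are not supported
# in DataparkSearch stopword regexes (per the comment above).
ALNUM = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Assumption: Python's `re` approximates the two patterns from the comment.
STARTS_OK = re.compile("^[" + ALNUM + "]")
ENDS_OK = re.compile("[" + ALNUM + "]$")

def is_stopword(word: str) -> bool:
    """Hypothetical helper: a word is dropped when either nomatch rule fires,
    i.e. when it fails to start, or fails to end, with a digit or a letter."""
    return STARTS_OK.search(word) is None or ENDS_OK.search(word) is None

# Hypothetical sample words.
words = ["hello", "$##x", "x-", "a-b", "42"]
kept = [w for w in words if not is_stopword(w)]
print(kept)
```

Under these assumptions, a word such as `a-b` is kept because only its first and last characters are checked, so the two rules are narrower than "any word containing a character other than a letter or a number".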
Original comment by dp.max...@gmail.com
on 5 Jan 2010 at 9:34
Original issue reported on code.google.com by
Imlbr...@gmail.com
on 6 Nov 2009 at 9:35