Closed asfimport closed 18 years ago
Erik Hatcher (@erikhatcher) (migrated from JIRA)
Thanks for the test! It fails for me, though. I have committed the test file and the updated analyzer to the sandbox anyway, and I look forward to a patch that fixes the test case :)
Jean-François Halleux (migrated from JIRA)
It looks like the special French characters were transformed into something weird when you copied the source to your local CVS. For me, at least, they display correctly in Bugzilla. They are all in the 0-255 range, and the previous version of FrenchAnalyzer in CVS had them right.
JF
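To illustrate the kind of damage described above, here is a minimal sketch of what happens when text is written in one encoding and read back in another. It is not taken from the attachments: the class name `EncodingMismatchDemo` and the helper `recode` are made up for illustration, and the choice of ISO-8859-1 versus UTF-8 is an assumption about what the copy/paste step might have done.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Encodes the text with one charset and decodes the resulting bytes with another. */
    static String recode(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        // "François", with the cedilla written as an escape so this file
        // compiles identically whatever encoding the compiler assumes.
        String name = "Fran\u00e7ois";

        // Same charset on both sides: the text survives unchanged.
        String same = recode(name, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println("round trip intact: " + same.equals(name));

        // Written as UTF-8 but read as ISO-8859-1: the single accented
        // character comes back as two unrelated characters.
        String garbled = recode(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("length before: " + name.length() + ", after: " + garbled.length());

        // Written as ISO-8859-1 but read as UTF-8: the lone byte is not valid
        // UTF-8, so the decoder substitutes the replacement character U+FFFD.
        String replaced = recode(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("contains U+FFFD: " + (replaced.indexOf('\uFFFD') >= 0));
    }
}
```

Either kind of mismatch would make a string comparison in a test fail even though both sides look similar when printed.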
Erik Hatcher (@erikhatcher) (migrated from JIRA)
Could you please attach a patch file (cvs diff -u) or the entire file - as an attachment - so nothing can get lost in copy/paste?
Jean-François Halleux (migrated from JIRA)
Created an attachment (id=10072) This attachment contains a patch against your latest commits. With it applied, the test case runs fine here.
Erik Hatcher (@erikhatcher) (migrated from JIRA)
Sorry for my incompetence, but I cannot get the patch files to apply appropriately:
patch -p0 < patch.txt
(Stripping trailing CRs from patch.)
patching file java/org/apache/lucene/analysis/fr/FrenchAnalyzer.java
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file java/org/apache/lucene/analysis/fr/FrenchAnalyzer.java.rej
(Stripping trailing CRs from patch.)
Could you please attach the full files and I will simply replace my local copies and commit them?
Thanks!
Jean-François Halleux (migrated from JIRA)
Created an attachment (id=10073) the French Analyzer file
Jean-François Halleux (migrated from JIRA)
Created an attachment (id=10074) The test case
Erik Hatcher (@erikhatcher) (migrated from JIRA)
The test is still failing for me after applying your latest patch. The differences seem pretty dramatic - be sure to use CVS HEAD. I've committed what you sent, but I'm getting this failure:
test:
[junit] Testsuite: org.apache.lucene.analysis.fr.TestFrenchAnalyzer
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.487 sec
[junit] Testcase: testAnalyzer(org.apache.lucene.analysis.fr.TestFrenchAnalyzer): FAILED
[junit] expected:<...?...> but was:<...?...>
[junit] junit.framework.ComparisonFailure: expected:<...?...> but was:<...?...>
[junit] at org.apache.lucene.analysis.fr.TestFrenchAnalyzer.assertAnalyzesTo(TestFrenchAnalyzer.java:84)
[junit] at org.apache.lucene.analysis.fr.TestFrenchAnalyzer.testAnalyzer(TestFrenchAnalyzer.java:141)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
Jean-François Halleux (migrated from JIRA)
Strange...
I just did a full checkout of lucene and sandbox, ran the test, and it worked properly. Could there be a problem with the locale? Can anybody else try this?
Jeff
Erik Hatcher (@erikhatcher) (migrated from JIRA)
Well, if it works for you, I'll close this issue. I'm far from being I18N savvy, so it is likely a locale issue on my end... although surely the test case can be made to pass for me somehow?
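One possible way to make such a test independent of the machine's locale, assuming the failure comes from the compiler decoding the source file with a different default encoding, is to spell the accented literals with Unicode escapes (or to pass an explicit `-encoding` option to `javac`). The sketch below is illustrative only: the class `SourceEncodingCheck` and its members are hypothetical and are not part of the attached files.

```java
import java.nio.charset.Charset;

public class SourceEncodingCheck {

    // The accented token from the test case, spelled with Unicode escapes so
    // the compiled string does not depend on the encoding javac assumes.
    static final String ACCENTED = "\u00ef\u00e2\u00f6\u00fb\u00e0\u00e4";

    /** Returns true if every character fits in the 0-255 range mentioned in this thread. */
    static boolean inLatin1Range(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Shows which charset this JVM uses by default; two machines that
        // disagree here can disagree on how an unescaped source file is read.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("six characters: " + (ACCENTED.length() == 6));
        System.out.println("all in 0-255: " + inLatin1Range(ACCENTED));
    }
}
```

Running the `main` method on both machines and comparing the reported default charset would be a quick way to confirm or rule out the locale theory.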
Hello,
Following is a test case for the French Analyzer to help it get out of the sandbox :) It looks OK, apart from some strange behavior with the minus sign. I included a slight modification of the Analyzer to better handle null parameters, just in case.
—
package org.apache.lucene.analysis.fr;
/* ====================================================================
import java.io.Reader;
import java.io.StringReader;

import junit.framework.TestCase;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * @author Jean-François Halleux
 * @version $version$
 */
public class TestFrenchAnalyzer extends TestCase {
(input));
("dummy"));
} catch (IllegalArgumentException iae) {
iaeFlag = true;
}
assertEquals(iaeFlag, true);
sign
// is often used for composing words
assertAnalyzesTo( fa, "Jean-François", new String[] { "jean", "françois" });
Java++", new String[] { "c3po", "aujourd'hui", "oeuf", "ïâöûàä", "anticonstitutionnel", "jav" });
1945", "1940", "1945", "i" });
}
—
package org.apache.lucene.analysis.fr;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import java.io.File;
import java.io.Reader;
import java.util.Hashtable;
import org.apache.lucene.analysis.de.WordlistLoader;

/**
 * @author Patrick Talbot (based on Gerhard Schwarz work for German)
 * @version $Id: FrenchAnalyzer.java,v 1.1 2004/01/20 10:07:01 ehatcher Exp $
 */
public final class FrenchAnalyzer extends Analyzer {

  /**
   * Extended list of typical french stopwords.
   */
  private String[] FRENCH_STOP_WORDS = {
    "a", "afin", "ai", "ainsi", "après", "attendu", "au", "aujourd", "auquel", "aussi",
    "autre", "autres", "aux", "auxquelles", "auxquels", "avait", "avant", "avec", "avoir",
    "c", "car", "ce", "ceci", "cela", "celle", "celles", "celui", "cependant", "certain",
    "certaine", "certaines", "certains", "ces", "cet", "cette", "ceux", "chez", "ci",
    "combien", "comme", "comment", "concernant", "contre", "d", "dans", "de", "debout",
    "dedans", "dehors", "delà", "depuis", "derrière", "des", "désormais", "desquelles",
    "desquels", "dessous", "dessus", "devant", "devers", "devra", "divers", "diverse",
    "diverses", "doit", "donc", "dont", "du", "duquel", "durant", "dès", "elle", "elles",
    "en", "entre", "environ", "est", "et", "etc", "etre", "eu", "eux", "excepté", "hormis",
    "hors", "hélas", "hui", "il", "ils", "j", "je", "jusqu", "jusque", "l", "la", "laquelle",
    "le", "lequel", "les", "lesquelles", "lesquels", "leur", "leurs", "lorsque", "lui", "là",
    "ma", "mais", "malgré", "me", "merci", "mes", "mien", "mienne", "miennes", "miens", "moi",
    "moins", "mon", "moyennant", "même", "mêmes", "n", "ne", "ni", "non", "nos", "notre",
    "nous", "néanmoins", "nôtre", "nôtres", "on", "ont", "ou", "outre", "où", "par", "parmi",
    "partant", "pas", "passé", "pendant", "plein", "plus", "plusieurs", "pour", "pourquoi",
    "proche", "près", "puisque", "qu", "quand", "que", "quel", "quelle", "quelles", "quels",
    "qui", "quoi", "quoique", "revoici", "revoilà", "s", "sa", "sans", "sauf", "se", "selon",
    "seront", "ses", "si", "sien", "sienne", "siennes", "siens", "sinon", "soi", "soit",
    "son", "sont", "sous", "suivant", "sur", "ta", "te", "tes", "tien", "tienne", "tiennes",
    "tiens", "toi", "ton", "tous", "tout", "toute", "toutes", "tu", "un", "une", "va", "vers",
    "voici", "voilà", "vos", "votre", "vous", "vu", "vôtre", "vôtres", "y", "à", "ça", "ès", "été", "être", "ô" };
  /**
   * Contains words that should be indexed but not stemmed.
   */
  private Hashtable excltable = new Hashtable();

  /**
   * Builds an analyzer.
   */
  public FrenchAnalyzer() {
    stoptable = StopFilter.makeStopTable( FRENCH_STOP_WORDS );
  }

  /**
   * Builds an analyzer with the given stop words.
   */
  public FrenchAnalyzer( String[] stopwords ) {
    stoptable = StopFilter.makeStopTable( stopwords );
  }

  /**
   * Builds an analyzer with the given stop words.
   */
  public FrenchAnalyzer( Hashtable stopwords ) {
    stoptable = stopwords;
  }

  /**
   * Builds an analyzer with the given stop words.
   */
  public FrenchAnalyzer( File stopwords ) {
    stoptable = WordlistLoader.getWordtable( stopwords );
  }

  /**
   * Builds an exclusionlist from the words contained in the given file.
   */
  public void setStemExclusionTable( File exclusionlist ) {
    excltable = WordlistLoader.getWordtable( exclusionlist );
  }

  /**
   * @return A TokenStream build from a StandardTokenizer filtered with
   *         StandardFilter, StopFilter, FrenchStemFilter and LowerCaseFilter
   */
  public final TokenStream tokenStream( String fieldName, Reader reader ) {
    ("fieldName must not be null");
    if (reader==null) throw new IllegalArgumentException("reader must not be null");
  }
}
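The null-parameter handling mentioned in the description can be sketched on its own, without Lucene on the classpath. This is only an approximation of the idea: `NullGuardSketch`, `checkedStream` and `rejects` are hypothetical names, the method returns the reader unchanged instead of building a token stream, and the `fieldName == null` condition is an assumption inferred from the surviving error message.

```java
import java.io.Reader;
import java.io.StringReader;

public class NullGuardSketch {

    /**
     * Hypothetical stand-in for a tokenStream-style method; only the argument
     * checks are modeled here, not any analysis.
     */
    static Reader checkedStream(String fieldName, Reader reader) {
        // Assumed condition: the original text preserves only the message.
        if (fieldName == null) {
            throw new IllegalArgumentException("fieldName must not be null");
        }
        if (reader == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        return reader;
    }

    /** Mirrors the flag-based check in the test: true if the call is rejected. */
    static boolean rejects(String fieldName, Reader reader) {
        try {
            checkedStream(fieldName, reader);
            return false;
        } catch (IllegalArgumentException iae) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null field rejected: " + rejects(null, new StringReader("dummy")));
        System.out.println("null reader rejected: " + rejects("dummy", null));
        System.out.println("valid call rejected: " + rejects("dummy", new StringReader("dummy")));
    }
}
```

Failing fast with an `IllegalArgumentException` names the offending argument, where a `NullPointerException` raised deeper in the filter chain would not.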
Migrated from LUCENE-172 by Jean-François Halleux, resolved May 27 2006
Attachments: ASF.LICENSE.NOT.GRANTED--FrenchAnalyzer.java, ASF.LICENSE.NOT.GRANTED--patch2.txt, ASF.LICENSE.NOT.GRANTED--TestFrenchAnalyzer.java