dhfbk / tint

The Italian NLP Tool
http://tint.fbk.eu
GNU General Public License v3.0

Configuration for NER and POS #18

Closed: loretoparisi closed this issue 3 years ago

loretoparisi commented 7 years ago

First, thank you for your great work on Italian language support for CoreNLP. I'm trying out the NER and POS taggers. My simplest configuration for CoreNLP is the following:

{ 
  'annotators': 'tokenize,ssplit,pos,lemma,ner',
  'ner.model': '/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz',
  'pos.model': '/italian-fast.tagger',
  'depparse.model': '/parser-model-1.txt.gz',
  'customAnnotatorClass.ita_toksent': 'eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator',
  'customAnnotatorClass.ita_lemma': 'eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator',
  'customAnnotatorClass.ita_morpho': 'eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator',
  'ssplit.newlineIsSentenceBreak' : 'always',
  'ner.useSUTime': 0,
}

Entities like DATE, LOC, PER are being recognized. Part of Speech tags as well. I have seen that there are other annotators like Geoloc, HeidelTime, customized Lemma, etc.

For the given configuration, this is my pipeline output:

13:01:21.587 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
13:01:21.590 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_morpho with class eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator
13:01:21.590 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_lemma with class eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
13:01:21.591 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
13:01:21.605 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
13:01:21.610 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
13:01:21.634 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - warning: no language set, no open-class tags specified, and no closed-class tags specified; assuming ALL tags are open class tags
13:01:21.975 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from /root/italian-fast.tagger ... done [0.3 sec].
13:01:21.976 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
13:01:21.976 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
13:01:25.247 [main] INFO  e.s.n.ie.AbstractSequenceClassifier - Loading classifier from /root/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz ... done [3.2 sec].

So the custom annotators like ita_lemma and ita_toksent are registered, but I'm not sure that they are actually loaded in place of the default ones.

Thank you.

ziorufus commented 7 years ago

In this configuration, you are using the English annotators. The correct list of annotators is:

annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner
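As a minimal sketch of that corrected list, assuming the pipeline is configured through a java.util.Properties object: the class name TintNerConfig and the method build are hypothetical, and the model file names are the ones from the question. Creating the pipeline itself is left out so that the snippet runs without CoreNLP or Tint on the classpath.

```java
import java.util.Properties;

final class TintNerConfig {
    // Builds properties for Italian tokenization, POS tagging, morphology,
    // lemmatization and NER. modelDir is a hypothetical folder holding the models.
    static Properties build(String modelDir) {
        Properties props = new Properties();
        // Tint annotators replace tokenize, ssplit and lemma from the English setup.
        props.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
        props.setProperty("customAnnotatorClass.ita_toksent",
                "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator");
        props.setProperty("customAnnotatorClass.ita_lemma",
                "eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator");
        props.setProperty("customAnnotatorClass.ita_morpho",
                "eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator");
        props.setProperty("pos.model", modelDir + "/italian-fast.tagger");
        props.setProperty("ner.model",
                modelDir + "/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz");
        props.setProperty("ner.useSUTime", "false");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting configuration, one key per line.
        build("models").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The returned object would then be handed to the CoreNLP pipeline constructor in a project that has the Tint dependencies available.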

It is weird that you get the DATEs, as they are not included in the Italian model (unless you are using an English text and the English model). If you want to use HeidelTime, you should add:

customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator
annotators=..., timex
timex.treeTaggerHome=path/to/tagger-scripts
timex.considerDate=true
timex.considerDuration=true
timex.considerSet=true
timex.considerTime=true
timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml
timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml
timex.uimaVarDate=Date
timex.uimaVarDuration=Duration
timex.uimaVarLanguage=Language
timex.uimaVarSet=Set
timex.uimaVarTime=Time
timex.uimaVarTypeToProcess=Type
timex.uimaVarTemponym = Temponym
timex.considerTemponym = false
timex.chineseTokenizerPath=

where path/to/tagger-scripts is the path where you installed TreeTagger. You must leave the last one, timex.chineseTokenizerPath, even if it's blank, otherwise HeidelTime crashes.
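The settings above can be sketched as a small helper that adds them to an existing java.util.Properties object. The class name TimexConfig and the method addTimex are hypothetical; the keys and values are copied from the list above, and the only variable part is the TreeTagger path. The point of the sketch is that timex.chineseTokenizerPath is always written, even though its value is empty.

```java
import java.util.Properties;

final class TimexConfig {
    // Fixed key/value pairs taken from the settings listed in this thread.
    private static final String[][] FIXED = {
        {"timex.considerDate", "true"},
        {"timex.considerDuration", "true"},
        {"timex.considerSet", "true"},
        {"timex.considerTime", "true"},
        {"timex.typeSystemHome", "desc/type/HeidelTime_TypeSystem.xml"},
        {"timex.typeSystemHome_DKPro", "desc/type/DKPro_TypeSystem.xml"},
        {"timex.uimaVarDate", "Date"},
        {"timex.uimaVarDuration", "Duration"},
        {"timex.uimaVarLanguage", "Language"},
        {"timex.uimaVarSet", "Set"},
        {"timex.uimaVarTime", "Time"},
        {"timex.uimaVarTypeToProcess", "Type"},
        {"timex.uimaVarTemponym", "Temponym"},
        {"timex.considerTemponym", "false"},
        // Must be present even when blank, as explained above.
        {"timex.chineseTokenizerPath", ""},
    };

    // Appends the timex annotator and its settings to props and returns props.
    static Properties addTimex(Properties props, String treeTaggerHome) {
        String annotators = props.getProperty("annotators", "");
        props.setProperty("annotators",
                annotators.isEmpty() ? "timex" : annotators + ", timex");
        props.setProperty("customAnnotatorClass.timex",
                "eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator");
        props.setProperty("timex.treeTaggerHome", treeTaggerHome);
        for (String[] kv : FIXED) {
            props.setProperty(kv[0], kv[1]);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
        addTimex(props, "path/to/tagger-scripts")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```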

To run the geocoder, you should have a local installation of Nominatim, or you can use the public one. The configuration is:

customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator
annotators=..., geoloc
geoloc.geocoder_url=/path/to/nominatim

where /path/to/nominatim is the URL of Nominatim. By default, the GeolocAnnotator uses the public Nominatim instance, which is slow and limited. If you use a local installation, you can add a boolean geoloc.use_local_geocoder setting to skip the timeout. You can also set a geoloc.timeout option (in milliseconds), which works only when geoloc.use_local_geocoder is enabled (otherwise it is 1 second).
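A rough sketch of the geocoder settings, again on top of java.util.Properties. The class name GeolocConfig and the method addGeoloc are hypothetical, and writing the boolean as the string "true" is an assumption about how the annotator parses it. Because the timeout is described as effective only with a local geocoder, the helper writes geoloc.timeout only in that case.

```java
import java.util.Properties;

final class GeolocConfig {
    // Appends the geoloc annotator and its settings to props and returns props.
    static Properties addGeoloc(Properties props, String nominatimUrl,
                                boolean local, int timeoutMillis) {
        String annotators = props.getProperty("annotators", "");
        props.setProperty("annotators",
                annotators.isEmpty() ? "geoloc" : annotators + ", geoloc");
        props.setProperty("customAnnotatorClass.geoloc",
                "eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator");
        props.setProperty("geoloc.geocoder_url", nominatimUrl);
        if (local) {
            // Assumption: the boolean setting is read from the string "true".
            props.setProperty("geoloc.use_local_geocoder", "true");
            props.setProperty("geoloc.timeout", Integer.toString(timeoutMillis));
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
        // The URL below is a placeholder for a local Nominatim installation.
        addGeoloc(props, "/path/to/nominatim", true, 5000)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```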

If you launch Tint using the included runner, all the customAnnotatorClasses are already set up correctly.

loretoparisi commented 7 years ago

@ziorufus Thank you. Is TreeTagger necessary for the Italian tagger? I'm using CoreNLP with the default models and with per-language models such as fr, zh, de, es, ar (custom models in the jar files for each language).

If I use "ita_toksent,ita_lemma,ita_morpho,ssplit,pos,ner" as annotators, the JVM complains that the class PropertiesUtils is missing:

15:02:51.158 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
15:02:51.161 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator timex with class eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator
15:02:51.161 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_morpho with class eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator
15:02:51.161 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_lemma with class eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
15:02:51.162 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ita_toksent
{ Error: Error creating class
edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator(java.lang.String,java.util.Properties) with args [ita_toksent, {tokenize.language=de, ssplit.newlineIsSentenceBreak=always, lang=it, annotators=ita_toksent,ita_lemma,ita_morpho,ssplit,pos,ner, depparse.model=/root/parser-model-1.txt.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, pos.model=/root/italian-fast.tagger, parse.model=edu/stanford/nlp/models/srparser/germanSR.ser.gz, ner.useSUTime=0, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator, DATAROOT=/Users/loretoparisi/Dropbox (musixmatch)/Development/data/data/stanford, ner.model=/root/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz}]
    at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
    at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.custom(AnnotatorImplementations.java:143)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$registerCustomAnnotators$66(StanfordCoreNLP.java:556)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
    ... 14 more
Caused by: java.lang.NoClassDefFoundError: eu/fbk/utils/core/PropertiesUtils
    at eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator.<init>(ItalianTokenizerAnnotator.java:29)
    ... 19 more
Caused by: java.lang.ClassNotFoundException: eu.fbk.utils.core.PropertiesUtils
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 20 more

thank you

ziorufus commented 7 years ago

TreeTagger is only needed for the HeidelTime annotator. Tint uses the CoreNLP original tagger. Regarding your error, are you using Maven to include the tint-tokenizer?

loretoparisi commented 7 years ago

@ziorufus Ok got it! ...Nope, I'm including the jar manually in my project. Is that a sub-project? Thank you.

ziorufus commented 7 years ago

Which JAR are you including? You need to include the jar-with-dependencies.

loretoparisi commented 7 years ago

So far I have included only:

-rw-r--r--  1 loretoparisi  staff     3534201 17 Lug 12:54 /root/tint-digimorph-0.1.jar
-rw-r--r--  1 loretoparisi  staff       10333 17 Lug 12:47 /root/tint-digimorph-annotator-0.1.jar
-rw-r--r--  1 loretoparisi  staff        8071 17 Lug 15:02 /root/tint-heideltime-annotator-0.1.jar
-rw-r--r--  1 loretoparisi  staff       24003 17 Lug 11:39 /root/tint-tokenizer-0.1.jar

Ah OK, I see that it is part of the DKM package. Oddly, I cannot find eu.fbk.utils.core in eu.fbk.utils, I mean this one: https://mvnrepository.com/artifact/eu.fbk.dkm.utils/utils/1.2

ziorufus commented 7 years ago

You need to include all the dependencies (recursively). You can find the dependencies in the pom.xml file, but I suggest using Maven to manage them, otherwise you need to add tens of dependencies by hand.
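When jars are added by hand, a quick way to see which pieces are still missing is to probe the classpath before building the pipeline. This is only a diagnostic sketch: the class name ClasspathCheck and the method missing are hypothetical, and the two class names in main are the ones that appear in the error reported above.

```java
import java.util.ArrayList;
import java.util.List;

final class ClasspathCheck {
    // Returns the class names that cannot be loaded from the current classpath.
    static List<String> missing(String... classNames) {
        List<String> notFound = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: locate and load the class without running
                // its static initializers.
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError raised when a
                // superclass or interface of the class is itself missing.
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        System.out.println("Missing: " + missing(
                "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator",
                "eu.fbk.utils.core.PropertiesUtils"));
    }
}
```

A check like this only reports the classes it is asked about, so it does not replace resolving the full dependency tree.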

loretoparisi commented 7 years ago

Thank you. I'm actually using Maven to build the project:

[INFO] Reactor Summary:
[INFO] 
[INFO] tint ............................................... SUCCESS [  1.581 s]
[INFO] tint-textpro ....................................... SUCCESS [  0.477 s]
[INFO] tint-eval .......................................... SUCCESS [  0.040 s]
[INFO] tint-resources ..................................... SUCCESS [  0.102 s]
[INFO] tint-digimorph ..................................... SUCCESS [  0.123 s]
[INFO] tint-digimorph-annotator ........................... SUCCESS [  0.028 s]
[INFO] tint-tokenizer ..................................... SUCCESS [  0.031 s]
[INFO] tint-tense ......................................... SUCCESS [  0.021 s]
[INFO] tint-readability ................................... SUCCESS [  0.041 s]
[INFO] tint-geoloc-annotator .............................. SUCCESS [  0.018 s]
[INFO] tint-heideltime-annotator .......................... SUCCESS [  0.399 s]
[INFO] tint-models ........................................ SUCCESS [  0.012 s]
[INFO] tint-runner ........................................ SUCCESS [  0.925 s]
[INFO] tint-kd-annotator .................................. SUCCESS [  0.016 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

so I get these target jars

[loretoparisi@:mbploreto tint]$ find ./ -name \*.jar
.//target/tint-0.1-tests.jar
.//tint-digimorph/target/tint-digimorph-0.1.jar
.//tint-digimorph-annotator/target/tint-digimorph-annotator-0.1.jar
.//tint-eval/target/tint-eval-0.1.jar
.//tint-geoloc-annotator/target/tint-geoloc-annotator-0.1.jar
.//tint-heideltime-annotator/target/tint-heideltime-annotator-0.1-tests.jar
.//tint-heideltime-annotator/target/tint-heideltime-annotator-0.1.jar
.//tint-kd-annotator/target/tint-kd-annotator-0.1.jar
.//tint-models/target/tint-models-0.1.jar
.//tint-readability/target/tint-readability-0.1.jar
.//tint-resources/target/tint-resources-0.1.jar
.//tint-runner/target/tint-runner-0.1-tests.jar
.//tint-runner/target/tint-runner-0.1.jar
.//tint-tense/target/tint-tense-0.1.jar
.//tint-textpro/target/tint-textpro-0.1.jar
.//tint-tokenizer/target/tint-tokenizer-0.1-tests.jar
.//tint-tokenizer/target/tint-tokenizer-0.1.jar

I prefer to take the generated jars one by one and put them on my classpath. The issue here is that I cannot find that util among the dependencies generated by Maven (mvn package / install).

ziorufus commented 7 years ago

Run mvn dependency:tree to print the list of dependencies recursively. As a suggestion, use the corenlp370 branch of Tint, so that you have the latest version. In this case you'll have to fix some dependencies, so you should run mvn install on utils and fcw before compiling Tint.

Anyway, if you include Tint in an existing Java project, I suggest using Maven for both and including it in the pom.xml file. If you need to run Tint from the shell, just run mvn package -Prelease and uncompress the ready-to-use tar.gz archive that you can find in the tint-runner/target folder.

loretoparisi commented 7 years ago

@ziorufus Yes, that is the best solution. I now realize that there are too many dependencies in the ~/.m2/repository/ folder to copy... Grazie!

loretoparisi commented 7 years ago

@ziorufus So I checked out corenlp370 and then ran mvn package -Prelease, but I get an error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project tint-runner: Compilation failure: Compilation failure: 
[ERROR] /tint/tint-runner/src/main/java/eu/fbk/dh/tint/runner/TintPipeline.java:[6,39] package eu.fbk.utils.corenlp.outputters does not exist
[ERROR] /tint/tint-runner/src/main/java/eu/fbk/dh/tint/runner/TintPipeline.java:[147,44] package eu.fbk.utils.corenlp.outputters does not exist
[ERROR] /tint/tint-runner/src/main/java/eu/fbk/dh/tint/runner/TintPipeline.java:[150,13] cannot find symbol
[ERROR]   symbol:   variable TextProOutputter
[ERROR]   location: class eu.fbk.dh.tint.runner.TintPipeline
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tint-runner

The utils package compiles and builds, while on the fcw dependency I get

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project fcw-wikipedia: Compilation failure
[ERROR] /tint/fcw/fcw-wikipedia/src/main/java/eu/fbk/fcw/wikipedia/WikipediaCorefAnnotator.java:[143,41] cannot find symbol
[ERROR]   symbol:   class SimpleCorefAnnotation
[ERROR]   location: class eu.fbk.utils.corenlp.CustomAnnotations
[ERROR] 

NOTE: I can compile and build the version on the master branch without any issues.

ziorufus commented 7 years ago

You are right: before installing utils you need to switch to the develop branch.

loretoparisi commented 7 years ago

@ziorufus Ciao! I'm not sure about the steps to build tint, utils and fcw, with the latest support for CoreNLP 3.7.0. Could you please guide me through?

Thank you.

loretoparisi commented 7 years ago

@ziorufus Maybe a simpler solution would be to provide the release builds (i.e. packaged with all the dependencies) directly. Thank you.

loretoparisi commented 6 years ago

@ziorufus Ciao, a question about the above configuration for NER only. Assuming that the configuration is annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, and that I'm not going to use HeidelTime, do I need only this jar as a dependency?

-rw-r--r--  1 loretoparisi  staff       24003 17 Lug 11:39 /root/tint-tokenizer-0.1.jar

If not, could you please point me to the related Maven dependencies? At the moment I have these jar files:

├── ahocorasick-0.3.0.jar
├── tint-digimorph-0.1.jar
├── tint-digimorph-annotator-0.1.jar
├── tint-heideltime-annotator-0.1.jar
├── tint-tokenizer-0.1.jar
└── utils-core-3.0.jar

and I would like to keep only the ones needed.

Thank you for your help!!!

ziorufus commented 6 years ago

@loretoparisi The problem is that each dependency has its own dependencies. If you use Maven, the dependency tree is built and managed automatically; if you want to include the jars by hand, you need to resolve the tree and add everything.

borice commented 6 years ago

Hello @ziorufus

Sorry to write in this thread, but I have a related problem getting Stanford CoreNLP 3.9.1 to work with the Italian models from Tint. I have the following properties configuration file:

annotators = ita_toksent, ner
tokenize.language = it
ssplit.newlineIsSentenceBreak = false
pos.model = models/italian-fast.tagger
depparse.model = models/parser-model-1.txt.gz
ner.model = models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz
ner.applyNumericClassifiers = false
ner.useSUTime = false
ner.applyFineGrained = false

customAnnotatorClass.ita_toksent = eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
customAnnotatorClass.ita_lemma = eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
customAnnotatorClass.ita_morpho = eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator

And the error I get is:

19:47:31.113  INFO Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
19:47:31.114  INFO Registering annotator ita_morpho with class eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator
19:47:31.114  INFO Registering annotator ita_lemma with class eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
19:47:31.118  INFO Adding annotator ita_toksent
19:47:31.221  INFO Loaded 37 normalization rules
19:47:31.224  INFO Loaded 7 sentence splitting rules
19:47:31.225  INFO Loaded 6 token splitting rules
19:47:31.226  INFO Loaded 9 regular expressions
19:47:31.240  INFO Loaded 288 abbreviations
19:47:31.253  INFO Adding annotator ner
19:47:34.229  INFO Loading classifier from models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz ... done [2.9 sec].
Exception in thread "main" java.lang.IllegalArgumentException: annotator "ner" requires annotation "IsNewlineAnnotation". The usual requirements for this annotator are: tokenize,ssplit,pos,lemma
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:504)

Any ideas?

If I add the pos, ita_morpho and ita_lemma annotators, I get a different error: Caused by: java.lang.ClassNotFoundException: kotlin.TypeCastException

I've added the Tint Maven dependency as instructed in the documentation:

<dependency>
    <groupId>eu.fbk.dh</groupId>
    <artifactId>tint-runner</artifactId>
    <version>0.2</version>
</dependency>

Thank you!

ziorufus commented 6 years ago

It seems that CoreNLP 3.9.1 added a new mandatory annotation. It is not documented on the Stanford NLP website, so I had to write to the group. For now I have patched it, hoping that it is enough. Just pull the develop branch, recompile using mvn clean install, change the version in your POM from 0.2 to 1.0-SNAPSHOT and try again.

loretoparisi commented 6 years ago

@ziorufus Yes, there is an important migration to do for the annotators and sub-annotators: https://github.com/stanfordnlp/CoreNLP/issues/633#issuecomment-370109959

ziorufus commented 6 years ago

I know that, but the problem is not with the sub-annotators (which Tint does not use), but with a new annotation called IsNewlineAnnotation, which is required by the NER annotator and is not documented anywhere.

borice commented 6 years ago

Thank you @ziorufus. Using the change you made in the develop branch worked!

algoscale1 commented 6 years ago

Hello @ziorufus, I am getting an error while using Tint with CoreNLP 3.9.1:

Exception in thread "main" edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
    at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:364)
    at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:381)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.custom(AnnotatorImplementations.java:141)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$67(StanfordCoreNLP.java:606)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181)
    at
Caused by: java.lang.ClassNotFoundException: eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
19:09:48.221 [main] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ita_toksent
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at edu.stanford.nlp.util.MetaClass$ClassFactory.construct(MetaClass.java:135)
    at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:202)
    at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:69)
    at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:360)
    ... 11 more

These are my configs:

properties.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
properties.setProperty("ner.model", "models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz");
properties.setProperty("pos.model", "models/italian-fast.tagger");
properties.setProperty("depparse.model", "models/parser-model-1.txt.gz");
properties.setProperty("customAnnotatorClass.ita_toksent", "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator");
properties.setProperty("customAnnotatorClass.ita_lemma", "eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator");
properties.setProperty("customAnnotatorClass.ita_morpho", "eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator");
properties.setProperty("ssplit.newlineIsSentenceBreak", "always");
properties.setProperty("ner.useSUTime", "false");

I am using the Maven dependency of Tint as mentioned in the docs. Any help would be greatly appreciated. Thanks!

ziorufus commented 6 years ago

Try to add this dependency to the pom.xml file.

        <dependency>
            <groupId>eu.fbk.dh</groupId>
            <artifactId>tint-tokenizer</artifactId>
            <version>1.0-SNAPSHOT</version>
            <scope>runtime</scope>
        </dependency>

Use the develop branch.

dvakhil8 commented 6 years ago

Hello @ziorufus, I have cloned the source code, switched to the develop branch and compiled it using mvn clean install. I also changed the version in my pom from 0.2 to 1.0-SNAPSHOT, but I am still facing this issue:

Exception in thread "main" edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator

andreaferretti commented 5 years ago

Hi @ziorufus, I am trying to use Tint for Italian NER as well, following the configuration you mention at the beginning of this thread. I have a couple of questions:

ziorufus commented 5 years ago

Tint uses HeidelTime standalone because it's hard to integrate it in a flow that does not use UIMA. TreeTagger is required because it uses the correct POS tags. We are working on a custom version of HeidelTime that can be easily integrated with the Tint POS tags, but it's not ready yet.

andreaferretti commented 5 years ago

@ziorufus Thank you! Unfortunately, this makes Tint a little hard to deploy, since it starts a lot of background processes for HeidelTime and brings in a dependency on Perl...

nadezdaalexandrovna commented 5 years ago

Hi @ziorufus, sorry for disturbing you, I am trying to integrate Tint into a Pepper module and am getting the error "resource italian.db not found":

Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator(java.lang.String,java.util.Properties) with args [ita_morpho, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, timex.considerDate=true, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator}]
    at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
    ...
    at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:103)
    at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.testStanfordItalian(CoreNLPManipulator.java:238)
    at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.mapSDocument(CoreNLPManipulator.java:144)
    at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251)
    at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: resource italian.db not found.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
    at com.google.common.io.Resources.getResource(Resources.java:197)
    at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:58)
    at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
    at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)

I included the tint-digimorph jar in my pom.xml; it gets copied to the snapshot, and I also copy it to the dependency folder used by Pepper.

Would you have any idea? Thanks a lot!

ziorufus commented 5 years ago

Try to save the italian.db file somewhere on your computer, and specify it using the property ita_morpho.model. You can find the file here: https://github.com/dhfbk/tint/tree/master/tint-digimorph/src/main/resources

I suggest using the develop branch.
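A minimal sketch of that workaround, assuming the pipeline is configured through java.util.Properties. Only the property name ita_morpho.model comes from the suggestion above; the class name MorphoModelConfig and the method withMorphoModel are hypothetical. Checking the file first gives a clearer message than the resource lookup failure in the stack trace.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class MorphoModelConfig {
    // Points ita_morpho.model at a local copy of italian.db, failing early
    // if the file is not where the caller says it is.
    static Properties withMorphoModel(Properties props, Path db) {
        if (!Files.isRegularFile(db)) {
            throw new IllegalArgumentException(
                    "italian.db not found at " + db.toAbsolutePath());
        }
        props.setProperty("ita_morpho.model", db.toAbsolutePath().toString());
        return props;
    }

    public static void main(String[] args) {
        // The path is taken from the command line; there is no default location.
        if (args.length != 1) {
            System.err.println("usage: MorphoModelConfig <path-to-italian.db>");
            return;
        }
        Properties props = withMorphoModel(new Properties(), Paths.get(args[0]));
        System.out.println(props.getProperty("ita_morpho.model"));
    }
}
```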

Best, Alessio

Il giorno mar 19 feb 2019 alle ore 17:13 nadezdaalexandrovna < notifications@github.com> ha scritto:

Hi @ziorufus https://github.com/ziorufus, sorry for disturbing you, I am trying to integrate Tint into a Pepper module and am getting the error resource italian.db not found : Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator(java.lang.String,java.util.Properties) with args [ita_morpho, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, timex.considerDate=true, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, 
customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator}] at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237) ... at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:103) at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.testStanfordItalian(CoreNLPManipulator.java:238) at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.mapSDocument(CoreNLPManipulator.java:144) at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251) at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233) ... 16 more Caused by: java.lang.IllegalArgumentException: resource italian.db not found. at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146) at com.google.common.io.Resources.getResource(Resources.java:197) at eu.fbk.dh.tint.digimorph.DigiMorph.(DigiMorph.java:58) at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14) at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.(DigiMorphAnnotator.java:23)

I added the tint-digimorph jar as a dependency in my pom.xml, so it gets copied into the snapshot, and I also copy it to the dependency folder used by Pepper.

Would you have any idea? Thanks a lot!


nadezdaalexandrovna commented 5 years ago

Thank you very much for your quick reply! It worked, and now I have a new error. The beginning is the same, but the end is different:

    Caused by: java.lang.NoClassDefFoundError: org/mapdb/volume/MappedFileVol
        at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:67)
        at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
        at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)

Maybe you have an idea about this one, too? Thank you!

ziorufus commented 5 years ago

Try adding this dependency to your pom.xml file

    <dependency>
        <groupId>org.mapdb</groupId>
        <artifactId>mapdb</artifactId>
        <version>3.0.1</version>
    </dependency>

Best, Alessio
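
A `NoClassDefFoundError` like the one above usually means the class was available at compile time but is not visible to the class loader at run time. As a rough diagnostic (a sketch only; the class name `ClasspathProbe` is made up for illustration and is not part of Tint or Pepper), one can check from inside the module whether the MapDB class named in the trace can be loaded:

```java
public class ClasspathProbe {

    /**
     * Returns true if the given fully qualified class name can be loaded
     * by the class loader that loaded this probe class.
     */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread
        String name = "org.mapdb.volume.MappedFileVol";
        System.out.println(name + (isLoadable(name) ? " is loadable" : " is NOT loadable"));
    }
}
```

If this prints "is NOT loadable" when run inside the Pepper module, the mapdb jar is probably missing from the folder Pepper loads dependencies from, even if Maven resolved it at build time.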


nadezdaalexandrovna commented 5 years ago

Thank you! I am still getting errors, but will continue trying on my own now. Thanks a lot for being so responsive!

nadezdaalexandrovna commented 5 years ago

Good morning Alessio, sorry to disturb you. I am trying to use the development version. I have compiled tint-runner-1.0-SNAPSHOT.jar and tint-runner-1.0-SNAPSHOT-jar-with-dependencies.jar following the instructions on GitHub. The build succeeds, and the reactor summary is the following:

    Reactor Summary:
    [INFO]
    [INFO] tint ............................................... SUCCESS [ 1.253 s]
    [INFO] tint-eval .......................................... SUCCESS [ 1.975 s]
    [INFO] tint-resources ..................................... SUCCESS [ 5.017 s]
    [INFO] tint-digimorph ..................................... SUCCESS [ 2.199 s]
    [INFO] tint-digimorph-annotator ........................... SUCCESS [ 0.380 s]
    [INFO] tint-tokenizer ..................................... SUCCESS [ 0.278 s]
    [INFO] tint-verb .......................................... SUCCESS [ 0.583 s]
    [INFO] tint-readability ................................... SUCCESS [ 1.047 s]
    [INFO] tint-derived ....................................... SUCCESS [ 0.153 s]
    [INFO] tint-heideltime-annotator .......................... SUCCESS [ 0.343 s]
    [INFO] tint-models ........................................ SUCCESS [ 6.418 s]
    [INFO] tint-runner ........................................ SUCCESS [ 45.478 s]
    [INFO] tint-inverse-digimorph ............................. SUCCESS [ 1.482 s]
    [INFO] tint-simplifier .................................... SUCCESS [ 20.939 s]

Now I need to make this jar accessible to my project, so I need to install it into my ~/.m2 folder. I tried to do it with the following command:

    mvn --also-make-dependents install:install-file -Dfile=tint-runner/target/tint-runner-1.0-SNAPSHOT.jar -DgroupId=eu.fbk.dh -DartifactId=tint-runner -Dversion=1.0-SNAPSHOT -Dpackaging=jar

but the reactor summary was different:

    Reactor Summary:
    [INFO]
    [INFO] tint ............................................... SUCCESS [ 0.290 s]
    [INFO] tint-eval .......................................... SKIPPED
    [INFO] tint-resources ..................................... SKIPPED
    [INFO] tint-digimorph ..................................... SKIPPED
    [INFO] tint-digimorph-annotator ........................... SKIPPED
    [INFO] tint-tokenizer ..................................... SKIPPED
    [INFO] tint-verb .......................................... SKIPPED
    [INFO] tint-readability ................................... SKIPPED
    [INFO] tint-derived ....................................... SKIPPED
    [INFO] tint-heideltime-annotator .......................... SKIPPED
    [INFO] tint-models ........................................ SKIPPED
    [INFO] tint-runner ........................................ SKIPPED
    [INFO] tint-inverse-digimorph ............................. SKIPPED
    [INFO] tint-simplifier .................................... SKIPPED

How can I get all the modules installed, and not only the first one? Thanks a lot!

ziorufus commented 5 years ago

Did you try a simple "mvn install"? It should work... A.


nadezdaalexandrovna commented 5 years ago

Thanks a lot, it worked. Now I have another "resource not found" problem, this time for feat-mappings.txt:

    edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator(java.lang.String,java.util.Properties) with args [ita_lemma, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, timex.considerDate=true, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator, ita_morpho.model=/home/nadiushka/pepper/CoreNLPPepper/italian.db}]
        ...
    Caused by: java.lang.IllegalArgumentException: resource feat-mappings.txt not found.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
        at com.google.common.io.Resources.getResource(Resources.java:197)
        at eu.fbk.dh.tint.digimorph.annotator.GuessModel.<init>(GuessModel.java:235)
        at eu.fbk.dh.tint.digimorph.annotator.GuessModelInstance.<init>(GuessModelInstance.java:18)
        at eu.fbk.dh.tint.digimorph.annotator.GuessModelInstance.getInstance(GuessModelInstance.java:23)
        at eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator.<init>(DigiLemmaAnnotator.java:87)

I saved it on my computer, but to what variable in the default-config.properties file should I assign its path? Thank you!

ziorufus commented 5 years ago

Yes, I guess you can assign the path to the file in the properties file. Best, Alessio


nadezdaalexandrovna commented 5 years ago

Yes, but what is the name of the variable to assign it to?

ziorufus commented 5 years ago

Try moving that file into the src/main/resources folder of your project. A.
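
The stack trace above fails inside `com.google.common.io.Resources.getResource`, which looks the file up as a classpath resource rather than as a path on disk; files under src/main/resources end up on the classpath in a standard Maven build, which is presumably why moving the file there can help. A minimal sketch for checking whether a resource is visible (the class name `ResourceCheck` is hypothetical, and it uses the plain JDK class loader instead of Guava, so it may not see exactly the same class loader as Tint does inside Pepper):

```java
import java.net.URL;

public class ResourceCheck {

    /** Returns true if the named resource can be found by this class's class loader. */
    public static boolean isOnClasspath(String name) {
        ClassLoader loader = ResourceCheck.class.getClassLoader();
        // A null loader means the bootstrap loader; fall back to the system lookup
        URL url = (loader != null) ? loader.getResource(name)
                                   : ClassLoader.getSystemResource(name);
        return url != null;
    }

    public static void main(String[] args) {
        // Resource name taken from the error message in this thread
        String name = args.length > 0 ? args[0] : "feat-mappings.txt";
        System.out.println(name + (isOnClasspath(name) ? " found on classpath"
                                                       : " NOT found on classpath"));
    }
}
```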


ziorufus commented 5 years ago

You were right, there was no property for the guess model. I've updated the code: you can now set ita_lemma.guess_model in the properties file and point it to the feat-mappings.txt file. Just pull the develop branch of the repository.

Best, Alessio


nadezdaalexandrovna commented 5 years ago

Thank you.

nadezdaalexandrovna commented 5 years ago

Good afternoon Alessio, sorry to disturb you again, but after pulling the new development version I am now getting the following error:

    Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator(java.lang.String,java.util.Properties) with args [ita_morpho, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, ita_lemma.guess_model=/home/nadiushka/pepper/CoreNLPPepper/feat-mappings.txt, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, timex.considerDate=true, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator, ita_morpho.model=italian.db}]
        at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
        at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
        at edu.stanford.nlp.pipeline.AnnotatorImplementations.custom(AnnotatorImplementations.java:141)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$28(StanfordCoreNLP.java:583)
        at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
        at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
        at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:251)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
        at eu.fbk.dh.tint.runner.TintPipeline.load(TintPipeline.java:56)
        at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:116)
        at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:112)
        at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.testStanfordItalian(CoreNLPManipulator.java:238)
        at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.mapSDocument(CoreNLPManipulator.java:144)
        at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251)
        at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
        ... 16 more
    Caused by: org.mapdb.DBException$VolumeIOError
        at org.mapdb.volume.MappedFileVolSingle.<init>(MappedFileVolSingle.java:108)
        at org.mapdb.volume.MappedFileVol$MappedFileFactory.factory(MappedFileVol.java:59)
        at org.mapdb.volume.MappedFileVol$MappedFileFactory.makeVolume(MappedFileVol.java:38)
        at org.mapdb.volume.VolumeFactory.makeVolume(VolumeFactory.java:20)
        at org.mapdb.volume.VolumeFactory.makeVolume(VolumeFactory.java:15)
        at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:67)
        at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
        at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)
        ... 21 more
    Caused by: java.io.FileNotFoundException: italian.db (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at org.mapdb.volume.MappedFileVolSingle.<init>(MappedFileVolSingle.java:85)
        ... 28 more

It is similar to the one I already had with italian.db, but not exactly the same. I have tried saving the italian.db file in different places and tried these 3 configurations:

1. ita_morpho.model=/home/nadiushka/pepper/CoreNLPPepper/italian.db
2. ita_morpho.model=models/italian.db
3. ita_morpho.model=italian.db

But none of them has worked. Would you have any suggestions on how to address this problem? Thank you in advance!
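
The final `FileNotFoundException: italian.db` in the trace suggests that, in this version, the value of ita_morpho.model is opened as a file on disk (through `java.io.RandomAccessFile`) rather than as a classpath resource. A relative value such as `italian.db` is then resolved against the JVM's working directory, which under Pepper may not be the project folder. The following sketch (the class name `ModelPathCheck` is hypothetical, not part of Tint) prints where each of the three configured values would actually point and whether a file exists there:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ModelPathCheck {

    /** Resolves a configured path the way the JVM would: relative paths against the working directory. */
    public static Path resolve(String configured) {
        return Paths.get(configured).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // The three values tried in this thread
        String[] candidates = {
            "/home/nadiushka/pepper/CoreNLPPepper/italian.db",
            "models/italian.db",
            "italian.db"
        };
        System.out.println("Working directory: " + Paths.get("").toAbsolutePath());
        for (String candidate : candidates) {
            Path resolved = resolve(candidate);
            String status = Files.isRegularFile(resolved) ? "exists" : "missing";
            System.out.println(candidate + " -> " + resolved + " (" + status + ")");
        }
    }
}
```

Running this from the same process that starts the pipeline shows the working directory in effect. Note that the trace above corresponds to the third configuration only; it does not show what error the absolute path produced.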