If you explicitly specify a model using PARAM_MODEL_LOCATION or
PARAM_MODEL_PATH (depending on the DKPro Core version), you should specify a
mapping file using PARAM_TAGGER_MAPPING_LOCATION.
Maybe there is already a suitable mapping file included with DKPro Core that
you could use. Are you using a standard tagset? If so, which one?
Original comment by richard.eckart
on 4 Jun 2014 at 9:47
I was unable to specify the model explicitly via a variable (it's a project which
is called from another project...). Therefore, I copied all the built
TreeTagger files into my resources folder.
However, I don't think that I have a mapping file (neither included in DKPro
Core nor in the built TreeTagger files). Where could I get one and how would I
have to specify it in my source code/project properties? I want to get the POS
tags.
Original comment by Steinert...@googlemail.com
on 5 Jun 2014 at 1:13
I need to better understand what you did and what you are trying to do.
If you just want to access the pos tags that the treetagger produces, that is
easy. To get the POS tag of a token, you can do this:
token.getPos().getPosValue()
If this is all you want, you can stop here.
A mapping is only required if you want to use the coarse-grained POS types that
you could use in a statement like
JCasUtil.select(jcas, N.class)
Apparently you already managed to train a model and to instruct the DKPro Core
TreeTagger component to use it. To help you further, I would need to know two
things: what exactly you are referring to when you say that you used a
self-built model as described on our homepage, and how you configure and invoke
the TreeTaggerPosLemmaTT4J component.
Original comment by richard.eckart
on 5 Jun 2014 at 8:23
Okay, I want to use a keyphrase extractor from DKPro Keyphrases, e.g. the
PositionBaseline Extractor. These, however, need the POS tags to filter the
tokens.
Here's an example code snippet:
Candidate nounTokens = new Candidate(CandidateType.Token, PosType.N);
KeyphraseExtractor_ImplBase positionBaselineExtractor = new PositionBaselineExtractor();
positionBaselineExtractor.setCandidate(nounTokens);
AnalysisEngine extractor = positionBaselineExtractor.getKeyphraseEngine();
JCas jcas = extractor.newJCas();
jcas.setDocumentText(text);
extractor.process(jcas);
JCasUtil.select(jcas, Keyphrase.class);
This code uses the TreeTagger. In theory it should only return nouns as
keyphrases, however I receive all words of the input text regardless of POS tag.
Therefore, I checked what POS tags the jcas holds with:
for (POS pos : JCasUtil.select(jcas, POS.class)) {
    System.out.println(pos);
}
And this gives me only 'POS' as tags. Hence, it does not know any subtypes,
such as 'NN' or 'N'. If I change the createTagger method of DKPro Keyphrases'
class 'KeyphraseExtractor_ImplBase' to use an OpenNlpPosTagger instead, the
PositionBaseline extractor works the way it should.
I built the TreeTagger as described here:
http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources
Original comment by Steinert...@googlemail.com
on 6 Jun 2014 at 8:13
Info about DKPro Keyphrases:
In KeyphraseExtractor_ImplBase, the TreeTagger is invoked like this:
return createEngineDescription(
    TreeTaggerChunkerTT4J.class,
    TreeTaggerChunkerTT4J.PARAM_LANGUAGE, getLanguage()
);
So the model is not explicitly added, but loaded via the CAS language.
Original comment by torsten....@gmail.com
on 6 Jun 2014 at 8:30
You said that you are using a self-trained treetagger model. Is it correct that
you extended the build.xml file to package your own model as a jar? (cf. [1])
[1] https://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI#Packaging_resources_as_JARs
Original comment by richard.eckart
on 6 Jun 2014 at 10:09
No, that is a misunderstanding. I used the build.xml as given by the project. I
do not use any self-trained model. All I did was package the TreeTagger with
the build.xml and copy it into the resource folder of my own project.
Original comment by Steinert...@googlemail.com
on 10 Jun 2014 at 8:24
The build.xml file creates various JARs in the folder called "target". Instead
of copying these JARs to your resources folder, add them to your classpath. In
Eclipse you can do this e.g. by right-clicking on them and selecting "Build path
-> Add to build path". Alternatively, they can be added via Maven.
If you are using DKPro Core 1.5.x, there should be one JAR per language. If you
add that, it should also give you the pos tag mapping.
If you are using DKPro Core 1.6.x, there should be two JARs per model, one
"model" JAR and one "upstream" JAR. Make sure to have added both to the
classpath in order to get the mapping.
If this does not help:
There might not be a mapping for all languages and all models. Which language are
you processing?
Original comment by richard.eckart
on 10 Jun 2014 at 9:05
I have now added the treetagger-bin jar as well as the treetagger-model-en jar to my
build path. However, the problem remains. I am processing English texts.
Original comment by Steinert...@googlemail.com
on 10 Jun 2014 at 1:18
Same problem here. When I try to get the coarse-grained POS tags with
TreeTagger, I only get "POS" instead of "N" or "V". I am trying to print the
tags as follows:
for (Token tokenAnno : JCasUtil.select(jcas, Token.class)) {
    System.out.println(tokenAnno.getPos().getClass().getSimpleName());
}
The whole thing works for German, but does not work for English. It does not
even load the tagsets. The only way to get coarse-grained tags for English is
to explicitly map TreeTagger to a tagset (e.g. "en-pos.map").
Maybe you, Steinert, can test it for German? If this works, then maybe
there are problems with English texts and TreeTagger.
Original comment by onurs3...@googlemail.com
on 10 Jun 2014 at 6:42
I'll look into it. Can you please tell me which version of DKPro Core you are
using (mind that -all- DKPro Core JARs in your project should have the same
version - you should not mix versions) and the full JAR names (including the
version) of the model files that you are using? If you know it, the URL/svn
revision of the build.xml files that you used would also be helpful.
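A quick way to spot mixed versions is to group the DKPro Core JAR names on the classpath by their version suffix. The following is only a minimal sketch: the class name DkproVersionCheck is made up for illustration, and it assumes the JARs follow the usual Maven naming scheme <artifactId>-<version>.jar with an artifactId starting with "de.tudarmstadt.ukp.dkpro.core.". JARs named differently (e.g. the TreeTagger model JARs) are ignored.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DkproVersionCheck {
    // Assumption: DKPro Core JARs are named <artifactId>-<version>.jar and the
    // artifactId starts with "de.tudarmstadt.ukp.dkpro.core.".
    private static final Pattern DKPRO_JAR = Pattern.compile(
            "(de\\.tudarmstadt\\.ukp\\.dkpro\\.core\\.[\\w.-]+?)-(\\d[\\w.-]*)\\.jar");

    /** Groups the artifact IDs of all matching JAR file names by their version. */
    public static Map<String, List<String>> groupByVersion(List<String> jarNames) {
        Map<String, List<String>> byVersion = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = DKPRO_JAR.matcher(name);
            if (m.matches()) {
                byVersion.computeIfAbsent(m.group(2), v -> new ArrayList<>()).add(m.group(1));
            }
        }
        return byVersion;
    }

    public static void main(String[] args) {
        // Collect the file names of all entries on the current classpath.
        List<String> names = new ArrayList<>();
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            names.add(new File(entry).getName());
        }
        Map<String, List<String>> byVersion = groupByVersion(names);
        if (byVersion.size() > 1) {
            System.out.println("Mixed DKPro Core versions found: " + byVersion);
        } else {
            System.out.println("DKPro Core versions on classpath: " + byVersion.keySet());
        }
    }
}
```

Run it with the same classpath as the failing project; more than one key in the result means that versions are being mixed.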
Original comment by richard.eckart
on 10 Jun 2014 at 8:54
Okay, here come the SVN revisions I'm using:
build.xml: 25
de.tudarmstadt.ukp.dkpro.core.treetagger-asl: 2281
The versions I use in my POM: de.tudarmstadt.ukp.dkpro.core.treetagger-asl:
1.5.0 (although I checked the project out locally to build it, I still have the
POM set to use a version via a repository).
The names of the jars:
treetagger-bin-20131118.0.jar
treetagger-model-en-20111109.1.jar
Original comment by Steinert...@googlemail.com
on 11 Jun 2014 at 7:56
Update: It's also working for me when using German texts.
Original comment by Steinert...@googlemail.com
on 11 Jun 2014 at 8:18
treetagger-model-en-20111109.1.jar is a model for DKPro Core 1.6.0 [1]. It
declares the tagset "ptb-tt" which is not known to DKPro Core 1.5.0. Hence,
DKPro Core 1.5.0 falls back to mapping every tag to the POS annotation type
(and storing the actual pos-tag only in the posValue feature of the POS
annotation).
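The fallback described above can be illustrated with a small sketch. This is not the DKPro Core implementation: the class TagsetFallbackSketch, the in-memory map, and the handful of tag entries are assumptions made purely for illustration. It only shows why a model that declares an unknown tagset name yields the generic "POS" type for every token, while the fine-grained tag itself is still available.

```java
import java.util.HashMap;
import java.util.Map;

public class TagsetFallbackSketch {
    // Hypothetical stand-in for the per-tagset mapping files.
    // Only "ptb" is known here; the entries are illustrative, not complete.
    private static final Map<String, Map<String, String>> MAPPINGS = new HashMap<>();

    static {
        Map<String, String> ptb = new HashMap<>();
        ptb.put("NN", "N");
        ptb.put("NNS", "N");
        ptb.put("VB", "V");
        ptb.put("JJ", "ADJ");
        MAPPINGS.put("ptb", ptb);
    }

    /**
     * Returns the coarse-grained type for a fine-grained tag. Falls back to the
     * generic "POS" type if the tagset or the tag is unknown.
     */
    public static String coarseType(String tagset, String tag) {
        Map<String, String> mapping = MAPPINGS.get(tagset);
        if (mapping == null) {
            return "POS"; // unknown tagset: every tag ends up as plain POS
        }
        return mapping.getOrDefault(tag, "POS");
    }

    public static void main(String[] args) {
        System.out.println(coarseType("ptb", "NN"));    // known tagset
        System.out.println(coarseType("ptb-tt", "NN")); // tagset name not known
    }
}
```

In this sketch, the second call returns "POS" only because the tagset name "ptb-tt" has no mapping registered, which mirrors the situation with a 1.6.0 model on DKPro Core 1.5.0.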
You should use the build.xml file for DKPro Core 1.5.0 [2] which declares the
"ptb" tagset.
No matter what build.xml file you use, you might find that some models have
meanwhile been updated on the TreeTagger homepage and some md5 hashes may no
longer match that build.xml file. Since TreeTagger models and binaries cannot
be redistributed due to license restrictions, this should not cause problems.
If you care about versioning and want to stay clear of potential version
conflicts with future build.xml files, I would recommend you add some suffix to
the version, e.g. "20111109.0-steinert".
TreeTagger model packaging will change in DKPro Core 1.7.0 and then follow the
packaging conventions also used for other models/resources.
I assume this should resolve your problem. I am marking this issue as "invalid"
because it does not require changes or further actions on our part. If your
problem is not resolved or if you feel that further action on our part is
necessary, feel free to comment and reopen the issue.
[1] https://dkpro-core-asl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-asl/tags/de.tudarmstadt.ukp.dkpro.core-asl-1.6.0/de.tudarmstadt.ukp.dkpro.core.treetagger-asl/src/scripts/build.xml
[2] https://dkpro-core-asl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-asl/tags/de.tudarmstadt.ukp.dkpro.core-asl-1.5.0/de.tudarmstadt.ukp.dkpro.core.treetagger-asl/src/scripts/build.xml
Original comment by richard.eckart
on 11 Jun 2014 at 7:28
Original issue reported on code.google.com by
Steinert...@googlemail.com
on 4 Jun 2014 at 9:38