google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

TreeTagger - improve documentation for external users #276

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
External DKPro Core users have trouble obtaining the TreeTagger models and 
binaries. 
If I see this correctly, these are not downloaded automatically.

It might be good to add documentation somewhere on how to retrieve the 
TreeTagger models and binaries, maybe as comments in the TreeTagger module or 
in the dkpro-docbook?

Original issue reported on code.google.com by eckle.kohler on 26 Nov 2013 at 11:30

GoogleCodeExporter commented 9 years ago
The general process is documented here with TreeTagger as an example:

http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources

Although, there might have been changes/extensions to the process which are not 
yet reflected in the documentation.

Original comment by richard.eckart on 26 Nov 2013 at 11:34

GoogleCodeExporter commented 9 years ago
>> Although, there might have been changes/extensions to the process which are not yet reflected in the documentation.

Right, there are several problems (I am reporting problems a student faced):
- there is no build.xml in the current release

With an older version of the build.xml, the following problems occurred:
- MD5 checksums were not up-to-date
- the Greek language model was not found
- the license check failed

Original comment by eckle.kohler on 26 Nov 2013 at 11:45

GoogleCodeExporter commented 9 years ago
The file is present in the latest release:

http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/tags/de.tudarmstadt.ukp.dkpro.core-asl-1.5.0/de.tudarmstadt.ukp.dkpro.core.treetagger-asl/src/scripts/build.xml

MD5 checksums get out of date; that is the whole point of having them. In 
particular, TreeTagger models are not versioned. Without those checksums, we 
have no way of knowing if/when a model changed. It may be possible to write a 
better script that reads the timestamp of the remote model file and uses that, 
but to be honest, I trust checksums more.
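
To illustrate the idea, here is a minimal sketch of such a checksum check in 
plain Java. It is not the logic used by build.xml; the class name `Md5Check` 
and its methods are hypothetical and exist only for this example. It hashes a 
downloaded file and compares the result with the checksum recorded earlier, so 
a mismatch signals that the unversioned model changed upstream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of DKPro Core or its build.xml.
public class Md5Check {

    /** Returns the lowercase hex MD5 digest of everything read from the stream. */
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if the file's MD5 digest equals the recorded checksum (case-insensitive). */
    static boolean matches(Path file, String expectedMd5)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in).equalsIgnoreCase(expectedMd5.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java Md5Check <model-file> <expected-md5>");
            return;
        }
        Path model = Paths.get(args[0]);
        System.out.println(matches(model, args[1])
                ? "checksum OK"
                : "checksum MISMATCH: the model may have changed upstream");
    }
}
```

A timestamp comparison would only show that the remote file was touched, 
whereas a digest mismatch shows that its content differs from what was recorded.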

Sometimes models get added/removed by Helmut Schmid without further notice. The 
only thing we can do is update the build.xml when we notice.

Which license check?

Original comment by richard.eckart on 26 Nov 2013 at 11:52

GoogleCodeExporter commented 9 years ago
ok, I see - so actually, everything is described correctly in the Wiki.

Still, the documentation could maybe be improved a bit, I think, e.g.

http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources says:
"...in the scripts directory of the de.tudarmstadt.ukp.dkpro.core.treetagger 
module" 

and this does not refer to a release in the tags folder - obviously the user 
looked at the trunk

Original comment by eckle.kohler on 26 Nov 2013 at 12:48

GoogleCodeExporter commented 9 years ago
Btw. it is possible to use the TreeTagger component without using these JARs by 
setting these parameters:

PARAM_EXECUTABLE_PATH
PARAM_MODEL_PATH
PARAM_MODEL_ENCODING

Original comment by richard.eckart on 26 Nov 2013 at 1:18

GoogleCodeExporter commented 9 years ago
There is also a section specifically on where to find the right build.xml 
file. It says to look in the tags folder.

---

Which build.xml file to use?
For any given module supporting packaged resources, there is always the 
build.xml in SVN trunk and the ones in previous releases (tags) in SVN. Which 
one should you use?

For TreeTagger, you should always use the version from SVN trunk. Here, it is 
least likely that the MD5 checksums are outdated and you will always get the 
latest and greatest version of TreeTagger.

For all other modules (e.g. OpenNLP or StanfordNLP) you should use the 
build.xml for your DKPro Core version. Thus, if you are working with the latest 
DKPro Core SNAPSHOT, use the one from SVN trunk and if you use DKPro Core 
1.3.0, then look in the 1.3.0 tag in SVN.

We do not ship the build.xml files in any other way than via SVN.
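
The selection rule in the section above can be sketched as a small Java 
function. This is only an illustration: the class `BuildXmlLocator` is 
hypothetical, the trunk URL layout is taken from the link quoted in this 
thread, and the assumption that tags follow the same layout under the SVN 
root (as the 1.5.0 browse link suggests) is not confirmed here.

```java
// Hypothetical helper illustrating the "which build.xml" rule; not part of DKPro Core.
public class BuildXmlLocator {

    // SVN root as it appears in the trunk URL quoted in this thread.
    static final String SVN_ROOT =
            "https://dkpro-core-asl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-asl";

    /**
     * Returns the assumed build.xml URL for a module directory and DKPro Core version.
     * TreeTagger always uses trunk; other modules use trunk only for SNAPSHOT
     * versions and the matching release tag otherwise.
     */
    static String buildXmlUrl(String moduleDir, String dkproVersion) {
        boolean isTreeTagger = moduleDir.contains("treetagger");
        boolean useTrunk = isTreeTagger || dkproVersion.endsWith("SNAPSHOT");
        // Tag directory naming is assumed from the 1.5.0 link in this thread.
        String branch = useTrunk
                ? "trunk"
                : "tags/de.tudarmstadt.ukp.dkpro.core-asl-" + dkproVersion;
        return SVN_ROOT + "/" + branch + "/" + moduleDir + "/src/scripts/build.xml";
    }

    public static void main(String[] args) {
        // TreeTagger: trunk, regardless of the DKPro Core version in use.
        System.out.println(
                buildXmlUrl("de.tudarmstadt.ukp.dkpro.core.treetagger-asl", "1.3.0"));
        // Any other module (directory name is a placeholder): the matching tag.
        System.out.println(buildXmlUrl("some-other-module-dir", "1.3.0"));
    }
}
```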

Original comment by richard.eckart on 26 Nov 2013 at 1:20

GoogleCodeExporter commented 9 years ago
right, but the section says:

"there is always the build.xml in SVN trunk" 

which is not the case currently

Original comment by eckle.kohler on 26 Nov 2013 at 1:26

GoogleCodeExporter commented 9 years ago
Of course it is there:

https://dkpro-core-asl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.treetagger-asl/src/scripts/build.xml

Original comment by richard.eckart on 26 Nov 2013 at 1:29

GoogleCodeExporter commented 9 years ago
sorry! - I overlooked this - 

so do you think the documentation is fine as it is?
otherwise I am volunteering to improve it

Original comment by eckle.kohler on 26 Nov 2013 at 1:34

GoogleCodeExporter commented 9 years ago
Well, I suppose we determined that, while all information is there, it seems 
not to be presented in the most informative manner. Please feel free to improve 
it :)

Original comment by richard.eckart on 26 Nov 2013 at 1:42

GoogleCodeExporter commented 9 years ago

Original comment by eckle.kohler on 26 Nov 2013 at 7:39