DrDub / ttc-project

Automatically exported from code.google.com/p/ttc-project

Result not generated by indexer #12

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Launch ttc-term-suite-1.2.jar
2. Run the spotter on a corpus containing some large txt files (>400 kB); this 
produces large xmi output (30 MB)
3. Run the indexer on the spotter result

What is the expected output? What do you see instead?
Expected: tbx and xmi terminology files
Result: random failure, generally no output at all once progress reaches 100%.

What version of the product are you using? On what operating system?
ttc-term-suite-1.2.jar; tested on Windows XP SP3, Windows 7 32-bit, and 
Windows 7 64-bit.

Please provide any additional information below.
Many tests were run. Results are very seldom generated for large corpora, or 
when the corpus contains large txt files (>100 kB).
No error message is ever displayed.

Unable to evaluate ttc-term-suite on large corpora.

Original issue reported on code.google.com by claude.m...@gmail.com on 12 Apr 2012 at 5:35

GoogleCodeExporter commented 9 years ago
This major problem can be reproduced on the sample, as explained below:
Running the spotter creates the A-n.xmi files. Only in A-9.xmi is the 
"lastsegment" attribute of the "examples:SourceDocumentInformation" tag set to 
"true"; it is "false" in all other files. Manually changing the attribute to 
"false" in A-9.xmi prevents the TTCTermsuite indexer from generating the tbx, 
which reproduces the problem.

The indexer generates a tbx if one of the xmi files contains 
lastsegment="true", but the size of the result depends on the file.
The indexer hangs if all the files contain lastsegment="true".
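
To check a spotter output directory against the observations above, the flag can be listed per file with a short script. This is only a minimal sketch and not part of TermSuite: the function names are hypothetical, it assumes the spotter output is a flat directory of *.xmi files, and it matches the attribute name case-insensitively because the exact capitalization in the files ("lastsegment" as written here) is an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def last_segment_flag(xmi_path):
    """Return the lastsegment value of the first SourceDocumentInformation
    element in the file, or None if the element or attribute is missing."""
    with open(xmi_path, "rb") as handle:
        # Stream the file: attributes are already available on "start" events,
        # so a 30 MB xmi does not have to be parsed to the end.
        for _, elem in ET.iterparse(handle, events=("start",)):
            # Tags look like "{namespace}SourceDocumentInformation".
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name != "SourceDocumentInformation":
                continue
            for attr, value in elem.attrib.items():
                # Assumption: compare case-insensitively, ignoring any namespace.
                if attr.rsplit("}", 1)[-1].lower() == "lastsegment":
                    return value
            return None
    return None


def summarize(directory):
    """Map each *.xmi file name in `directory` to its lastsegment value."""
    files = sorted(Path(directory).glob("*.xmi"))
    return {path.name: last_segment_flag(path) for path in files}


def count_true(flags):
    """Count the files whose lastsegment value is "true"."""
    return sum(1 for value in flags.values() if value == "true")
```

Calling `summarize()` on the spotter output directory and passing the result to `count_true()` shows at a glance whether zero, one, or all files carry lastsegment="true", which are the three cases described above.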

For large corpora, the issue seems to be a consequence of a spotter defect: the 
spotter does not generate all the xmi files correctly.

This troubleshooting is highly time-consuming, and it is not the normal 
purpose of a TTC evaluation!

Original comment by claude.m...@gmail.com on 25 Apr 2012 at 8:40