CentreForDigitalHumanities / tscan

T-scan: an analysis tool for Dutch texts to assess the complexity of the text, based on original work by Rogier Kraf
GNU Affero General Public License v3.0

ERROR: cannot load lexicon file #84

Open janwillemb opened 11 months ago

janwillemb commented 11 months ago

docker-compose up builds successfully and the container listens on port 80, but browsing to port 80 returns a bad gateway error.

In the log tscan complains:

tscan-tscan-1  | 11:09:06.40: Ready.
tscan-tscan-1  | 11:09:07.34: Starting wopr 1.43
tscan-tscan-1  | 11:09:07.34: Timbl support built in.
tscan-tscan-1  | 11:09:07.34: Based on timbl 6.9
tscan-tscan-1  | 11:09:07.34: Based on libfolia 2.17
tscan-tscan-1  | 11:09:07.34: ICU support, version 70.1
tscan-tscan-1  | 11:09:07.34: std::numeric_limits<int>::max() = 2147483647
tscan-tscan-1  | 11:09:07.34: std::numeric_limits<long>::max() = 9223372036854775807
tscan-tscan-1  | 11:09:07.34: PID:   2005 PPID:   2004
tscan-tscan-1  | 11:09:07.34: Running: xmlserver
tscan-tscan-1  | 11:09:07.34: xmlserver. Returns a FoLiA document over a sequence.
tscan-tscan-1  | 11:09:07.34:  ibasefile: /usr/local/share/tscan/sonar_newspapercorp_tokenized.3.txt.l2r0_-a4+D.ibase
tscan-tscan-1  | 11:09:07.34:  port:      7020
tscan-tscan-1  | 11:09:07.34:  keep:      1
tscan-tscan-1  | 11:09:07.34:  moses:     0
tscan-tscan-1  | 11:09:07.34:  lb:        1
tscan-tscan-1  | 11:09:07.34:  lc:        2
tscan-tscan-1  | 11:09:07.34:  rc:        0
tscan-tscan-1  | 11:09:07.34:  verbose:   2
tscan-tscan-1  | 11:09:07.34:  timbl:
tscan-tscan-1  | 11:09:07.34:  lexicon    /usr/local/share/tscan/sonar_newspapercorp_tokenized.3.txt.lex
tscan-tscan-1  | 11:09:07.34:  hapax:     0
tscan-tscan-1  | 11:09:07.34:  skip_sm:   false
tscan-tscan-1  | 11:09:07.34: ERROR: cannot load lexicon file.
tscan-tscan-1  | 11:09:07.34: Result = -1
tscan-tscan-1  | 11:09:07.34: Running for 00s

Indeed, there is no lex file in the directory /usr/local/share/tscan/:
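To double-check which of the files named in the log are actually missing, something like the following could be run inside the container. This is only a minimal sketch: the helper names `configured_paths` and `missing_files` are made up for illustration, and the regular expression assumes the `ibasefile:` and `lexicon` lines look exactly like the ones in the log above.

```python
import re
from pathlib import Path

# Matches wopr startup lines such as
#   "...:  ibasefile: /usr/local/share/tscan/<name>.ibase"
#   "...:  lexicon    /usr/local/share/tscan/<name>.lex"
# The two keys are taken from the log above; the pattern itself is an assumption.
PATH_LINE = re.compile(r"\b(ibasefile|lexicon):?\s+(/\S+)")


def configured_paths(log_text):
    """Return {key: path} for the ibasefile and lexicon entries found in the log."""
    return {m.group(1): m.group(2) for m in PATH_LINE.finditer(log_text)}


def missing_files(log_text):
    """Return the configured paths that are not regular files on this filesystem."""
    return [
        path
        for path in configured_paths(log_text).values()
        if not Path(path).is_file()
    ]
```

Feeding it the saved log text and printing the result of `missing_files` would show whether only the .lex file is absent or the .ibase file as well. The check is only meaningful when it runs on the same filesystem that wopr sees, that is, inside the container.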

arianpasquali commented 11 months ago

Same here. I ran docker-compose up and everything looks fine except it fails to load the lexicon file. Browsing http://localhost:8830 gives me 502 bad gateway as well.

kosloot commented 11 months ago

I noticed:

tscan-tscan-1 | 11:09:07.34: Starting wopr 1.43

So you are using the latest version from Git. Unfortunately, I did a lot of code cleanup on it without a proper testbed. Therefore I didn't release it (nor am I planning a release).

I suggest reverting to release 1.42. But that won't fix the missing files, of course.
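For anyone who wants to spot this situation automatically, a rough sketch of a check on the "Starting wopr" line is given below. The function names are hypothetical, and `LAST_RELEASE` simply encodes the 1.42 mentioned in this comment, so it would need updating if a newer release ever appears.

```python
import re

# Last released wopr version according to this thread; an assumption, not
# something read from the wopr repository.
LAST_RELEASE = (1, 42)

# Matches the startup banner, e.g. "Starting wopr 1.43".
VERSION_LINE = re.compile(r"Starting wopr (\d+(?:\.\d+)*)")


def wopr_version(log_text):
    """Return the wopr version from the log as a tuple of ints, or None if absent."""
    match = VERSION_LINE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_unreleased(log_text, last_release=LAST_RELEASE):
    """True when the log reports a wopr version newer than the last release."""
    version = wopr_version(log_text)
    return version is not None and version > last_release
```

With the log line quoted above, `wopr_version` yields `(1, 43)` and `is_unreleased` returns `True`. As noted, this only flags the unreleased build; it says nothing about the missing lexicon file.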

arianpasquali commented 11 months ago

@kosloot do you recommend any specific branch to build the docker image?

We are using wopr as specified on line 130 of the Dockerfile, which simply clones the wopr repo.

I've also checked previous releases like https://github.com/UUDigitalHumanitieslab/tscan/releases/tag/v0.9.8, but they look deprecated since they do not support Docker.

kosloot commented 11 months ago

> @kosloot do you recommend any specific branch to build the docker image?

No, I have no say at all in the Tscan releases or the Docker builds. I only wanted to note that the Git version of Wopr might not be the best choice. It is old and flaky software anyway. But switching to an older version will not resolve the missing files; I assume they should be provided in the image.