Noahs-ARK / semafor

http://www.ark.cs.cmu.edu/SEMAFOR
GNU General Public License v3.0

NumberFormatException at Alphabet creation step during training #22

Open bl4ck3lk opened 7 years ago

bl4ck3lk commented 7 years ago

A NumberFormatException occurs at step 1 of the Alphabet creation script (training/trainIdModel.sh) when it processes some lines of cv.train.sentences.frame.elements. The problem is that some fields contain numbers separated by a colon (:) where the parser expects plain integers.

Example line: 4 Economy economy.n 7 economy 1 Political_region 3:4 Descriptor 5:6 Economy 7

https://github.com/Noahs-ARK/semafor/blob/master/src/main/java/edu/cmu/cs/lti/ark/fn/identification/training/AlphabetCreationThreaded.java#L199

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "3:4"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.createAlphabet(AlphabetCreationThreaded.java:163)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.main(AlphabetCreationThreaded.java:106)
Caused by: java.lang.NumberFormatException: For input string: "3:4"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.processLine(AlphabetCreationThreaded.java:201)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded.access$100(AlphabetCreationThreaded.java:51)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded$1.call(AlphabetCreationThreaded.java:179)
    at edu.cmu.cs.lti.ark.fn.identification.training.AlphabetCreationThreaded$1.call(AlphabetCreationThreaded.java:175)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

What does the 3:4 format mean, and how can I work around this problem?
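
For reference, here is a minimal sketch of the kind of tolerant parsing that would avoid the exception. It assumes, without confirmation from the SEMAFOR source, that a token like 3:4 denotes a start:end pair of token indices and that a bare integer like 7 denotes a single-token span. The class name SpanTokens and the method parseSpan are hypothetical and are not part of SEMAFOR.

```java
public class SpanTokens {

    /**
     * Parses a token that is either a plain integer ("7") or two integers
     * joined by a colon ("3:4"). Returns {start, end}; for a plain integer
     * both values are the same. The start:end reading is an assumption.
     */
    static int[] parseSpan(String token) {
        // Limit -1 keeps trailing empty strings, so "3:" is rejected below.
        String[] parts = token.split(":", -1);
        if (parts.length == 1) {
            int index = Integer.parseInt(parts[0]);
            return new int[] {index, index};
        }
        if (parts.length == 2) {
            return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        }
        throw new NumberFormatException("Not an index or start:end pair: " + token);
    }

    public static void main(String[] args) {
        // The example line from this report.
        String line = "4 Economy economy.n 7 economy 1 Political_region 3:4 Descriptor 5:6 Economy 7";
        for (String token : line.split("\\s+")) {
            // Only attempt to parse tokens that look like an index or a pair.
            if (token.matches("\\d+(:\\d+)?")) {
                int[] span = parseSpan(token);
                System.out.println(token + " -> start=" + span[0] + ", end=" + span[1]);
            }
        }
    }
}
```

This only shows how such tokens could be read without Integer.parseInt failing. Whether the field at that position is supposed to contain a span at all, or whether the input file's column layout differs from what AlphabetCreationThreaded expects, is a separate question.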