MedKhem / grobid-dictionaries


java.lang.NullPointerException when running createTrainingDictionarySegmentation #17

Closed novacellus closed 6 years ago

novacellus commented 6 years ago

Running the command:

java -jar target/grobid-dictionaries-0.4.3-SNAPSHOT.one-jar.jar -verbose -gH ../grobid-home/ -gP ../grobid-home/config/grobid.properties -dIn training/lexica/in/ -dOut training/lexica/out/ -exe createTrainingDictionarySegmentation

yields the java.lang.NullPointerException error:

Caused by: java.lang.NullPointerException
    at org.grobid.core.engines.DictionarySegmentationParser.copyFileUsingStream(DictionarySegmentationParser.java:1525)
    at org.grobid.core.engines.DictionarySegmentationParser.createTrainingDictionary(DictionarySegmentationParser.java:907)
    at org.grobid.core.engines.DictionarySegmentationParser.createTrainingBatch(DictionarySegmentationParser.java:825)
    ... 7 more

From what I can tell, the engine (DictionarySegmentationParser.java:907) expects the file resources/templates/dictionarySegmentation.rng to exist once (or before) the pdf2xml step is done:

File existingRngFile = new File("resources/templates/dictionarySegmentation.rng");
File newRngFile = new File(outputDirectory + "/" +"dictionarySegmentation.rng");
copyFileUsingStream(existingRngFile,newRngFile);
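For illustration only, here is a minimal sketch of a more defensive copy step. The class TemplateCopy and the method copyTemplate are hypothetical names, not part of grobid-dictionaries, and the internals of the real copyFileUsingStream are not shown in this thread. The point is that a relative path such as resources/templates/dictionarySegmentation.rng is resolved against the current working directory, so checking for the file up front gives a readable error instead of a NullPointerException deep in the copy routine.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not taken from the grobid-dictionaries code base.
final class TemplateCopy {

    // Copies a template file into outputDirectory, failing early with a
    // descriptive message when the source file is missing.
    static Path copyTemplate(Path source, Path outputDirectory) throws IOException {
        if (!Files.isRegularFile(source)) {
            // Report the absolute path so it is clear which working
            // directory the relative path was resolved against.
            throw new FileNotFoundException(
                    "Template not found: " + source.toAbsolutePath());
        }
        Files.createDirectories(outputDirectory);
        Path target = outputDirectory.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the paths quoted in this issue; both can be overridden.
        Path source = Paths.get(args.length > 0
                ? args[0] : "resources/templates/dictionarySegmentation.rng");
        Path out = Paths.get(args.length > 1 ? args[1] : "training/lexica/out");
        System.out.println("Copied to " + copyTemplate(source, out));
    }
}
```

Run from a directory where the template is missing, this sketch stops with a FileNotFoundException that names the absolute path it looked for, which makes the missing-resource cause easier to spot.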

The full stack trace is:

10 lis 2017 21:27.08 [DEBUG] DocumentSource - pdf2xml process finished. Time to process:428ms
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.simontuffs.onejar.Boot.run(Boot.java:340)
    at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: org.grobid.core.exceptions.GrobidException: [GENERAL] An exception occured while running Grobid training data generation for segmentation model.
    at org.grobid.core.engines.DictionarySegmentationParser.createTrainingBatch(DictionarySegmentationParser.java:840)
    at org.grobid.core.main.batch.DictionaryMain.main(DictionaryMain.java:202)
    ... 6 more
Caused by: java.lang.NullPointerException
    at org.grobid.core.engines.DictionarySegmentationParser.copyFileUsingStream(DictionarySegmentationParser.java:1525)
    at org.grobid.core.engines.DictionarySegmentationParser.createTrainingDictionary(DictionarySegmentationParser.java:907)
    at org.grobid.core.engines.DictionarySegmentationParser.createTrainingBatch(DictionarySegmentationParser.java:825)
    ... 7 more

MedKhem commented 6 years ago

Dear @novacellus, the error comes from missing files that belong to a feature we are still testing: visual annotation through Oxygen. You can try this new annotation mode through the Author mode in Oxygen.

N.B.: This is a new feature, so there might be some inconsistencies, but we would be happy to get your feedback after you try it.

Cheers

novacellus commented 6 years ago

Your commit did indeed resolve the issue. Thank you!