lfoppiano closed this 7 years ago.
Here is the log from generating training data for one PDF:
18 May 2017 08:12.02 [INFO ] Lexicon - Initiating dictionary
18 May 2017 08:12.02 [INFO ] Lexicon - End of Initialization of dictionary
18 May 2017 08:12.02 [INFO ] Lexicon - Initiating names
18 May 2017 08:12.02 [INFO ] Lexicon - End of initialization of names
18 May 2017 08:12.02 [INFO ] Lexicon - Initiating country codes
18 May 2017 08:12.02 [INFO ] Lexicon - End of initialization of country codes
18 May 2017 08:12.02 [INFO ] WapitiModel - Loading model: /Users/lfoppiano/development/inria/grobid/grobid-home/models/dictionary-body-segmentation/model.wapiti (size: 155377)
[Wapiti] Loading model: "/Users/lfoppiano/development/inria/grobid/grobid-home/models/dictionary-body-segmentation/model.wapiti"
Model path: /Users/lfoppiano/development/inria/grobid/grobid-home/models/dictionary-body-segmentation/model.wapiti
18 May 2017 08:12.02 [DEBUG] DocumentSource - start pdf2xml
18 May 2017 08:12.02 [DEBUG] DocumentSource - Executing command: [bash, -c, ulimit -Sv 6242304 && /Users/lfoppiano/development/inria/grobid/grobid-home/pdf2xml/mac-64/pdftoxml -blocks -noImageInline -fullFontName -noImage -annotation 'resources/byDictionary/BasicEnglish/corpus/pdf/BasicEnglish30.pdf' /Users/lfoppiano/development/inria/grobid/grobid-home/tmp/KyvxTsun9N.lxml]
18 May 2017 08:12.02 [DEBUG] DocumentSource - Executing: [bash, -c, ulimit -Sv 6242304 && /Users/lfoppiano/development/inria/grobid/grobid-home/pdf2xml/mac-64/pdftoxml -blocks -noImageInline -fullFontName -noImage -annotation 'resources/byDictionary/BasicEnglish/corpus/pdf/BasicEnglish30.pdf' /Users/lfoppiano/development/inria/grobid/grobid-home/tmp/KyvxTsun9N.lxml]
18 May 2017 08:12.03 [DEBUG] DocumentSource - pdf2xml process finished. Time to process:95ms
18 May 2017 08:12.03 [DEBUG] DocumentSource - start pdf2xml
18 May 2017 08:12.03 [DEBUG] DocumentSource - Executing command: [bash, -c, ulimit -Sv 6242304 && /Users/lfoppiano/development/inria/grobid/grobid-home/pdf2xml/mac-64/pdftoxml -blocks -noImageInline -fullFontName -noImage -annotation 'resources/byDictionary/BasicEnglish/corpus/pdf/BasicEnglish30.pdf' /Users/lfoppiano/development/inria/grobid/grobid-home/tmp/t6C8gT7qYA.lxml]
18 May 2017 08:12.03 [DEBUG] DocumentSource - Executing: [bash, -c, ulimit -Sv 6242304 && /Users/lfoppiano/development/inria/grobid/grobid-home/pdf2xml/mac-64/pdftoxml -blocks -noImageInline -fullFontName -noImage -annotation 'resources/byDictionary/BasicEnglish/corpus/pdf/BasicEnglish30.pdf' /Users/lfoppiano/development/inria/grobid/grobid-home/tmp/t6C8gT7qYA.lxml]
18 May 2017 08:12.03 [DEBUG] DocumentSource - pdf2xml process finished. Time to process:47ms
1 files to be processed.
1 files processed in 454 milliseconds
Johan:grobid-dictionaries lfoppiano$ ls
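The log shows `pdftoxml` being launched twice on the same `BasicEnglish30.pdf` (temporary outputs `KyvxTsun9N.lxml` and `t6C8gT7qYA.lxml`), reported at 95 ms and 47 ms. To check this kind of log quickly, a minimal Python sketch can count `pdftoxml` runs per input PDF and collect the reported times. `summarize_pdf2xml` is a hypothetical helper, not part of GROBID, and its regular expressions assume the exact `DocumentSource` message format seen in this log; the sample below is an abridged excerpt of it.

```python
import re
from collections import Counter

# Assumes the DocumentSource message format from the log above. Only the
# "Executing command:" line is matched, so each pdftoxml run is counted once
# (the same command is echoed again as "Executing:").
COMMAND_RE = re.compile(r"Executing command: \[.*?pdftoxml .*?'([^']+\.pdf)'")
# \s* tolerates a space or line break between "finished." and "Time to process".
TIME_RE = re.compile(r"pdf2xml process finished\.\s*Time to process:(\d+)ms")


def summarize_pdf2xml(log_text: str):
    """Hypothetical helper: return (pdftoxml runs per input PDF, reported times in ms)."""
    runs = Counter(COMMAND_RE.findall(log_text))
    times = [int(t) for t in TIME_RE.findall(log_text)]
    return runs, times


# Abridged excerpt of the log above ("..." marks shortened paths).
SAMPLE_LOG = (
    "18 May 2017 08:12.02 [DEBUG] DocumentSource - Executing command: [bash, -c, "
    "ulimit -Sv 6242304 && .../pdf2xml/mac-64/pdftoxml -blocks -noImageInline "
    "-fullFontName -noImage -annotation "
    "'resources/byDictionary/BasicEnglish/corpus/pdf/BasicEnglish30.pdf' "
    ".../tmp/KyvxTsun9N.lxml]\n"
    "18 May 2017 08:12.03 [DEBUG] DocumentSource - pdf2xml process finished. \n"
    "Time to process:95ms\n"
    "18 May 2017 08:12.03 [DEBUG] DocumentSource - Executing command: [bash, -c, "
    "ulimit -Sv 6242304 && .../pdf2xml/mac-64/pdftoxml -blocks -noImageInline "
    "-fullFontName -noImage -annotation "
    "'resources/byDictionary/BasicEnglish/corpus/pdf/BasicEnglish30.pdf' "
    ".../tmp/t6C8gT7qYA.lxml]\n"
    "18 May 2017 08:12.03 [DEBUG] DocumentSource - Executing: [bash, -c, "
    "ulimit -Sv 6242304 && .../pdf2xml/mac-64/pdftoxml -blocks ...]\n"
    "18 May 2017 08:12.03 [DEBUG] DocumentSource - pdf2xml process finished. "
    "Time to process:47ms\n"
)

if __name__ == "__main__":
    runs, times = summarize_pdf2xml(SAMPLE_LOG)
    for pdf, count in runs.items():
        print(f"{pdf}: pdftoxml ran {count} time(s)")
    print(f"pdf2xml times (ms): {times}, total {sum(times)} ms")
```

Matching only the `Executing command:` variant is a deliberate choice: counting both echo lines would double the number of runs.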
I've fixed it by slimming down the segmentation-body and lexical-entry parsers.