desmondmorris / node-tesseract

A simple wrapper for the Tesseract OCR package

stdout maxBuffer exceeded with German language #41

Open sdkcarlos opened 7 years ago

sdkcarlos commented 7 years ago

An error occured: { Error: stderr maxBuffer exceeded
    at Socket.<anonymous> (child_process.js:278:14)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at readableAddChunk (_stream_readable.js:176:18)
    at Socket.Readable.push (_stream_readable.js:134:10)
    at Pipe.onread (net.js:548:20)
  cmd: 'tesseract german.jpeg C:\Users\sdkca\AppData\Local\Temp\node-tesseract-9dc7b457-d0c6-4fd8-858a-6f9af0cf38c6 -l deu -psm 6' }

I'm getting the exception above with the following image:

[Image: German Characters]

With the following code:

var tesseract = require('node-tesseract');

var options = {
    // Use the English and German languages (it also crashes with deu only)
    l: 'eng+deu',
    // Use the segmentation mode #6 that assumes a single uniform block of text.
    psm: 6
};

tesseract.process('german.jpeg', options, (err, text) => {
    if(err){
        return console.log("An error occured: ", err);
    }

    console.log("Recognized text:");
    console.log(text);
});

Fixed

Update: after analyzing your code, I fixed it by increasing the maximum amount of data allowed in stdout. It may be useful to add this information to the README in case someone else has the same issue :)

var tesseract = require('node-tesseract');

var options = {
    // Use the German language
    l: 'deu',
    // Use the segmentation mode #6 that assumes a single uniform block of text.
    psm: 6,
    // Increase the allowed amount of data on stdout/stderr (4096 * 4096 bytes = 16 MiB)
    env: {
        maxBuffer: 4096 * 4096
    }
};

tesseract.process('german.jpeg', options, (err, text) => {
    if(err){
        return console.log("An error occured: ", err);
    }

    console.log("Recognized text:");
    console.log(text);
});
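
For anyone wondering why this helps: from my reading of the wrapper (not something the README documents), `options.env` appears to be handed to Node's `child_process` call as its options object, so `maxBuffer` there raises the cap on captured stdout/stderr. In the Node versions from around that time, the default cap for `exec` was 200 KB. Below is a minimal sketch of the same limit using plain Node, without Tesseract. It uses `spawnSync` instead of the asynchronous call so it fits in a few lines, and `runNoisyChild` is a hypothetical helper made up for this illustration.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: start a child Node process that writes `bytes`
// characters to stderr, capturing its output with the given maxBuffer limit.
function runNoisyChild(bytes, maxBuffer) {
  const script = `process.stderr.write('x'.repeat(${bytes}))`;
  return spawnSync(process.execPath, ['-e', script], { maxBuffer });
}

// 300 KB of stderr against a 200 KB limit: the child is terminated and
// result.error is set (ENOBUFS for the synchronous API).
const tooSmall = runNoisyChild(300 * 1024, 200 * 1024);
console.log(tooSmall.error ? tooSmall.error.code : 'ok');

// Same output with a 16 MiB limit (4096 * 4096 bytes): no error is reported.
const bigEnough = runNoisyChild(300 * 1024, 4096 * 4096);
console.log(bigEnough.error ? bigEnough.error.code : 'ok', bigEnough.stderr.length);
```

If the limit is what you are hitting, the first call reports an error and the second one does not. Tesseract can be quite chatty on stderr for some images, which would explain why only certain inputs trigger it.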

OoDeLally commented 7 years ago

Thanks! I had the same problem.