The OCR feature is terrific, with one exception: whenever the source text wraps onto a new line, the OCR output omits the space between the last word of one line and the first word of the next. For example:
tilted at +10-20 degrees.Based on the degree of invagination, CCSs were classified into threecategories.
Can we add a space between words that are separated by a line break? I asked GPT-4 how to do this, and here's what it suggested:
// Inside the tesseractImage.onload = async () => { ... }
const {
  data: { text },
} = await worker.recognize(canvas);
await worker.terminate();

const textBullets = text.split("\n");
const bullets = [];
let currentText = "";

// Strip a leading bullet marker ("* ", "- " or "— ") if present
const stripBulletMarker = (s) =>
  s.startsWith("* ") || s.startsWith("- ") || s.startsWith("— ")
    ? s.substring(2)
    : s;

for (let b = 0; b < textBullets.length; b++) {
  const s = textBullets[b].trim(); // Trim leading and trailing whitespace
  if (s) {
    if (currentText && !currentText.endsWith("-")) {
      // Add a space between consecutive lines, including after sentence
      // punctuation (otherwise "degrees.Based" stays glued together).
      // Skip it only when the previous line ends with a hyphen, which
      // usually means a word or range was split across the line break.
      currentText += " ";
    }
    currentText += s;
  } else if (currentText) {
    // An empty line ends the current bullet: push it and reset currentText
    bullets.push(stripBulletMarker(currentText));
    currentText = "";
  }
}
if (currentText) {
  // Push any remaining text as the final bullet
  bullets.push(stripBulletMarker(currentText));
}
// The rest of your logic to create blocks from bullets remains unchanged.
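
To make the idea easier to try outside the extension, here is a minimal standalone sketch of just the line-joining step. The function name `joinOcrLines` is hypothetical and not part of the existing code. It assumes the OCR text separates paragraphs with blank lines and that a trailing hyphen marks a split word or range, which will not always be true. Bullet-marker stripping is left out to keep the sketch short.

```javascript
// Hypothetical helper: joins the lines of raw OCR text into paragraphs.
// Assumption: blank lines separate paragraphs in the OCR output.
function joinOcrLines(text) {
  const paragraphs = [];
  let current = "";

  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim(); // also removes a trailing "\r"

    if (!line) {
      // A blank line ends the current paragraph.
      if (current) paragraphs.push(current);
      current = "";
      continue;
    }

    if (!current) {
      current = line;
    } else if (current.endsWith("-")) {
      // Assumption: a trailing hyphen means the word or range continues on
      // the next line, so join without a space and keep the hyphen.
      current += line;
    } else {
      // Normal case: separate consecutive lines with a single space.
      current += " " + line;
    }
  }

  if (current) paragraphs.push(current);
  return paragraphs;
}

// Usage with the text from the example above, split where the lines wrapped:
const sample =
  "tilted at +10-20 degrees.\n" +
  "Based on the degree of invagination, CCSs were classified into three\n" +
  "categories.";
console.log(joinOcrLines(sample)[0]);
```

With this input the sketch yields a single paragraph with a space after "degrees." and between "three" and "categories.". Keeping the hyphen on a split word is a deliberate simplification, since the code cannot tell a broken word from a genuine hyphenated compound.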