I have been using the library for a while now. However, today I noticed that if I save a web page as a PDF using the Microsoft PDF driver (that is, by printing the page to PDF), the code is unable to retrieve any text.
Here is one page that I printed to PDF:
https://healingthebody.ca/4-natural-proven-cancer-remedies/
And here is the code:

```csharp
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.DocumentLayoutAnalysis.TextExtractor;

// fileStream, fileName, dateCreated, dateModified and the PdfDocInfo class
// are defined elsewhere in my code.
using (PdfDocument document = PdfDocument.Open(fileStream))
{
    PdfDocInfo pdfDocInfo = new PdfDocInfo()
    {
        DocFilePath = fileName,
        TotalPages = document.NumberOfPages,
        Version = document.Version,
        Title = document.Information.Title,
        Subject = document.Information.Subject,
        Author = document.Information.Author,
        DateCreated = dateCreated,
        DateModified = dateModified,
    };

    string docText = "";
    // Regex that splits text into sentences.
    string pattern = @"(?<=['""A-Za-z0-9][\.\!\?])\s+(?=[A-Z])";

    foreach (Page page in document.GetPages())
    {
        docText += ContentOrderTextExtractor.GetText(page, true);
    }

    // At this point docText is empty, because GetText returns an empty string for every page.
}
```
Is there any remedy for this?