aspose-words / Aspose.Words-for-.NET

Aspose.Words for .NET examples, plugins and showcases
https://products.aspose.com/words/net
MIT License

PageSplitter not working as expected #226

Closed nikola-yankov closed 4 years ago

nikola-yankov commented 5 years ago

With Aspose.Words 19.7, I am trying to use DocumentPageSplitter from the samples to save each page of a specific document as a new document. The code used is:

static void Main(string[] args)
{
    try
    {
        const string fileName = "problematic_file.docx";

        var license = new License();
        license.SetLicense(...);

        ExtractDocumentToPages(fileName, 1, 1);
    }
    finally
    {
        Console.WriteLine("Done!");
        Console.ReadLine();
    }
}

public static void ExtractDocumentToPages(string docName, int fromPage, int pagesCount)
{           
    string folderName = Path.GetDirectoryName(docName);
    string fileName = Path.GetFileNameWithoutExtension(docName);
    string extensionName = Path.GetExtension(docName);
    string outFolder = Path.Combine(folderName, "_out");

    Console.WriteLine("Processing document: " + fileName + extensionName);

    Document doc = new Document(docName);

    // Split nodes in the document into separate pages.
    DocumentPageSplitter splitter = new DocumentPageSplitter(doc);

    Document pageDoc = splitter.GetDocumentOfPageRange(fromPage, fromPage+pagesCount-1); 
    pageDoc.Save(Path.Combine(outFolder, string.Format("{0} -  Out{1}", fileName, extensionName)));
}

The resulting file contains two pages instead of one. Please note that a valid license is applied. The source and result files are attached to the issue.

problematic_file.docx problematic_file - Out.docx

tahir-manzoor commented 5 years ago

@nikola-yankov

Your document contains column breaks in its Run nodes. You can remove the trailing column break after extracting the page from the document, using the following code snippet.

Document document = splitter.GetDocumentOfPageRange(1, 1);
Paragraph paragraph = document.LastSection.Body.LastParagraph;
// Strip the column break from the last run of the last paragraph.
Run lastRun = paragraph.Runs[paragraph.Runs.Count - 1];
lastRun.Text = lastRun.Text.Replace(ControlChar.ColumnBreak, "");
document.Save(MyDir + "output.docx");

nikola-yankov commented 5 years ago

@tahir-manzoor Thank you for the suggestion. I am facing another issue. Assume the following code:

Document document = splitter.GetDocumentOfPageRange(1, 3);
document.Save(MyDir + "output.docx");

Now the output document contains the following pages: 1, 2, 2, 3

or

Document document = splitter.GetDocumentOfPageRange(1, 5);
document.Save(MyDir + "output.docx");

The output document contains pages 1, 2, 2, 3, 3, 4, 4, 5

tahir-manzoor commented 5 years ago

A minimal valid document's body needs to contain at least one Paragraph. Your document contains one paragraph with multiple Run nodes that contain the column breaks.

In your case, we suggest using Aspose.Words to convert the DOCX pages to PDF files and then using Aspose.PDF to convert each PDF to DOC. Please check the following code example. Hope this helps you.

Aspose.Words code to convert DOCX pages to PDF files.

Document doc = new Document(MyDir + @"input.docx");
int pageCount = doc.PageCount;

PdfSaveOptions opts = new PdfSaveOptions();
opts.PageCount = 1;
opts.UpdateFields = false;
for (int i = 0; i < pageCount; i++)
{
    opts.PageIndex = i;
    doc.Save(MyDir + @"19.7-" + i + ".pdf", opts);
}

Aspose.PDF code to convert PDF to DOC.

// Open the source PDF document
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(MyDir + "input.pdf");

// Save the file into MS document format
pdfDocument.Save(MyDir + "output.doc", Aspose.Pdf.SaveFormat.Doc);

romeokromeok commented 5 years ago

@nikola-yankov were the replies above helpful? Do you still need help with Aspose.Words on these issues?

nikola-yankov commented 5 years ago

@romeokromeok yes, thank you.

AlexNosk commented 4 years ago

@nikola-yankov The page splitter mechanism is integrated into Aspose.Words 20.10. You can now use the Document.ExtractPages method: https://apireference.aspose.com/words/net/aspose.words/document/methods/extractpages I am closing the issue. Please feel free to report any further issues here or in the forum.
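
For reference, a minimal sketch of how Document.ExtractPages might replace the DocumentPageSplitter sample, assuming Aspose.Words 20.10 or later. The file names and the ExtractPagesSketch class are hypothetical and used only for illustration. Note that the page index passed to ExtractPages is zero-based, unlike the one-based page numbers used by GetDocumentOfPageRange.

```csharp
using Aspose.Words;

class ExtractPagesSketch
{
    static void Main()
    {
        // Hypothetical input file, for illustration only.
        Document doc = new Document("input.docx");

        // ExtractPages(index, count): index is the zero-based first page,
        // count is the number of pages to extract.
        Document firstPage = doc.ExtractPages(0, 1);
        firstPage.Save("page-1.docx");

        // Save every page of the source document as its own file.
        for (int page = 0; page < doc.PageCount; page++)
        {
            Document pageDoc = doc.ExtractPages(page, 1);
            pageDoc.Save("page-" + (page + 1) + ".docx");
        }
    }
}
```

Whether the column-break duplication reported earlier in this thread still occurs with ExtractPages is not confirmed here; it would need to be checked against the attached file.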

amruthats commented 4 years ago

Hi, I'm using Aspose.Words 20.10 and I'm trying to get the page count for an uploaded document, but it returns a very large, incorrect page count. Please help me with this.

private Int32 GetWordPageCount(byte[] docData)
{
    Int32 pages = 1;
    try
    {
        using (MemoryStream dataStream = new MemoryStream(docData))
        {
            StreamReader sr = new StreamReader(dataStream);
            asw.License lic = new asw.License();
            lic.SetLicense("Aspose.Words.lic");
            asw.Document doc = new asw.Document();
            asw.DocumentBuilder db = new asw.DocumentBuilder(doc);
            db.Writeln(sr.ReadToEnd());
            sr.Dispose();
            sr.Close();
            pages = db.Document.PageCount;
        }
    }