gitbrent / PptxGenJS

Create PowerPoint presentations with a powerful, concise JavaScript API.
https://gitbrent.github.io/PptxGenJS/
MIT License

Fixed issue where text arrays that appear on the last line of an autopaged table were being split up causing repair error #1237

Open mikemeerschaert opened 1 year ago

mikemeerschaert commented 1 year ago

@gitbrent I've encountered another error in the parseTextToLines() function.

When you have an autopaged table, and a cell's text is an array for some reason other than adding the breakLine option (e.g. to add superscript), and that cell is in the last row of the autopaged table on a slide, the cell gets broken up into multiple table rows: the superscript goes into its own row on the new slide, and PowerPoint complains that the presentation needs repairs.

I have a simple example repository to reproduce this issue here: https://github.com/mikemeerschaert/pptxgen-autopage-text-array-in-last-row-error

I examined the code in the parseTextToLines() function and I think I spotted the problem:

// STEP 1: Ensure inputCells is an array of TableCells
if (cell.text && cell.text.toString().trim().length === 0) {
  // Allow a single space/whitespace as cell text (user-requested feature)
  inputCells.push({ _type: SLIDE_OBJECT_TYPES.tablecell, text: ' ' })
} else if (typeof cell.text === 'number' || typeof cell.text === 'string') {
  inputCells.push({ _type: SLIDE_OBJECT_TYPES.tablecell, text: (cell.text || '').toString().trim() })
} else if (Array.isArray(cell.text)) {
  inputCells = cell.text
}
if (verbose) {
  console.log('[1/4] inputCells')
  inputCells.forEach((cell, idx) => console.log(`[1/4] [${idx + 1}] cell: ${JSON.stringify(cell)}`))
  // console.log('...............................................\n\n')
}

// STEP 2: Group table cells into lines based on "\n" or `breakLine` prop
/**
 * - EX: `[{ text:"Input Output" }, { text:"Extra" }]`                       == 1 line
 * - EX: `[{ text:"Input" }, { text:"Output", options:{ breakLine:true } }]` == 1 line
 * - EX: `[{ text:"Input\nOutput" }]`                                        == 2 lines
 * - EX: `[{ text:"Input", options:{ breakLine:true } }, { text:"Output" }]` == 2 lines
 */
let newLine: TableCell[] = []
inputCells.forEach(cell => {
  // (this is always true, we just constructed them above, but we need to tell typescript b/c type is still string||Cell[])
  if (typeof cell.text === 'string') {
    if (cell.text.split('\n').length > 1) {
      cell.text.split('\n').forEach(textLine => {
        newLine.push({
          _type: SLIDE_OBJECT_TYPES.tablecell,
          text: textLine,
          options: { ...cell.options, ...{ breakLine: true } },
        })
      })
    } else {
      newLine.push({
        _type: SLIDE_OBJECT_TYPES.tablecell,
        text: cell.text.trim(),
        options: cell.options,
      })
    }

    if (cell.options?.breakLine) {
      if (verbose) console.log(`inputCells: new line > ${JSON.stringify(newLine)}`)
      inputLines1.push(newLine)
      newLine = []
    }
  }

  // Flush buffer
  if (newLine.length > 0) {
    inputLines1.push(newLine)
    newLine = []
  }
})
if (verbose) {
  console.log(`[2/4] inputLines1 (${inputLines1.length})`)
  inputLines1.forEach((line, idx) => console.log(`[2/4] [${idx + 1}] line: ${JSON.stringify(line)}`))
  // console.log('...............................................\n\n')
}

// STEP 3: Tokenize every text object into words (then it's really easy to assemble lines below without having to break text, add its `options`, etc.)
inputLines1.forEach(line => {
  line.forEach(cell => {
    const lineCells: TableCell[] = []
    const cellTextStr = String(cell.text) // force convert to string (compiled JS is better with this than a cast)
    const lineWords = cellTextStr.split(' ')

    lineWords.forEach((word, idx) => {
      const cellProps = { ...cell.options }
      // IMPORTANT: Handle `breakLine` prop - we cannot apply to each word - only apply to very last word!
      if (cellProps?.breakLine) cellProps.breakLine = idx + 1 === lineWords.length
      lineCells.push({ _type: SLIDE_OBJECT_TYPES.tablecell, text: word + (idx + 1 < lineWords.length ? ' ' : ''), options: cellProps })
    })

    inputLines2.push(lineCells)
  })
})

Steps:

  1. If the text is an array you assign it to inputCells
  2. You iterate through each array item and add them separately to the inputLines1 array
  3. You then iterate through the inputLines1 array and tokenize each line of text into the lineCells array, and append that to the inputLines2 array.

The issue is in step 3 - If you include a text object that looks like this:

{
  text: [
    { text: "Superscript issue" },
    { text: "1", options: { superscript: true } },
  ],
}

each item in the text array ends up getting treated as a separate tokenized array in inputLines2, which then gets treated as separate table rows later. I think you have some logic to reconcile this somewhere because the issue only crops up when the rows are split between slides.
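To make the split concrete, here is a minimal, self-contained sketch of steps 2 and 3 as quoted above. It is a simplified stand-in, not the library code: the `SketchCell` and `SketchOptions` types and the `tokenizeCurrent` name are hypothetical, the `_type` field is dropped, and the "\n" handling from step 2 is omitted.

```typescript
// Hypothetical, simplified stand-ins for the library's cell types (illustration only)
interface SketchOptions { breakLine?: boolean; superscript?: boolean }
interface SketchCell { text: string; options?: SketchOptions }

// Mirrors the quoted STEP 2 and STEP 3 (without "\n" splitting)
function tokenizeCurrent(inputCells: SketchCell[]): SketchCell[][] {
  // STEP 2: the buffer is flushed after every array item, so each item becomes its own line
  const inputLines1: SketchCell[][] = []
  let newLine: SketchCell[] = []
  inputCells.forEach(cell => {
    newLine.push({ text: cell.text.trim(), options: cell.options })
    if (newLine.length > 0) {
      inputLines1.push(newLine)
      newLine = []
    }
  })

  // STEP 3: lineCells is declared per cell, so each cell yields a separate token array
  const inputLines2: SketchCell[][] = []
  inputLines1.forEach(line => {
    line.forEach(cell => {
      const lineCells: SketchCell[] = []
      const lineWords = String(cell.text).split(' ')
      lineWords.forEach((word, idx) => {
        const cellProps: SketchOptions = { ...cell.options }
        // Only the last word of a cell keeps `breakLine`
        if (cellProps.breakLine) cellProps.breakLine = idx + 1 === lineWords.length
        lineCells.push({ text: word + (idx + 1 < lineWords.length ? ' ' : ''), options: cellProps })
      })
      inputLines2.push(lineCells)
    })
  })
  return inputLines2
}

const currentResult = tokenizeCurrent([
  { text: 'Superscript issue' },
  { text: '1', options: { superscript: true } },
])
console.log(currentResult.length) // 2: the superscript "1" ends up in its own token array
```

In this sketch the second array holds only the superscript token, which matches the stray row described above.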

I think the correct way to handle this is to ensure all the items that were in the text array go into the same tokenized lineCells array, unless they have the breakLine option or contain a newline character, in which case you add the breakLine option in step 2 on line 86.

This PR refactors step 3 by moving the lineCells declaration outside of the loop over inputLines1. It still checks whether the last item in lineCells has options.breakLine = true and, if so, appends lineCells to inputLines2 and flushes the buffer (to preserve the existing functionality). After iterating over all the items in inputLines1, it appends the remaining lineCells to inputLines2, thus reconciling the items from the text array that were separated in step 2.
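The refactor described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the actual diff: the `MergedCell` and `MergedOptions` types and the `tokenizeMerged` name are hypothetical, and the input is the inputLines1 structure that step 2 would produce.

```typescript
// Hypothetical, simplified stand-ins for the library's cell types (illustration only)
interface MergedOptions { breakLine?: boolean; superscript?: boolean }
interface MergedCell { text: string; options?: MergedOptions }

function tokenizeMerged(inputLines1: MergedCell[][]): MergedCell[][] {
  const inputLines2: MergedCell[][] = []
  // Declared once, outside the loops, so tokens from adjacent array items share a line
  let lineCells: MergedCell[] = []

  inputLines1.forEach(line => {
    line.forEach(cell => {
      const lineWords = String(cell.text).split(' ')
      lineWords.forEach((word, idx) => {
        const cellProps: MergedOptions = { ...cell.options }
        // Only the last word of a cell keeps `breakLine`
        if (cellProps.breakLine) cellProps.breakLine = idx + 1 === lineWords.length
        lineCells.push({ text: word + (idx + 1 < lineWords.length ? ' ' : ''), options: cellProps })
      })

      // Preserve existing behaviour: an explicit `breakLine` still ends the current line
      const last = lineCells[lineCells.length - 1]
      if (last?.options?.breakLine) {
        inputLines2.push(lineCells)
        lineCells = []
      }
    })
  })

  // Append whatever is left once all lines have been processed
  if (lineCells.length > 0) inputLines2.push(lineCells)
  return inputLines2
}

const mergedResult = tokenizeMerged([
  [{ text: 'Superscript issue' }],
  [{ text: '1', options: { superscript: true } }],
])
console.log(mergedResult.length) // 1: all three tokens stay on the same line

const brokenResult = tokenizeMerged([
  [{ text: 'Input', options: { breakLine: true } }],
  [{ text: 'Output' }],
])
console.log(brokenResult.length) // 2: `breakLine` still starts a new line
```

The second call is there to check that cells carrying breakLine continue to produce separate lines, which is the existing behaviour the PR aims to keep.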