diegomura / react-pdf

📄 Create PDF files using React
https://react-pdf.org
MIT License

Table of Contents #479

Open skjo0c opened 5 years ago

skjo0c commented 5 years ago

Is your feature request related to a problem? Please describe.
I have a large data set that will be used to generate a PDF, so I am looking for a way to generate a table of contents (TOC) automatically.

pdfmake, which is also built on pdfkit, has similar functionality. I was looking for something like that in react-pdf.

dsvictor94 commented 5 years ago

Hi, what do you mean by "generate TOC automatically"?

Assuming you are asking for some kind of component that automatically generates a page with the ToC: I think react-pdf is a low-level library for creating PDFs declaratively, and this kind of component can be built from the building blocks it already provides. As a proof of concept, see the attached project (read App.js and toc.js).

It would be wonderful if the community developed something like Twitter Bootstrap for this library, with commonly used components. Unfortunately, I don't have time to start a project like that :(

react-pd-test-toc.zip

vstirbu commented 5 years ago

This approach does not handle the page numbers in the ToC. You'd still need a third pass to get those, since the page numbers cannot be determined before the second pass finishes, unless you put the ToC at the end of the generated PDF.

LaTeX does three rendering passes to get the numbering right.
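To make the pass count concrete, here is a framework-free sketch of re-running layout until the page numbers stop changing. The `layout` function is a toy stand-in (an assumption, not react-pdf): one ToC line per entry already known from the previous pass, a fixed number of lines per ToC page, and fixed-length chapters. With three chapters and three lines per ToC page, filling in the ToC grows it to two pages, which pushes every chapter back one page, so a third pass is needed before the numbers settle.

```typescript
// Page numbers recorded in one pass and read back in the next,
// similar in spirit to LaTeX's .aux file.
type PageMap = Map<string, number>;

// Toy layout: the ToC has a heading line plus one line per entry known from
// the previous pass; each chapter occupies `chapterPages` pages.
function layout(
  titles: string[],
  previous: PageMap,
  linesPerPage: number,
  chapterPages: number,
): PageMap {
  const tocLines = 1 + titles.filter((t) => previous.has(t)).length;
  const tocPages = Math.ceil(tocLines / linesPerPage);
  const next: PageMap = new Map();
  let page = tocPages + 1; // chapters start right after the ToC
  for (const title of titles) {
    next.set(title, page);
    page += chapterPages;
  }
  return next;
}

function sameMap(a: PageMap, b: PageMap): boolean {
  return a.size === b.size && Array.from(a.entries()).every(([k, v]) => b.get(k) === v);
}

// Re-run layout until the page numbers the ToC prints (from the previous pass)
// match where the chapters actually land, with a cap so it cannot loop forever.
function renderUntilStable(
  titles: string[],
  linesPerPage: number,
  chapterPages: number,
  maxPasses = 5,
): { pages: PageMap; passes: number } {
  let pages: PageMap = new Map();
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = layout(titles, pages, linesPerPage, chapterPages);
    if (sameMap(pages, next)) return { pages: next, passes: pass };
    pages = next;
  }
  throw new Error(`page numbers did not settle after ${maxPasses} passes`);
}
```

With a roomier ToC page (ten lines), the ToC never changes size and two passes are enough; the third pass is only needed when the ToC's own length moves the content.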

diegomura commented 5 years ago

Thanks for that info, @vstirbu! Could you point me to where I can read in more detail how LaTeX does this?

dsvictor94 commented 5 years ago

I totally forgot about the page numbering :disappointed:

I was thinking about it and came up with the idea of a "document event" that is emitted after layout but before painting, allowing any listener of this event to call setState in reaction to layout variables (like the page number and the bounding rect).

This could deprecate the dynamic render (aka render prop) and allow much more complex features. But I don't think this is an easy thing to implement, because it will involve:

There are some performance implications too, but I think it is impossible to solve the numbering problem and others (e.g. ensuring chapters start on even pages) without a performance penalty. And this strategy moves the performance considerations to the user (e.g. avoiding changes that cause a re-layout) when they consider it a problem.
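Chapter parity is a good example of something only layout can answer: whether a blank page is needed depends on how long the previous chapter turned out to be. A minimal sketch of the computation a post-layout listener would have to run (the function name and its inputs are hypothetical, not react-pdf API):

```typescript
type Parity = "odd" | "even";

// For each chapter, how many blank pages must be inserted before it so that
// it starts on a page of the requested parity. The chapter lengths are only
// known after layout, which is why this cannot be decided up front.
function blankPagesBefore(
  chapterLengths: number[], // page count of each chapter, from layout
  firstPage: number, // page where the first chapter would start
  parity: Parity,
): number[] {
  const wantRemainder = parity === "even" ? 0 : 1;
  const blanks: number[] = [];
  let page = firstPage;
  for (const length of chapterLengths) {
    const blank = page % 2 === wantRemainder ? 0 : 1;
    blanks.push(blank);
    page += blank + length; // next chapter starts after padding + this chapter
  }
  return blanks;
}
```

Inserting those blank pages changes the total page count again, which is exactly the kind of re-layout whose cost this proposal would hand to the user.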

vstirbu commented 5 years ago

Yes, there would definitely be a performance penalty for this case but not all pdf documents have this kind of formatting/style requirements.

If the step is opt-in, users would be aware of its performance implications and could make an informed decision. I would assume that documents containing ToCs are generated in batches, so the document does not have to be available immediately.

If it makes things better, the last re-rendering applies only to Table of Contents/Images/Tables/etc. components, while the rest remain unchanged and might even have the layout cached...

@diegomura LaTeX tooling hides the document generation process quite well these days, but the problem of getting cross-references right still exists. There is a brief explanation in the thread

bharristn commented 4 years ago

@diegomura Have there been any updates on this? Is it currently impossible to show page numbers in a TOC?

avneet2112 commented 2 years ago

Hi @diegomura, is there any update?

fschucht commented 2 years ago

@diegomura Is there any update on support for table of contents?

I tried the approach from this comment but couldn't get it to work with page numbers.

I'm currently trying this approach and it collects the data including page numbers correctly. However, the component is not being rendered after the state is updated and therefore the page just stays blank. Do you have pointers on how I could change the script to render out the component after the table of contents data has been collected?

import { join as joinPath } from 'path'
import { useState } from 'react'
import { Document, Page, renderToFile, Text } from '@react-pdf/renderer'

export const Pdf = ({ chapters }: { chapters: string[] }): JSX.Element => {
  const [tableOfContentsChapters, setTableOfContentsChapters] = useState<{ title: string; pageNumber: number }[]>([])
  const tmpTableOfContentsChapters: { title: string; pageNumber: number }[] = []

  function setTableOfContentsChapter(chapter: { title: string; pageNumber: number }, isLastChapter: boolean): void {
    if (!tmpTableOfContentsChapters.some(({ title }) => title === chapter.title)) {
      tmpTableOfContentsChapters.push(chapter)
    }

    if (isLastChapter) {
      setTableOfContentsChapters(tmpTableOfContentsChapters)
    }
  }

  // The table of contents data is collected correctly on the second rerender.
  // However, it is not being rendered in the final pdf.
  console.log(tableOfContentsChapters)

  return (
    <Document>
      <Page>
        {tableOfContentsChapters.map((chapter) => (
          <Text>
            {chapter.title} - {chapter.pageNumber}
          </Text>
        ))}
      </Page>
      {chapters.map((chapter, index) => (
        <Page key={index}>
          <Text
            style={{
              fontSize: 11,
            }}
            render={({ pageNumber }) => {
              setTableOfContentsChapter(
                {
                  title: chapter,
                  pageNumber: pageNumber,
                },
                index === chapters.length - 1,
              )

              return chapter
            }}
            fixed
          />
        </Page>
      ))}
    </Document>
  )
}

async function generateEbook(): Promise<void> {
  const path = joinPath(__dirname, '..', 'ebooks', `test.pdf`)
  await renderToFile(<Pdf chapters={['chapter 1', 'chapter 2', 'chapter 3']} />, path)
}

generateEbook()
mohadib commented 1 year ago

@fschucht did you ever solve this? I'm in exactly the same situation: once I have the page numbers, the component does not re-render as expected.

fschucht commented 1 year ago

@mohadib Yes, I managed to work around the issue by doing something like this:

const tableOfContentsChapters: { title: string; pageNumber: number }[] = []

// We render the ebook twice, first to collect the table of content chapters with page numbers,
// then to render the full ebook with a populated table of contents
await renderToString(<Ebook guide={guide} tableOfContentsChapters={tableOfContentsChapters} />)
await renderToFile(<Ebook guide={guide} tableOfContentsChapters={tableOfContentsChapters} />, filePath)

Then, on the first page of each chapter, I add the current chapter to the global tableOfContentsChapters variable:

<ReactPDF.Page>
  <ReactPDF.View
    render={({ pageNumber }) => {
      if (!tableOfContentsChapters.some(({ title }) => title === chapter.title)) {
        tableOfContentsChapters.push({ title: chapter.title, pageNumber: pageNumber })
      }

      return null
    }}
  />
</ReactPDF.Page>

This way, the chapters got populated on the first render, and then were available on the second render.
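The same two-pass idea as a framework-free sketch. `TocCollector` and `simulatePass` are hypothetical names, and `simulatePass` only imitates what the render prop does (it is not react-pdf): it reports every page of a chapter to show why the "only if not already present" check matters. In the real document, the first pass would be `renderToString` and the second `renderToFile`, both given the same collector.

```typescript
type TocChapter = { title: string; pageNumber: number };

// Shared between both passes. A render prop may run more than once for the
// same chapter, so only the first page reported for each title is kept.
class TocCollector {
  private readonly chapters = new Map<string, number>();

  record(title: string, pageNumber: number): void {
    if (!this.chapters.has(title)) this.chapters.set(title, pageNumber);
  }

  // Map keeps insertion order, so entries come out in document order.
  entries(): TocChapter[] {
    return Array.from(this.chapters, ([title, pageNumber]) => ({ title, pageNumber }));
  }
}

// Stand-in for one render pass: prints the ToC from what is known so far,
// then lets every chapter page report its number, as a render prop would.
function simulatePass(chapters: { title: string; pages: number }[], collector: TocCollector): string[] {
  const toc = collector.entries().map((c) => `${c.title} - ${c.pageNumber}`);
  let page = 2; // page 1 holds the ToC in this sketch
  for (const chapter of chapters) {
    for (let i = 0; i < chapter.pages; i++) collector.record(chapter.title, page + i);
    page += chapter.pages;
  }
  return toc;
}
```

The first pass prints an empty ToC; the second prints "Chapter 1 - 2" and "Chapter 2 - 4" for two chapters of two and three pages.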

asgerhallas commented 1 year ago

@fschucht we do the same thing. Do you have a solution for when the ToC ends up longer than a single page?

fschucht commented 1 year ago

@asgerhallas I didn't run into this case myself, so unfortunately I don't have a solution.
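One possible way to handle a ToC longer than one page, sketched under two assumptions: every entry takes exactly one line, and you know how many entries fit on a page (`entriesPerPage`). Both function names are hypothetical. Since the chapter titles are known before rendering, the number of ToC pages can be computed up front; if the first pass already renders that many ToC pages (titles with placeholder page numbers), the second pass has the same layout and the collected page numbers stay valid.

```typescript
// Split ToC entries into per-page chunks so each chunk can go on its own page.
// Assumes one line per entry; titles that wrap onto two lines break this.
function paginateToc<T>(entries: T[], entriesPerPage: number): T[][] {
  if (entriesPerPage < 1) throw new Error("entriesPerPage must be >= 1");
  const pages: T[][] = [];
  for (let i = 0; i < entries.length; i += entriesPerPage) {
    pages.push(entries.slice(i, i + entriesPerPage));
  }
  return pages.length > 0 ? pages : [[]]; // always reserve one ToC page
}

// Number of ToC pages, known before any render pass because the titles are.
function tocPageCount(titleCount: number, entriesPerPage: number): number {
  return Math.max(1, Math.ceil(titleCount / entriesPerPage));
}
```

If titles can wrap, this estimate can be wrong, and a check that the page numbers did not change between passes (with one more pass when they did) is the safer fallback.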

mohadib commented 1 year ago

@fschucht works great, thanks!

matbrgz commented 1 year ago

I updated the solution from the second comment to React 18, but I ran into a problem.

The toq state logs:

transpile.js:122 🚀 ~ ToCProvider ~ toq: [] length: 0 [[Prototype]]: Array(0)
transpile.js:122 🚀 ~ ToCProvider ~ toq: (2) ['TITLE1', 'TITLE2'] length: 2 [[Prototype]]: Array(0)

I'm using

"react": "18.2.0",
"next": "13.4.13",
"react-dom": "18.2.0",
"react-pdf": "^5.3.2",
"raw-loader": "^4.0.2",
"@react-pdf/renderer": "^3.1.12",

My code is:

const ToCContext = createContext({
  toq: [],
  add: () => {},
});

const ToCProvider = ({ children }) => {
  const [toq, setToq] = useState([]);
  console.log("🚀 ~ ToCProvider ~ toq:", toq);

  const add = (title) => {
    if (!toq.includes(title)) {
      setToq((prevToq) => [...prevToq, title]);
    }
  };

  return (
    <ToCContext.Provider value={{ toq, add }}>{children}</ToCContext.Provider>
  );
};

const ToC = () => {
  const { toq } = useContext(ToCContext);

  return (
    <View>
      <Heading2>TABLE OF CONTENT</Heading2>
      <UnorderedList>
        {toq.map((item, index) => (
          <ListItem key={item}>
            {index + 1}. {item}
          </ListItem>
        ))}
      </UnorderedList>
    </View>
  );
};

const InnerHeading = ({ children, ...props }) => {
  const { add, toq } = useContext(ToCContext);
  const index = toq.indexOf(children);

  React.useEffect(() => {
    if (!toq.includes(children)) {
      add(children);
    }
  }, [add, children, toq]);

  return (
    <Text {...props}>
      {index + 1}. {children}
    </Text>
  );
};

const Heading = (props) => <InnerHeading {...props} />;

const Heading1 = ({ children }) => (
  <Heading
    level={1}
  >
    {children}
  </Heading>
);
const Heading2 = ({ children }) => (
  <Heading
    level={2}
  >
    {children}
  </Heading>
);

UnorderedList and ListItem are abstractions over Text. <ToCProvider> is the first component inside <Document> and wraps the entire document content.

However, with this solution the table of contents is always empty: only its heading renders. And every heading is numbered 0 (0. TITLE1, 0. TITLE2, etc.), so neither the heading numbers nor the <ToC /> render correctly.

I haven't yet implemented rendering the different <Heading> levels correctly.
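The `0.` numbers follow from the code above: `InnerHeading` computes `toq.indexOf(children)`, which is -1 while `toq` is still empty, and prints `index + 1`. When the headings are known up front (an assumption: the titles come from your data rather than from layout), their numbers, including nested levels, can be computed before rendering, with no state or effect involved. A minimal sketch with a hypothetical `numberHeadings` helper (page numbers still need layout; this only covers the numbering):

```typescript
type HeadingInput = { title: string; level: number }; // level 1 = top level
type NumberedHeading = HeadingInput & { number: string };

// Number headings in document order: 1, 1.1, 1.2, 2, 2.1, ...
function numberHeadings(headings: HeadingInput[]): NumberedHeading[] {
  const counters: number[] = []; // counters[i] is the current count at level i + 1
  return headings.map(({ title, level }) => {
    if (!Number.isInteger(level) || level < 1) throw new Error(`invalid level ${level} for "${title}"`);
    counters.length = Math.min(counters.length, level); // forget deeper levels
    while (counters.length < level) counters.push(0); // fill skipped levels with 0
    counters[level - 1] += 1;
    return { title, level, number: counters.join(".") };
  });
}
```

The result can be passed down as props to both the ToC and the headings, so they agree on the numbering in a single render.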


@diegomura @fschucht @dsvictor94 @skjo0c

maidi29 commented 7 months ago

I came up with a custom solution: a context stores the table of contents state, and each title updates this state with its page number while rendering.

The page containing the table of contents reads the state and renders the titles and page numbers. Re-generation happens automatically due to state updates.

TableOfContentsContext.tsx

import { createContext, ReactNode, useState } from 'react';

export type TocEntry = { title: string; pageNumber: number; level: number }; // or what other properties you need to render your custom Table of Contents

type TocContextProps = {
  tableOfContents: TocEntry[];
  addToTableOfContents: (entry: TocEntry) => void;
};

export const TableOfContentsContext = createContext<TocContextProps>(
  null as unknown as TocContextProps,
);

export const TableOfContentsProvider = ({ children }: { children: ReactNode }) => {
  const [tableOfContents, setTableOfContents] = useState<TocEntry[]>([]);

  const addToTableOfContents = (entry: TocEntry) => {
    setTableOfContents((prevState) => {
      const entryExists = prevState.some(
        ({ title, pageNumber, level }) =>
          title === entry.title && pageNumber === entry.pageNumber && level === entry.level,
      );
      return entryExists ? prevState : [...prevState, entry];
    });
  };

  return (
    <TableOfContentsContext.Provider value={{ tableOfContents, addToTableOfContents }}>
      {children}
    </TableOfContentsContext.Provider>
  );
};

Usage: mark a title or text as relevant for the table of contents:

PageHeader.tsx (or your title component)

const { addToTableOfContents } = useContext(TableOfContentsContext);

<Text
  id={title}
  render={({ pageNumber }) => {
    addToTableOfContents({
      title,
      pageNumber,
      level,
    });
    return title;
  }}
/>

Render the table of contents on the page of your choice, e.g. TableOfContentsPage.tsx:

const { tableOfContents } = useContext(TableOfContentsContext);

{tableOfContents.map(({ title, pageNumber, level }, index) => (
  <Link
    key={index}
    src={`#${title}`} // internal link to the element with id={title}
    style={{}} // custom style, also for different levels
  >
    <Text>{title}</Text>
    <Text>{pageNumber}</Text>
  </Link>
))}

Don't forget to wrap your <Document> with <TableOfContentsProvider>.
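The provider above only skips exact duplicates (same title, page and level), so if the same title is rendered on several pages (for example inside a `fixed` header), it produces one entry per page. A hedged sketch of a post-processing step the ToC page could apply before mapping over the entries; `normalizeToc` is a hypothetical helper, and keeping the lowest page number per title and level is one possible policy:

```typescript
type TocEntry = { title: string; pageNumber: number; level: number };

// Keep one entry per title/level pair (the one on the earliest page)
// and return the entries ordered by page number.
function normalizeToc(entries: TocEntry[]): TocEntry[] {
  const firstSeen = new Map<string, TocEntry>();
  for (const entry of entries) {
    const key = `${entry.level}:${entry.title}`; // level is numeric, so the key is unambiguous
    const existing = firstSeen.get(key);
    if (!existing || entry.pageNumber < existing.pageNumber) firstSeen.set(key, entry);
  }
  return Array.from(firstSeen.values()).sort((a, b) => a.pageNumber - b.pageNumber);
}
```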

joelybahh commented 6 months ago

The above is great, but it doesn't work in a server-side environment. However, server-side is arguably easier, because you can just store a dictionary that you pass into the second render pass.

We introduced a parameterised way of passing ?multipass=true to our endpoint, which, in our case, tells the API to do two render passes: the first takes an empty dictionary and fills it in, and the second takes the filled dictionary and uses it to render the page numbers.

In typescript, as pseudo-code, it looks something like this:

let stream: NodeJS.ReadableStream;
if (!multipass) {
    stream = await finalRenderPass(
        new Map<string, SchemaPageFooterDetails>() // empty dictionary so page numbers are blank
    );
} else {
    const pageNumbersMap = await firstRenderPass(/** Your Parameters */); // returns page numbers from first pass
    stream = await finalRenderPass(
        /** Your other parameters */
        pageNumbersMap
    );
}
return stream;

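And the firstRenderPass / finalRenderPass functions look something like this:

import ReactPDF from '@react-pdf/renderer';
// PDFDocument is your own document component; it fills pageNumbersMap from its render props.

export type SchemaPageFooterDetails = { pageNumber: number; pageIdentifier: string };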

/**
 * Renders the first pass of the PDF, this is necessary to calculate and store the page numbers in a map.
 * @returns A map of page numbers.
 */
export const firstRenderPass = async () => {
    const pageNumbersMap = new Map<string, SchemaPageFooterDetails>();
    await renderToStream(pageNumbersMap);
    return pageNumbersMap;
};

/**
 * Renders the final pass of the PDF, using the page numbers map from the first pass.
 *
 * @returns A stream of the final PDF.
 */
export const finalRenderPass = async (
    pageNumbersMap: Map<string, SchemaPageFooterDetails>
) => {
    return await renderToStream(pageNumbersMap);
};

/**
 * A helper function to render the PDF to a stream.
 *
 * @param pageNumbersMap The page numbers map to use for TOC generation.
 */
export const renderToStream = async (
    pageNumbersMap: Map<string, SchemaPageFooterDetails>
) => {
    return await ReactPDF.renderToStream(
        <PDFDocument
            pageNumbersMap={pageNumbersMap}
        />
    );
};

This has been stripped back, as it had a bunch of business-specific logic, but hopefully the idea helps others in a similar scenario speed up their implementation. It assumes a way to identify and number a page, plus logic inside the component that sets the dictionary from the render prop :)
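To make the control flow concrete without the PDF machinery, here is a runnable toy version. Assumptions: `makeToyRender` stands in for the real `renderToStream` helper plus the component that fills the map, and `renderDocument` is a hypothetical name for the endpoint logic. The only thing the two passes share is the map: the first pass writes into it, the second reads from it.

```typescript
type SchemaPageFooterDetails = { pageNumber: number; pageIdentifier: string };
type PageNumbersMap = Map<string, SchemaPageFooterDetails>;
type RenderFn = (pageNumbersMap: PageNumbersMap) => Promise<string[]>;

// Endpoint logic: with multipass, run a throwaway first pass to fill the map.
async function renderDocument(render: RenderFn, multipass: boolean): Promise<string[]> {
  const pageNumbersMap: PageNumbersMap = new Map();
  if (multipass) {
    await render(pageNumbersMap); // first pass: output discarded, map filled
  }
  return render(pageNumbersMap); // final pass reads page numbers from the map
}

// Toy document: a summary page listing where each identified page ended up,
// followed by those pages, each recording its own number while "rendering".
function makeToyRender(pageIds: string[]): RenderFn {
  return async (map) => {
    const summary = pageIds.map((id) => `${id}: ${map.get(id)?.pageNumber ?? ""}`).join(", ");
    pageIds.forEach((id, index) => {
      if (!map.has(id)) map.set(id, { pageIdentifier: id, pageNumber: index + 2 });
    });
    return [summary, ...pageIds];
  };
}
```

Without multipass, the summary prints blank numbers (the map is filled too late to be shown); with it, the final pass prints "intro: 2, appendix: 3".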