agentcooper / react-pdf-highlighter

Set of React components for PDF annotation
https://agentcooper.github.io/react-pdf-highlighter/
MIT License

Bounding rectangle calculation / transformation #242

Open · jamesioppolo opened this issue 1 year ago

jamesioppolo commented 1 year ago

Can you please clarify how the bounding rectangle is calculated for the “Type Checking for JavaScript” example on the demo page (https://agentcooper.github.io/react-pdf-highlighter/)?

Coordinate Values from PDF

Searching the PDF itself (https://arxiv.org/pdf/1708.08021.pdf) for the string “Type Checking for JavaScript” returns the text item that contains it, which covers the whole title line:

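const coordinatesFromPdf = {
  width: 298.81699979999996, // advance width of the whole item, in PDF points
  height: 14.3462,           // same as the font size here
  str: "Fast and Precise Type Checking for JavaScript",
  // [a, b, c, d, e, f]: font size 14.3462 pt, no rotation; (e, f) is the baseline
  // origin in PDF points, measured from the bottom-left corner of the page
  transform: [14.3462, 0, 0, 14.3462, 45.828, 625.368],
};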

Hardcoded values in example

However, the example highlights the substring “Type Checking for JavaScript” using the hardcoded coordinates below, defined in https://github.com/agentcooper/react-pdf-highlighter/blob/main/example/src/test-highlights.ts#L5:

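const hardcodedCoordinates = {
  content: {
    text: " Type Checking for JavaScript",
  },
  position: {
    // These appear to be pixels measured from the top-left corner of the page
    // as it was rendered (width x height px) when the highlight was captured
    boundingRect: {
      x1: 255.73419189453125,
      y1: 139.140625,
      x2: 574.372314453125,
      y2: 165.140625,
      width: 809.9999999999999,
      height: 1200,
    },
    rects: [
      {
        // ...as above...
      },
    ],
  },
};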
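One way to reconcile the two sets of numbers is sketched below. It is a reading of the values, not the library's documented algorithm, and it rests on an assumption not stated in the issue: page 1 of the paper is 486 x 720 pt, inferred from the 810 : 1200 ratio of the hardcoded `width`/`height`. Under that assumption the page was rendered at a scale of 1200 / 720 ≈ 1.667, and the standard flip from PDF space (origin bottom-left, y up) to viewport space (origin top-left, y down) puts the end of the title line almost exactly at the hardcoded `x2`. The helper names `textItemToViewport` and `estimateSubstringLeft` are hypothetical.

```typescript
// Subset of a pdf.js text item, shaped like coordinatesFromPdf in this issue.
interface TextItemLike {
  str: string;
  width: number;       // advance width of the whole item (assumed to be PDF points)
  height: number;
  transform: number[]; // [a, b, c, d, e, f]; (e, f) = baseline origin in PDF points
}

// Where an unrotated text item lands on a rendered page, in viewport pixels.
interface ViewportSpan {
  left: number;     // x where the item starts
  right: number;    // x where the item ends
  baseline: number; // y of the baseline, measured down from the page top
  fontSize: number;
}

// ASSUMPTION: page 1 of arXiv 1708.08021 is 486 x 720 pt, inferred from the
// 810 : 1200 ratio of the hardcoded width/height; not stated in the issue.
const PAGE_WIDTH_PT = 486;
const PAGE_HEIGHT_PT = 720;

// For an unrotated page at uniform scale s: x' = s * x, y' = s * (pageHeight - y).
function textItemToViewport(item: TextItemLike, pageHeightPt: number, s: number): ViewportSpan {
  const [a, b, , , e, f] = item.transform;
  return {
    left: e * s,
    right: (e + item.width) * s,
    baseline: (pageHeightPt - f) * s,
    fontSize: Math.hypot(a, b) * s,
  };
}

// Rough guess at where a substring starts, splitting the item width evenly per
// character. Real glyph widths vary, so expect an error of a few pixels.
function estimateSubstringLeft(item: TextItemLike, span: ViewportSpan, sub: string): number {
  const index = item.str.indexOf(sub);
  if (index < 0) throw new Error(`"${sub}" not found in "${item.str}"`);
  return span.left + ((span.right - span.left) * index) / item.str.length;
}

const coordinatesFromPdf: TextItemLike = {
  width: 298.81699979999996,
  height: 14.3462,
  str: "Fast and Precise Type Checking for JavaScript",
  transform: [14.3462, 0, 0, 14.3462, 45.828, 625.368],
};

// The hardcoded rect was captured on a page rendered 1200 px tall.
const scale = 1200 / PAGE_HEIGHT_PT;
const span = textItemToViewport(coordinatesFromPdf, PAGE_HEIGHT_PT, scale);
const estimatedX1 = estimateSubstringLeft(coordinatesFromPdf, span, " Type");

console.log(PAGE_WIDTH_PT * scale); // ≈ 810, the stored width
console.log(span.right);            // ≈ 574.41 (hardcoded x2: 574.37)
console.log(span.baseline);         // ≈ 157.72, between hardcoded y1 and y2
console.log(estimatedX1);           // ≈ 253.46 (hardcoded x1: 255.73)
```

The x-range follows from the text item alone. The hardcoded `y1`/`y2`, by contrast, appear to come from the rendered text-layer element (font ascent and line height), so the sketch only checks that they straddle the computed baseline rather than trying to reproduce them.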

Transformation function

RBSUS commented 1 year ago

I think there is an argument for changing this library to use PDF coordinates (like coordinatesFromPdf) as the default highlight state, rather than the existing boundingRect approach. PDF coordinates would enable more advanced manipulation of highlights and would let highlights be added "headlessly", without rendering the DOM.
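A minimal sketch of what headless creation could look like follows. The helpers `pdfRectToScaled` and `scaledToPixels` are hypothetical, not part of react-pdf-highlighter. The sketch assumes, as the `width`/`height` fields in the hardcoded example suggest, that a stored rect is rescaled proportionally to the page's rendered size; it also reuses the 486 x 720 pt page size, which is an inference rather than something stated in the thread. Storing the rect with the page size in points as its reference size means it can be computed straight from PDF data, with only the y axis flipped.

```typescript
// Shape of the boundingRect in the hardcoded example: corner coordinates plus
// the width/height of the page they were measured against.
interface ScaledRect {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
  width: number;
  height: number;
}

// A rectangle in native PDF space: points, origin at the bottom-left, y up.
interface PdfRect {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

// Hypothetical helper: store a rect at scale 1 (reference size = page size in
// points). Only the y axis is flipped; no rendering or DOM is involved.
function pdfRectToScaled(r: PdfRect, pageWidthPt: number, pageHeightPt: number): ScaledRect {
  return {
    x1: r.left,
    y1: pageHeightPt - r.top,
    x2: r.right,
    y2: pageHeightPt - r.bottom,
    width: pageWidthPt,
    height: pageHeightPt,
  };
}

// Hypothetical helper: rescale a stored rect onto a page rendered at
// viewportWidth x viewportHeight px, proportionally to the stored width/height.
function scaledToPixels(s: ScaledRect, viewportWidth: number, viewportHeight: number) {
  const sx = viewportWidth / s.width;
  const sy = viewportHeight / s.height;
  return { left: s.x1 * sx, top: s.y1 * sy, right: s.x2 * sx, bottom: s.y2 * sy };
}

// Box for the title text item: from its baseline up by one font size
// (an approximation that ignores descenders).
const titleBox: PdfRect = {
  left: 45.828,
  right: 45.828 + 298.81699979999996,
  bottom: 625.368,
  top: 625.368 + 14.3462,
};

// ASSUMPTION: 486 x 720 pt page, inferred from the hardcoded 810 x 1200 size.
const stored = pdfRectToScaled(titleBox, 486, 720);
const onScreen = scaledToPixels(stored, 810, 1200);
console.log(stored);   // computed without rendering anything
console.log(onScreen); // right ≈ 574.41, bottom ≈ 157.72 (the baseline)
```

Because the stored values are in page units rather than pixels from a particular render, the same record can be rescaled to any zoom level, which is the kind of manipulation the comment above has in mind.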