flexion / ef-cms

An Electronic Filing / Case Management System.

Upgrade pdfjs-dist #10300

Open: TomElliottFlexion opened this issue 3 months ago

TomElliottFlexion commented 3 months ago

As an engineer, so that I can keep DAWSON secure and up-to-date, I need to update pdfjs-dist to version ^3.x.x.

Pre-Conditions

See the Notes section below for what has already been tried.

Acceptance Criteria

Pain Avoided/Frustration Saved

Breadth/Pervasiveness of Problem

Complexity of Problem (Low, Medium, High) and Why it's Complex

Notes

Last week (Nov 30th), Rosie, Tim, and Rachel spent a few days trying to upgrade pdfjs-dist. Below are the lessons learned from that.

The latest version of pdfjs-dist is 3.1.81; DAWSON currently runs 2.16.105.

The error we see when upgrading without making any other code changes ONLY appears in deployed environments, not locally.

The error we see when upgrading without making any other code changes occurs when uploading a court-issued PDF, because the only place we use this package is to OCR uploaded court-issued documents.

The documentation on how to use this package is somewhat conflicting. On one hand, Mozilla indicates that the legacy build of pdfjs-dist is the one to use in Node environments. On the other hand, there are special instructions for using the package with webpack.
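To make the two options concrete, here is a hypothetical sketch naming the two entry points. The module paths are our assumption based on the pdfjs-dist 3.x package layout (4.x ships `.mjs` files instead), and `pdfjsEntryPoint` is an illustrative helper, not part of DAWSON or pdfjs-dist.

```typescript
// Hypothetical helper naming the two pdfjs-dist entry points discussed above.
// The paths are assumptions based on the 3.x package layout; check them
// against the installed version before relying on them.
type PdfjsEntry = 'node-legacy' | 'webpack';

const PDFJS_ENTRY_POINTS: Readonly<Record<PdfjsEntry, string>> = {
  // Legacy build: what Mozilla's Node examples load.
  'node-legacy': 'pdfjs-dist/legacy/build/pdf.js',
  // Webpack entry: intended for bundled code and meant to set up the worker for you.
  webpack: 'pdfjs-dist/webpack',
};

function pdfjsEntryPoint(entry: PdfjsEntry): string {
  return PDFJS_ENTRY_POINTS[entry];
}
```

Server-side code would then load one of them with, for example, `await import(pdfjsEntryPoint('node-legacy'))`.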

When we tried importing the package following the webpack instructions, one of the errors we observed was `Error scraping PDF with PDF.JS v3.1.81 structuredClone is not defined`. This COULD potentially be resolved by upgrading to Node 17+, where `structuredClone` is available as a global. Note that there is no guarantee this would fix the pdfjs-dist errors; it would only resolve the `structuredClone` error.
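A minimal preflight sketch, assuming the scraping code runs in Node: check for the global `structuredClone` (added in Node 17.0.0) before calling pdfjs-dist, so a missing global produces a clear message instead of the error above. `nodeMajorVersion` and `checkStructuredClone` are hypothetical helpers.

```typescript
// Hypothetical preflight check; structuredClone is a Node global from 17.0.0.
const MIN_NODE_MAJOR_FOR_STRUCTURED_CLONE = 17;

// Node reports versions like "v16.20.2"; return the major component.
function nodeMajorVersion(version: string): number {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognized Node version string: ${version}`);
  }
  return major;
}

// Returns null when structuredClone is usable, otherwise a human-readable reason.
// cloneFn defaults to the global so tests can pass a stand-in.
function checkStructuredClone(
  version: string,
  cloneFn: unknown = (globalThis as unknown as { structuredClone?: unknown }).structuredClone,
): string | null {
  if (typeof cloneFn === 'function') {
    return null;
  }
  return nodeMajorVersion(version) < MIN_NODE_MAJOR_FOR_STRUCTURED_CLONE
    ? `structuredClone is missing: Node ${version} is older than ${MIN_NODE_MAJOR_FOR_STRUCTURED_CLONE}`
    : `structuredClone is missing even though Node ${version} should provide it`;
}
```

In the scraping path this might look like `const problem = checkStructuredClone(process.version); if (problem) throw new Error(problem);`.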

The error we see when upgrading without making any other code changes is: `Error scraping PDF with PDF.JS vundefined Cannot read properties of undefined (reading 'prototype')`
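One plausible reading of `vundefined` (not confirmed) is that the log line interpolates the module's `version` export and it came back `undefined`, meaning the import did not resolve to the pdf.js API we expected, for instance because the exports ended up under `default`. A hypothetical guard like the one below would fail early with a clearer message; `resolvePdfjsModule` and `PdfjsLike` are illustrative names, and only the `version` and `getDocument` exports come from pdf.js itself.

```typescript
// Minimal shape we rely on; pdf.js exports `version` and `getDocument`.
interface PdfjsLike {
  version: string;
  getDocument: (...args: unknown[]) => unknown;
}

// Some bundler/interop setups place exports under `default`; check both spots.
function resolvePdfjsModule(mod: unknown): PdfjsLike {
  const candidates = [mod, (mod as { default?: unknown } | null)?.default];
  for (const candidate of candidates) {
    if (
      candidate !== null &&
      typeof candidate === 'object' &&
      typeof (candidate as PdfjsLike).version === 'string' &&
      typeof (candidate as PdfjsLike).getDocument === 'function'
    ) {
      return candidate as PdfjsLike;
    }
  }
  throw new Error('pdfjs-dist did not load as expected: no `version`/`getDocument` export found');
}
```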

The error we see when upgrading and switching our import to the recommended webpack usage is: `Error scraping PDF with PDF.JS v3.1.81 Setting up fake worker failed: "Cannot find module 'canvas'"`
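The message suggests pdfjs-dist's Node code path tried to load the optional `canvas` (node-canvas) package. A small diagnostic sketch, assuming the deployed code runs in Node: check whether `canvas` resolves at runtime in the deployed environment, to tell a missing dependency apart from a bundling problem. `isModuleResolvable` is a hypothetical helper built on Node's `createRequire` and `require.resolve`, which locate a module without loading it. It cannot detect webpack rewriting that `require` at build time.

```typescript
import { createRequire } from 'node:module';
import { join } from 'node:path';

// Resolve relative to the current working directory; the file need not exist.
const requireFromCwd = createRequire(join(process.cwd(), 'noop.js'));

// Returns true if Node's resolver can find the module, without loading it.
function isModuleResolvable(specifier: string): boolean {
  try {
    requireFromCwd.resolve(specifier);
    return true;
  } catch {
    return false;
  }
}
```

For example, logging `isModuleResolvable('canvas')` once at startup in a deployed environment would show whether the package is present there at all.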

En-8 commented 1 month ago

Our version of pdfjs-dist is now being flagged for a high-severity vulnerability: "PDF.js vulnerable to arbitrary JavaScript execution upon opening a malicious PDF - https://github.com/advisories/GHSA-wgrm-67xf-hhpq"

Perhaps we should reconsider the urgency of this card?

pixiwyn commented 2 weeks ago

Regarding the high-severity vulnerability flagged above (https://github.com/advisories/GHSA-wgrm-67xf-hhpq):

We handled it here: https://app.zenhub.com/workspaces/flexionef-cms-5bbe4bed4b5806bc2bec65d3/issues/gh/flexion/ef-cms/10407
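For reference, the advisory's suggested workaround when upgrading is not yet possible is to pass `isEvalSupported: false` to `getDocument`, which stops pdf.js from evaluating generated JavaScript (used primarily for font rendering). Whether the linked fix used this approach is not stated here. The sketch below only builds the parameters; `ScrapeDocumentParams` is a hypothetical local type, not imported from pdfjs-dist.

```typescript
// Subset of pdf.js getDocument() parameters used here (hypothetical local type).
interface ScrapeDocumentParams {
  data: Uint8Array;
  isEvalSupported: boolean;
}

// Build getDocument() params with eval-based code generation disabled.
function buildScrapeParams(pdfBytes: Uint8Array): ScrapeDocumentParams {
  return { data: pdfBytes, isEvalSupported: false };
}
```

Usage would look like `await pdfjsLib.getDocument(buildScrapeParams(bytes)).promise`.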

pixiwyn commented 2 weeks ago

Some helpful notes for whoever picks this up in the future...

How to correctly use dynamic imports:

[Screenshot: 2024-06-17_14-03-35.png]
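The screenshot is not reproduced here, so the following is a hypothetical sketch, not the code it showed: load pdfjs-dist lazily with a dynamic `import()` and cache the promise so the module loads once per process. Note that when TypeScript compiles with `"module": "commonjs"`, it rewrites `import()` into a `require()` call, which older Node versions cannot use to load ES-module-only packages; that may be related to the tsconfig observation in the next note, though we have not confirmed it. `createLazyModule` is an illustrative name, and the entry point is an assumption.

```typescript
// Hypothetical lazy loader: caches the in-flight import so concurrent callers share it.
type Importer<T> = (specifier: string) => Promise<T>;

function createLazyModule<T>(specifier: string, importer: Importer<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = importer(specifier).catch((err: unknown) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// Assumed entry point; adjust to the pdfjs-dist version in use.
const loadPdfjs = createLazyModule<unknown>('pdfjs-dist/legacy/build/pdf.js', (s) => import(s));
```

Caching the promise (rather than the resolved module) means two uploads arriving at once trigger a single import.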

Configuring tsconfig to use NodeNext modules seems to let the project support the latest version; however, it's unknown what side effects this may have.

[Screenshot: 2024-06-17_14-04-40.png]