GoogleCloudPlatform / dlp-pdf-redaction

This solution provides an automated, serverless way to redact sensitive data from PDF files using Google Cloud Services like Data Loss Prevention (DLP), Cloud Workflows, and Cloud Run.
Apache License 2.0

Seems to be failing on step 2 for me which is Split PDF into Pages #15

Closed miles-dev33 closed 1 year ago

miles-dev33 commented 1 year ago

HTTP server responded with error code 503 in step "2. Split PDF into pages", routine "main", line: 48

{
  "body": "Service Unavailable",
  "code": 503,
  "headers": {
    "Alt-Svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000",
    "Content-Length": "19",
    "Content-Type": "text/plain",
    "Date": "Fri, 03 Nov 2023 17:47:53 GMT",
    "Server": "Google Frontend",
    "X-Cloud-Trace-Context": "853a3ce82faa491cbd266fd3aa572001;o=1"
  },
  "message": "HTTP server responded with error code 503",
  "tags": [
    "HttpError"
  ]
}

Any ideas?

miles-dev33 commented 1 year ago

Has to do with a dependency issue with Werkzeug:

  File "/usr/local/lib/python3.10/site-packages/flask/__init__.py", line 7, in <module>
    from .app import Flask as Flask
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 27, in <module>
    from . import cli
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 17, in <module>
    from .helpers import get_debug_flag
  File "/usr/local/lib/python3.10/site-packages/flask/helpers.py", line 14, in <module>
    from werkzeug.urls import url_quote
ImportError: cannot import name 'url_quote' from 'werkzeug.urls' (/usr/local/lib/python3.10/site-packages/werkzeug/urls.py)

Here's a related stackoverflow on it: https://stackoverflow.com/questions/77213053/importerror-cannot-import-name-url-quote-from-werkzeug-urls
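For anyone hitting the same traceback, a quick way to confirm the mismatch is to print the installed versions and check whether `werkzeug.urls` still exposes `url_quote`. This is only a minimal diagnostic sketch, meant to be run inside the same environment or container image as the failing service; the helper names (`installed_version`, `has_url_quote`, `report`) are made up for illustration and are not part of the repository.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_url_quote(urls_module):
    """Return True if the given module object still exposes `url_quote`."""
    return hasattr(urls_module, "url_quote")


def report():
    """Print the Flask and Werkzeug versions and whether url_quote is importable."""
    print("Flask:", installed_version("Flask"))
    print("Werkzeug:", installed_version("Werkzeug"))
    try:
        urls = importlib.import_module("werkzeug.urls")
    except ImportError:
        print("werkzeug.urls could not be imported")
        return
    print("werkzeug.urls.url_quote available:", has_url_quote(urls))


if __name__ == "__main__":
    report()
```

If the last line reports that `url_quote` is not available while the traceback still shows Flask importing it, the Flask and Werkzeug versions in that environment do not match.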

I tried updating the pdf-splitter dependencies to flask==2.2.2 and adding Werkzeug==2.3.7, but that didn't do the trick either.

I also tried upgrading Flask with:

pip install --upgrade Flask

which resulted in:

Successfully installed Flask-3.0.0 Werkzeug-3.0.1

That's a good version of Werkzeug, since there was a security vulnerability in the prior version, but to no avail: I received the same error afterwards. That is somewhat expected, since the Stack Overflow post mentions that "Flask 2.2.2 isn't made for Werkzeug 3.0.0".

Going to try:

pip install Flask==2.0.1
pip install Werkzeug==2.2.2

miles-dev33 commented 1 year ago

I FINALLY GOT IT WORKING!!!!

Solution: update the requirements.txt files in the dlp-runner, findings-writer, pdf-merger, and pdf-splitter directories to pin Flask==2.1.2 and Werkzeug==2.3.7. Otherwise each of those services will try to install Werkzeug 3.0.1, which is no longer compatible with the installed version of Flask because it breaks backwards compatibility.

:)
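The edit to the four requirements.txt files can also be scripted. The sketch below is one hedged way to do it, not the project's own tooling: `pin_requirements` and `pin_services` are hypothetical helpers, and the `root` argument (the folder that contains the four service directories) is an assumption about the checkout layout. Note that a line with extras such as `flask[async]` would be replaced by the plain pin.

```python
import re
from pathlib import Path

# Versions taken from the fix described in this thread.
PINS = {"Flask": "2.1.2", "Werkzeug": "2.3.7"}
SERVICES = ["dlp-runner", "findings-writer", "pdf-merger", "pdf-splitter"]

# Matches the package name at the start of a requirement line
# (comments and option lines such as "-r other.txt" do not match).
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def pin_requirements(text, pins):
    """Return requirements text with each package in `pins` pinned exactly."""
    wanted = {name.lower(): f"{name}=={version}" for name, version in pins.items()}
    seen = set()
    out = []
    for line in text.splitlines():
        match = _NAME.match(line)
        key = match.group(1).lower() if match else None
        if key in wanted:
            out.append(wanted[key])  # replace the existing specifier
            seen.add(key)
        else:
            out.append(line)  # keep unrelated lines untouched
    # Append pins for packages that were not listed at all.
    out.extend(spec for key, spec in wanted.items() if key not in seen)
    return "\n".join(out) + "\n"


def pin_services(root, services=SERVICES, pins=PINS):
    """Rewrite <root>/<service>/requirements.txt for each service that has one."""
    for service in services:
        path = Path(root) / service / "requirements.txt"
        if path.exists():
            path.write_text(pin_requirements(path.read_text(), pins))
```

Calling `pin_services(".")` from the folder that holds the service directories would rewrite the files in place, so it is worth reviewing the resulting diff before rebuilding the images.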

felimartina commented 1 year ago

Hey @milesjmccloskey, thanks for reporting the issue and digging out the solution. We'll be pushing out a fix with your recommendations (as well as updated libraries and Dockerfiles) in the next few days :)

I'll keep you posted and close this issue once we push a fix.

felimartina commented 1 year ago

@milesjmccloskey, I went ahead and upgraded the Python libraries to fix this issue. Please check out the latest code and give it a try.

Thanks for reporting the issue and suggesting a fix.

Let me know how it goes and if you find any other bugs.

miles-dev33 commented 1 year ago

I updated to those new libraries and still received the error, by the way:

ImportError: cannot import name 'url_quote' from 'werkzeug.urls' (/usr/local/lib/python3.10/site-packages/werkzeug/urls.py)

at . ( /usr/local/lib/python3.10/site-packages/flask/helpers.py:14 )
at . ( /usr/local/lib/python3.10/site-packages/flask/cli.py:17 )
at . ( /usr/local/lib/python3.10/site-packages/flask/app.py:27 )
at . ( /usr/local/lib/python3.10/site-packages/flask/__init__.py:7 )
at . ( /app/main.py:21 )
at ._call_with_frames_removed ( :241 )
at .exec_module ( :883 )
at ._load_unlocked ( :688 )
at ._find_and_load_unlocked ( :1006 )
at ._find_and_load ( :1027 )
at ._gcd_import ( :1050 )
at .import_module ( /usr/local/lib/python3.10/importlib/__init__.py:126 )
at .import_app ( /usr/local/lib/python3.10/site-packages/gunicorn/util.py:359 )
at .load_wsgiapp ( /usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py:48 )
at .load ( /usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py:58 )
at .wsgi ( /usr/local/lib/python3.10/site-packages/gunicorn/app/base.py:67 )
at .load_wsgi ( /usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py:146 )
at .init_process ( /usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py:134 )
at .init_process ( /usr/local/lib/python3.10/site-packages/gunicorn/workers/gthread.py:92 )
at .spawn_worker ( /usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py:589 )

The issue is that the Flask library doesn't specify its Werkzeug dependency correctly, so Werkzeug 3.0.1 gets installed alongside it.

The root cause of this is that Werkzeug 3.0.0 removed previously deprecated code: https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-0

Granted, it would be ideal if Werkzeug 3.0.1 worked, since it fixed a security vulnerability.

The solution that got the project working for me was to add Werkzeug==2.3.7 to all of the requirements.txt files.
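To keep the pin from silently disappearing again, a small check could be run before building the images. The following is a rough sketch under the assumption that each service keeps a plain requirements.txt with one requirement per line; `werkzeug_pinned_major` and `has_safe_werkzeug_pin` are hypothetical names, and the "below 3" rule simply encodes the workaround from this thread.

```python
import re

# Matches an exact Werkzeug pin such as "Werkzeug==2.3.7" and captures the major version.
_PIN = re.compile(r"^\s*werkzeug\s*==\s*(\d+)", re.IGNORECASE)


def werkzeug_pinned_major(text):
    """Return the major version Werkzeug is pinned to, or None if there is no exact pin."""
    for line in text.splitlines():
        match = _PIN.match(line)
        if match:
            return int(match.group(1))
    return None


def has_safe_werkzeug_pin(text):
    """Return True only if Werkzeug is pinned exactly and the major version is below 3."""
    major = werkzeug_pinned_major(text)
    return major is not None and major < 3
```

For example, `has_safe_werkzeug_pin(open("pdf-splitter/requirements.txt").read())` would return False for a file that does not mention Werkzeug at all, which is the situation that let Werkzeug 3.0.1 get installed here.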