freelawproject / doctor

A microservice for document conversion at scale
https://free.law/projects/doctor
BSD 2-Clause "Simplified" License

Requirements need to be updated because of lxml changes #185

Open flooie opened 5 months ago

flooie commented 5 months ago
doctor          |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
doctor          |   File "/opt/app/doctor/urls.py", line 3, in <module>
doctor          |     from . import views
doctor          |   File "/opt/app/doctor/views.py", line 34, in <module>
doctor          |     from doctor.tasks import (
doctor          |   File "/opt/app/doctor/tasks.py", line 18, in <module>
doctor          |     from lxml.html.clean import Cleaner
doctor          |   File "/usr/local/lib/python3.10/site-packages/lxml/html/clean.py", line 18, in <module>
doctor          |     raise ImportError(
doctor          | ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
doctor          | Install lxml[html_clean] or lxml_html_clean directly.

I'm updating doctor to better handle images or annotations inside a PDF, but I came across our new friend: the removal of the Cleaner functionality from lxml.

We should remove and replace this code. I assume it's already causing issues that we haven't noticed yet, or it will soon enough.
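
As a stopgap until the code is replaced, a guarded import along these lines might keep the module loading on both older and newer lxml releases. This is only a sketch: it assumes `lxml_html_clean` gets added to the requirements, as the ImportError message suggests, and the `None` fallback is a hypothetical choice for illustration, not something doctor does today.

```python
# Prefer the standalone package that the ImportError message points to,
# then fall back to the old location bundled with earlier lxml releases.
try:
    from lxml_html_clean import Cleaner
except ImportError:
    try:
        from lxml.html.clean import Cleaner
    except ImportError:
        # Neither location is importable; callers must check for None.
        Cleaner = None

if Cleaner is None:
    print("No HTML Cleaner available; install lxml_html_clean")
```

Either way, the import no longer fails at startup the way it does in the traceback above.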

mlissner commented 5 months ago

I don't think it should be causing issues, since we haven't updated any dependencies, but yeah, we should swap it out for NH3, like elsewhere. So it goes!
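
For reference, the swap could look roughly like the sketch below. The `sanitize_html` name and the `ALLOWED_TAGS` set are hypothetical and are not taken from doctor or from the other places where this was changed. The only library call used is `nh3.clean` with its `tags` and `attributes` arguments.

```python
# Hypothetical allow-list for illustration; not doctor's real configuration.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "br", "pre", "blockquote"}


def sanitize_html(html: str) -> str:
    """Return html with every tag outside ALLOWED_TAGS and all attributes removed."""
    # Imported lazily (an assumption of this sketch) so that a missing
    # dependency surfaces when sanitizing, not when the module is imported.
    import nh3

    # attributes={} means no attribute is allowed on any tag.
    return nh3.clean(html, tags=ALLOWED_TAGS, attributes={})
```

Passing an explicit allow-list keeps the behavior visible in one place, which should make it easier to compare against whatever the current `Cleaner` settings strip.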