accessibility-luxembourg / simplA11yPDFCrawler

This tool crawls a list of websites and downloads all PDF and office documents. Then it analyses the PDF documents and tries to detect accessibility issues.
MIT License

ModuleNotFoundError: No module named 'pikepdf' #5

Open mgifford opened 2 years ago

mgifford commented 2 years ago

The crawl.sh script seemed to work just fine. Was able to scrape a good list of other documents this way.

Trying to run the analysis wasn't so good:

% ./analyse.sh 
find: ./crawled_files/10x.gsa.gov/*.pdf: No such file or directory
./crawled_files/apprenticeship.gov/2021%20Apprenticeship%20Mailer.pdf
Traceback (most recent call last):
  File "/Users/mgifford/Documents/GitHub/simplA11yPDFCrawler/./pdfCheck.py", line 1, in <module>
    from pikepdf import Pdf, String, _qpdf
ModuleNotFoundError: No module named 'pikepdf'
./crawled_files/apprenticeship.gov/29_cf_30_regs_only.pdf
Traceback (most recent call last):
  File "/Users/mgifford/Documents/GitHub/simplA11yPDFCrawler/./pdfCheck.py", line 1, in <module>
    from pikepdf import Pdf, String, _qpdf
ModuleNotFoundError: No module named 'pikepdf'

I'm running on a Mac, but didn't think that would be a problem:

% pip3 install pikepdf
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Requirement already satisfied: pikepdf in /usr/local/lib/python3.9/site-packages (4.2.0)
Requirement already satisfied: lxml>=4.0 in /usr/local/lib/python3.9/site-packages (from pikepdf) (4.6.3)
Requirement already satisfied: Pillow>=6.0 in /usr/local/lib/python3.9/site-packages (from pikepdf) (8.4.0)
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621

I've tried installing pikepdf on its own with both pip and pip3.

I notice that pikepdf is listed in requirements.txt.

Not sure if this is a problem at my end or not.

I did cut short my crawl.sh as I seemed to be getting a lot more errors. Anyways, don't think that's the cause of this. It is finding lots of files in the directory.
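
One way to narrow this down, offered as a sketch rather than anything from the repository: the pip3 output above reports pikepdf under /usr/local/lib/python3.9/site-packages, so a plausible (unconfirmed) cause is that pdfCheck.py is run by a different Python interpreter than the one pip3 installs into. The helper below, whose name `describe_module` is made up for illustration, prints which interpreter is running and whether that interpreter can locate pikepdf. Run it with the same python command that analyse.sh uses.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter_path, module_location_or_None) for this interpreter.

    The location is None when the module cannot be found on this
    interpreter's import path, which is the situation that produces
    ModuleNotFoundError at import time.
    """
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


if __name__ == "__main__":
    interpreter, location = describe_module("pikepdf")
    print("interpreter:", interpreter)
    print("pikepdf:", location or "not importable from this interpreter")
```

If the interpreter path printed here is not the Python 3.9 under /usr/local that pip3 reported, the package was installed for a different Python than the one running the script.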

mgifford commented 1 year ago

Looking at this again.

I'm still getting the same error with the latest code.

I also tried with !pip install pikepdf

Might be tied to:

% pip3 install pikepdf

Collecting pikepdf
  Using cached pikepdf-6.2.8.post1.tar.gz (2.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Error in sitecustomize; set PYTHONVERBOSE for traceback:
      AssertionError:
      Traceback (most recent call last):
        File "/opt/homebrew/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in <module>
          main()
        File "/opt/homebrew/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/opt/homebrew/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/opt/homebrew/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/opt/homebrew/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "/opt/homebrew/lib/python3.10/site-packages/setuptools/build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 14, in <module>
      ModuleNotFoundError: No module named 'pybind11'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

brdunfield commented 1 year ago

I'm experiencing the same issue. As far as I can tell it's a Mac M1 compatibility issue. Does this tool currently support M1 chipsets, or is this an issue with our python installs?

mgifford commented 1 year ago

Reinstalling pip (with sudo) seemed to do it in Ubuntu.

mgifford commented 1 year ago

On the Mac, I'm getting better results after running: brew install pybind11

Not sure whether that was what was needed, or whether it was running pip via sudo.
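
For anyone hitting the same thing, a hedged sketch of a way to take the pip/pip3 ambiguity out of the picture: invoke pip through the interpreter that will actually run pdfCheck.py, using the standard `python -m pip` form. The helper name `pip_install_command` is hypothetical, and the snippet only builds and prints the commands. It does not install anything, and it is not confirmed to resolve the pybind11 build error shown in the traceback above.

```python
import shlex
import sys


def pip_install_command(*packages):
    """Build a pip command bound to the interpreter running this script.

    Using `sys.executable -m pip` ensures the packages land in the
    site-packages directory of this exact Python, not whichever
    pip or pip3 happens to be first on PATH.
    """
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for package in ("pybind11", "pikepdf"):
        print(shlex.join(pip_install_command(package)))
```

The printed commands can be pasted into the terminal, or passed to `subprocess.run(..., check=True)` once they look right.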