Szwendacz99 / BookStack-Python-exporter

Customizable script for exporting notes from BookStack through API. Export Pages, Chapters, Books, attachments and images.
MIT License

Ignore / Skip broken images #12

Closed Krafting closed 3 months ago

Krafting commented 3 months ago

Hey again!

I just tested the --images option, and it seems to work great. However, we have a lot of docs, and a few of them contain broken images, which means the script gives up as soon as it finds one. I also don't know how to track them down properly to fix them...

It might be a good idea to add an option to skip broken images. Here are the logs:

DEBUG :: Checking for update for file ./Docs/exported-images/uploads/images/gallery/2023-11/uPgJCxGapluZsdrS-image-1700556603853.png
DEBUG :: Document ./Docs/exported-images/uploads/images/gallery/2023-11/uPgJCxGapluZsdrS-image-1700556603853.png is missing on disk, update needed.
DEBUG :: Making http request: http://<some_ip>:<bookstack_port>/uploads/images/gallery/2023-11/uPgJCxGapluZsdrS-image-1700556603853.png
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/urllib/request.py", line 1344, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/local/lib/python3.12/http/client.py", line 1331, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.12/http/client.py", line 1377, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.12/http/client.py", line 1326, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.12/http/client.py", line 1085, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.12/http/client.py", line 1029, in send
    self.connect()
  File "/usr/local/lib/python3.12/http/client.py", line 995, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/socket.py", line 852, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.12/socket.py", line 837, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/builds/dsin/infra/bookstack/exporter.py", line 687, in <module>
    export_images()
  File "/builds/dsin/infra/bookstack/exporter.py", line 539, in export_images
    data: bytes = api_get_bytes(img.get_url(), raw_url=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/dsin/infra/bookstack/exporter.py", line 374, in api_get_bytes
    with urlopen(request) as response:
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/urllib/request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/urllib/request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/urllib/request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/urllib/request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/urllib/request.py", line 1373, in http_open
    return self.do_open(http.client.HTTPConnection, req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/urllib/request.py", line 1347, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>

I redacted the IP and port, but I think you get the idea. The image was uploaded while the server was using its old IP, so the link is now broken because that IP is no longer reachable. I could fix the image in all the docs (and I'll try to), but in the meantime it would be nice if the export could just export the working images.
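For tracking down the affected pages, one possible approach is to scan each page's HTML for image links whose host differs from the current instance address. The sketch below is only an illustration and is not part of the exporter: `find_foreign_images`, its parameters, and the example URLs are hypothetical, and it assumes the page HTML has already been fetched by some other means.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_foreign_images(page_html: str, current_base_url: str) -> list[str]:
    """Return image URLs whose host:port differs from the current instance.

    Hypothetical helper: relative links (empty host) are treated as local.
    """
    current_host = urlsplit(current_base_url).netloc
    collector = _ImgSrcCollector()
    collector.feed(page_html)
    return [
        src
        for src in collector.sources
        if urlsplit(src).netloc not in ("", current_host)
    ]
```

Running this over every exported page and printing the non-empty results would give a list of pages that still reference the old address.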

I hope it was clear enough for you!

Thank you for your time and work on this project!

Szwendacz99 commented 3 months ago

I didn't know that was possible. Does BookStack actually support changing the domain/URL of an instance? Anyway, a fix for that should be easy: I made a branch, skip-broken-images, with the fix and a pull request. Could you test it? If it works, I will merge it to main and add a proper release with a tag.
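The actual change lives in the skip-broken-images branch and is not reproduced in this thread. As a rough illustration of the general idea only, the sketch below wraps each download in a try/except and, when an opt-in flag is set, logs the failure and moves on instead of aborting. The names `download_images`, `fetch_bytes`, and `skip_broken` are hypothetical and are not taken from exporter.py.

```python
import logging
from typing import Callable, Iterable
from urllib.request import Request, urlopen

log = logging.getLogger("exporter-sketch")


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urlopen(Request(url), timeout=timeout) as response:
        return response.read()


def download_images(
    urls: Iterable[str],
    skip_broken: bool = False,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> tuple[dict[str, bytes], list[str]]:
    """Fetch every URL; return (downloaded data by URL, list of broken URLs).

    With skip_broken=False the first failure is re-raised, which matches
    the abort-on-error behaviour shown in the traceback above.
    """
    downloaded: dict[str, bytes] = {}
    broken: list[str] = []
    for url in urls:
        try:
            downloaded[url] = fetch(url)
        except OSError as err:
            # URLError and HTTPError are subclasses of OSError, so this also
            # covers refused connections, timeouts, and HTTP error statuses.
            if not skip_broken:
                raise
            log.warning("Skipping broken image %s: %s", url, err)
            broken.append(url)
    return downloaded, broken
```

Returning the list of broken URLs alongside the data makes it possible to report them at the end of the run, which also helps with locating the pages that need fixing.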

Krafting commented 3 months ago

> Anyway, a fix for that should be easy: I made a branch, skip-broken-images, with the fix and a pull request. Could you test it? If it works, I will merge it to main and add a proper release with a tag.

I'll be able to test it on Monday, and I'll get back to you once I have! Thank you for the quick fix!

Also, I don't think BookStack allows changing the domain/URL, or at least it doesn't update the image links in existing docs.

Krafting commented 3 months ago

Good morning!

I've just tested it with the option and it seems to work very well. I also tried it without the option, and it kept the old behaviour (no regression, it seems!).

Thank you again for the fast patch!