thfrei closed 2 years ago
Thanks for the PR! However, I don't think this is the right solution. It seems like the HTML parser doesn't detect a start tag. Could you add a print for tag_match[0] and let me know what value it has in the case that fails?
Thank you, Danmou, for your incredibly fast response, and sorry for my delay. You are probably right; I just hacked something together so that it would work.
I think I have to abandon the API, since I cannot get any of my hand drawings out of it. I'll probably stick to something like an .mht export and then extract the base64 images.
Here are the results of my tag_match[0] print (commit b098dc6e553d64725aa46b19ae352c20f5099d75):
...snip...
Opening page 7 Fossil CMS
HTML file already exists; skipping this page
Opening page 8 Fossil learning TH1
Got content of length 19954
parser.feed tag_match: <img alt="THI:
global State flags
global State user
Run THI
<Base href=" $baseurl/$current_page
<meta
<meta http—egui•F-"Content—security—policy" csp" />
127.0.0.1 - - [04/Jan/2022 07:48:47] "GET /getToken?code=M.R3_BAY.225081db-b1a8-f945-ac69-3ee570dcadd4&state=a6efcfda-9d5f-4e28-98dd-93b766dd6b55 HTTP/1.1" 500 -
Traceback (most recent call last):
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\site-packages\flask\app.py", line 2091, in __call__
return self.wsgi_app(environ, start_response)
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\site-packages\flask\app.py", line 2076, in wsgi_app
response = self.handle_exception(e)
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\site-packages\flask\app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\site-packages\flask\app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 240, in main_logic
download_notebooks(graph_client, app.config['output_path'], app.config['select_path'], indent=0)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 179, in download_notebooks
download_sections(graph_client, sections, path / nb_name, select, indent=indent + 1)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 200, in download_sections
download_pages(graph_client, pages, path / sec_name, select, indent=indent + 1)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 216, in download_pages
download_page(graph_client, page['contentUrl'], page_dir, indent=indent + 1)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 229, in download_page
content = download_attachments(graph_client, content, path, indent=indent)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 150, in download_attachments
content = re.sub(r"<img .*?\/>", download_image, content, flags=re.DOTALL)
File "C:\ProgramData\scoop\apps\miniconda3\4.10.3\Lib\re.py", line 210, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "C:\Users\Matt\code\onenote_export\onenote_export.py", line 112, in download_image
props = parser.attrs
AttributeError: 'MyHTMLParser' object has no attribute 'attrs'
Here's a screenshot of the portion of the page it seems to be having trouble with. It's two images embedded in a table.
The complete tag match text is:
<img alt="THI:
global State flags
global State user
Run THI
<Base href=" $baseurl/$current_page
<meta
<meta http—egui•F-"Content—security—policy" csp" />
(I updated the debug statement to if debug: print(f"parser.feed tag_match: '''{tag_match[0]}''' ") to make it easier to see.)
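One plausible reading of this output (an inference, not something confirmed in the thread): the pattern from the traceback, <img .*?\/>, is lazy, so it stops at the first "/>" it finds, and here that "/>" sits inside the alt text. The truncated tag is left with an unterminated attribute value, and Python's html.parser never reports a start tag for it. The sketch below reproduces that behaviour with made-up content; AttrCapturingParser is a hypothetical stand-in for MyHTMLParser, whose real code is not shown in this thread.

```python
import re
from html.parser import HTMLParser


class AttrCapturingParser(HTMLParser):
    """Hypothetical stand-in for MyHTMLParser: remembers a start tag's attributes."""

    def handle_starttag(self, tag, attrs):
        # Only runs when the parser recognises a complete start tag.
        self.attrs = dict(attrs)


# Illustrative content (not from the real page): the alt text contains "/>".
content = '<img alt="code: <br /> more" src="a.png" />'

# Same pattern and flags as in the traceback; the lazy match ends at the
# first "/>", which here is in the middle of the alt attribute.
tag_match = re.search(r"<img .*?\/>", content, flags=re.DOTALL)
print(repr(tag_match[0]))

parser = AttrCapturingParser()
parser.feed(tag_match[0])

# The quote opened by alt=" is never closed, so the parser keeps waiting for
# more input and handle_starttag is not called; "attrs" is never assigned.
print(hasattr(parser, "attrs"))
```

If this is what happens on the failing page, reading parser.attrs right after feed() would raise the same AttributeError as in the traceback above.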
Add a check for parser.attrs
fixes #10
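For illustration, a check of this kind could look roughly like the sketch below. It is not the code from this PR: SafeAttrParser, replace_image and the local/ prefix are hypothetical names, and the real download_image does more than rewrite the src. The idea is to give attrs a default so that a tag the parser could not understand is left unchanged instead of raising.

```python
import re
from html.parser import HTMLParser


class SafeAttrParser(HTMLParser):
    """Hypothetical parser whose attrs stays None when no start tag is seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        self.attrs = dict(attrs)


def replace_image(tag_match):
    """Replacement callback for re.sub; skips tags that could not be parsed."""
    parser = SafeAttrParser()
    parser.feed(tag_match[0])
    if parser.attrs is None:
        # No start tag detected: return the matched text untouched.
        return tag_match[0]
    # Placeholder for the real download step: point src at a local path.
    src = parser.attrs.get("src", "")
    return f'<img src="local/{src}" />'


pattern = r"<img .*?\/>"
good = re.sub(pattern, replace_image, '<p><img src="a.png" alt="x" /></p>', flags=re.DOTALL)
bad_input = '<img alt="code: <br /> more" src="a.png" />'
bad = re.sub(pattern, replace_image, bad_input, flags=re.DOTALL)
print(good)
print(bad == bad_input)
```

A guard like this avoids the crash but silently skips the image, which is presumably why the reviewer asked for the tag_match[0] value first: the underlying problem would be the truncated match, not the missing attribute.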