Closed by sojusnik 6 years ago
Hi, I'd completely forgotten that I programmed something like this :D Hope it helps.
Go to the directory, put your HTML file there, and just launch the script:
cd ~/Downloads/html2markdown-master/
./html2zim --zim path_to_your_html_file.html
Tell me if there's any problem.
I get the following:
~/Downloads/html2markdown-master$ ./html2zim --zim /home/sojusnik/Downloads/html2markdown-master/test.html
bash: ./html2zim: No such file or directory
Sorry, there is an error in the README.md: html2zim was the former file name, because I originally intended to use mainly the Zim feature. However, the only file in the folder is html2markdown.py :)
Please try ./html2markdown.py --zim ... and let me know if that worked!
Np, but then another error message appears:
~/Downloads/html2markdown-master$ ./html2markdown.py --zim /home/sojusnik/Downloads/html2markdown-master/test.html
Traceback (most recent call last):
File "./html2markdown.py", line 3, in <module>
import re, json, ipdb, sys, os, traceback
ModuleNotFoundError: No module named 'ipdb'
Sorry, either install the dependency with pip3 install ipdb --user or fetch the new version, please.
With the new version I get:
~/Downloads/html2markdown-master$ ./html2markdown.py --zim /home/sojusnik/Downloads/html2markdown-master/test.html
Traceback (most recent call last):
File "./html2markdown.py", line 246, in <module>
Html2Markdown(def_file, args.file, ext)
File "./html2markdown.py", line 162, in __init__
soup = bs(f.read(), "lxml")
File "/home/sojusnik/.local/lib/python3.6/site-packages/bs4/__init__.py", line 165, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Wow, I see I really didn't know how to distribute a package :D
First try pip3 install --user lxml (and if that doesn't work, then sudo apt-get install python3-lxml) and tell me which one worked so that I can update the info.
:)
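For anyone hitting these missing-module errors one at a time, here is a minimal sketch of a pre-flight check that lists everything missing in one go. It is not part of html2markdown.py; the helper name missing_modules and the REQUIRED list are assumptions based only on the modules mentioned in this thread (bs4 is the import name of the beautifulsoup4 package).

```python
import importlib.util

# Module names taken from the errors in this thread (an assumption, not an
# official requirements list). "bs4" is the import name of beautifulsoup4.
REQUIRED = ["bs4", "lxml", "ipdb"]


def missing_modules(names):
    """Return the names that cannot be found for import by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the same python3 that runs the script, since pip3 install --user installs per interpreter.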
After running pip3 install --user lxml I get:
~/Downloads/html2markdown-master$ ./html2markdown.py --zim /home/sojusnik/Downloads/html2markdown-master/test.html
Traceback (most recent call last):
File "./html2markdown.py", line 14, in d
yield
File "./html2markdown.py", line 76, in loopEl
el.definition, el.form = self._getFormat(el)
File "./html2markdown.py", line 57, in _getFormat
accords = (el.parent.parent.name == par["parent-name"])
AttributeError: 'NoneType' object has no attribute 'name'
Traceback (most recent call last):
File "./html2markdown.py", line 14, in d
yield
File "./html2markdown.py", line 76, in loopEl
el.definition, el.form = self._getFormat(el)
File "./html2markdown.py", line 57, in _getFormat
accords = (el.parent.parent.name == par["parent-name"])
AttributeError: 'NoneType' object has no attribute 'name'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./html2markdown.py", line 246, in <module>
Html2Markdown(def_file, args.file, ext)
File "./html2markdown.py", line 179, in __init__
self.loopEl(el)
File "./html2markdown.py", line 76, in loopEl
el.definition, el.form = self._getFormat(el)
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "./html2markdown.py", line 18, in d
pdb.post_mortem(tb)
NameError: name 'pdb' is not defined
The error message above remains even after running sudo apt-get install python3-lxml afterwards.
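The AttributeError in the traceback comes from el.parent.parent being None, which can happen when an element sits too close to the document root to have a grandparent. Below is a rough sketch of a defensive lookup, assuming that is the cause here. The Node class is a hypothetical stand-in for a parsed element and grandparent_name is a hypothetical helper; neither exists in html2markdown.py.

```python
class Node:
    """Tiny stand-in for a parsed HTML element (hypothetical, not bs4's Tag)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def grandparent_name(el):
    """Return el.parent.parent.name, or None if the chain ends early."""
    parent = getattr(el, "parent", None)
    grandparent = getattr(parent, "parent", None)
    return getattr(grandparent, "name", None)


# A three-level tree: html > body > p
root = Node("html")
body = Node("body", parent=root)
p = Node("p", parent=body)

print(grandparent_name(p))     # grandparent exists
print(grandparent_name(body))  # no grandparent, so None instead of an exception
```

With a lookup like this, the comparison against par["parent-name"] would simply be False for elements without a grandparent, instead of raising.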
Thanks, I've updated README.md with a set of requirements that can be installed at once.
Run pip3 install --user ipdb pdb to make sure the new dependencies are installed.
But it seems something has changed in the OneNote HTML format, so I might not be able to help you. I've tried a fix, so you may still want to download the new version.
It still doesn't work:
$ ./html2markdown.py --zim test.html
Traceback (most recent call last):
File "./html2markdown.py", line 248, in <module>
Html2Markdown(def_file, args.file, ext)
File "./html2markdown.py", line 181, in __init__
self.loopEl(el)
File "./html2markdown.py", line 86, in loopEl
self.prevEl.sout += el.sout
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
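The TypeError says the left-hand side of += was None, i.e. the previous element had no output text yet. A small sketch of a None-tolerant concatenation, assuming an empty string is an acceptable default; append_output is a hypothetical helper, not a function from the script:

```python
def append_output(current, addition):
    """Concatenate two output fragments, treating None as an empty string."""
    return (current or "") + (addition or "")


print(append_output(None, "some text"))
print(append_output("first ", "second"))
```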
So does your script only work with the OneNote HTML format, or with any HTML page? I've tried it with a "normal" HTML page.
As the very first sentence of the description states, html2markdown is a utility for converting HTML code produced by OneNote. OneNote produces chaotic HTML that other converters would hardly understand. In theory, the tool is not bound to OneNote HTML only, but to convert arbitrary HTML, I think it is much easier to use another tool or even an online converter.
Hey!
I just ran pip3 install --user beautifulsoup4 and then downloaded and extracted your script into ~/Downloads/html2markdown-master/.
What would be the next step on Ubuntu to use your script to convert a text file containing HTML into Zim's markup?
Many thanks in advance!