e3rd / html2markdown

converts html to markdown / zim markup

Using in Ubuntu #1

Closed sojusnik closed 6 years ago

sojusnik commented 6 years ago

Hey!

I just ran pip3 install --user beautifulsoup4 and then downloaded and extracted your script into ~/Downloads/html2markdown-master/.

What would be the next step on Ubuntu to use your script to convert a text file containing HTML into Zim's markup?

Many thanks in advance!

e3rd commented 6 years ago

Hi, I had really forgotten that I'd programmed something like this :D Hope it helps you.

Go to the directory, put your HTML file there, and just launch the script:

cd ~/Downloads/html2markdown-master/
./html2zim --zim path_to_your_html_file.html

Tell me if there's any problem.

sojusnik commented 6 years ago

I get the following:

~/Downloads/html2markdown-master$ ./html2zim --zim /home/sojusnik/Downloads/html2markdown-master/test.html
bash: ./html2zim: No such file or directory

e3rd commented 6 years ago

Sorry, there is an error in the README.md: html2zim was the former file name, because I mainly intended to use the Zim feature. However, the only file in the folder is html2markdown.py :) Please try ./html2markdown.py --zim ... and let me know if that worked!

sojusnik commented 6 years ago

Np, but then another error message appears:

~/Downloads/html2markdown-master$ ./html2markdown.py --zim /home/sojusnik/Downloads/html2markdown-master/test.html
Traceback (most recent call last):
  File "./html2markdown.py", line 3, in <module>
    import re, json, ipdb, sys, os, traceback
ModuleNotFoundError: No module named 'ipdb'

e3rd commented 6 years ago

Sorry, either install the dependency with pip3 install ipdb --user or fetch the new version, please.

sojusnik commented 6 years ago

With the new version I get:

~/Downloads/html2markdown-master$ ./html2markdown.py --zim /home/sojusnik/Downloads/html2markdown-master/test.html
Traceback (most recent call last):
  File "./html2markdown.py", line 246, in <module>
    Html2Markdown(def_file, args.file, ext)    
  File "./html2markdown.py", line 162, in __init__
    soup = bs(f.read(), "lxml")
  File "/home/sojusnik/.local/lib/python3.6/site-packages/bs4/__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

e3rd commented 6 years ago

Wow, I see I really didn't know how to distribute a package :D First try pip3 install --user lxml (and if that doesn't work, then sudo apt-get install python3-lxml) and tell me what worked so that I can update the info.
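
The dependencies that came up so far in this thread can be checked in one go before launching the script. The following is only a minimal sketch: the module list is inferred from the tracebacks above (beautifulsoup4 is imported as bs4), and the helper name missing_modules is hypothetical, not part of html2markdown.

```python
import importlib.util

# Import names inferred from the tracebacks in this thread (an assumption,
# not an official requirements list); beautifulsoup4 is imported as "bs4".
REQUIRED_MODULES = ("bs4", "lxml", "ipdb")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any name it reports can then be installed with pip3 install --user, as suggested above.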

sojusnik commented 6 years ago

:)

After running pip3 install --user lxml I get:

~/Downloads/html2markdown-master$ ./html2markdown.py --zim /home/sojusnik/Downloads/html2markdown-master/test.html
Traceback (most recent call last):
  File "./html2markdown.py", line 14, in d
    yield
  File "./html2markdown.py", line 76, in loopEl
    el.definition, el.form = self._getFormat(el)
  File "./html2markdown.py", line 57, in _getFormat
    accords = (el.parent.parent.name == par["parent-name"])
AttributeError: 'NoneType' object has no attribute 'name'
Traceback (most recent call last):
  File "./html2markdown.py", line 14, in d
    yield
  File "./html2markdown.py", line 76, in loopEl
    el.definition, el.form = self._getFormat(el)                                
  File "./html2markdown.py", line 57, in _getFormat
    accords = (el.parent.parent.name == par["parent-name"])          
AttributeError: 'NoneType' object has no attribute 'name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./html2markdown.py", line 246, in <module>
    Html2Markdown(def_file, args.file, ext)    
  File "./html2markdown.py", line 179, in __init__
    self.loopEl(el)
  File "./html2markdown.py", line 76, in loopEl
    el.definition, el.form = self._getFormat(el)                                
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "./html2markdown.py", line 18, in d
    pdb.post_mortem(tb)
NameError: name 'pdb' is not defined

The error persists even after also running sudo apt-get install python3-lxml.
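
The AttributeError in the traceback says that el.parent.parent was None when .name was read, which can happen for elements close to the document root. One possible guard, sketched here with stand-in objects rather than real BeautifulSoup nodes, is to walk the chain defensively. The helper name grandparent_name is hypothetical and is not taken from html2markdown.py.

```python
from types import SimpleNamespace


def grandparent_name(el):
    """Return el.parent.parent.name, or None if any link in the chain is missing."""
    parent = getattr(el, "parent", None)
    grandparent = getattr(parent, "parent", None)
    return getattr(grandparent, "name", None)


# Stand-in objects (not real BeautifulSoup nodes) mimicking the .parent/.name chain.
root = SimpleNamespace(name="[document]", parent=None)
html = SimpleNamespace(name="html", parent=root)
body = SimpleNamespace(name="body", parent=html)

print(grandparent_name(body))  # body has a grandparent: the root
print(grandparent_name(html))  # html has a parent but no grandparent
```

With such a helper, a comparison against par["parent-name"] would simply evaluate to False for elements without a grandparent instead of raising.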

e3rd commented 6 years ago

Thanks, I've updated README.md with a set of requirements to be installed at once. Run pip3 install --user ipdb pdb to make sure the new dependencies are installed.

But it seems something has changed in the OneNote HTML format, so I might not be able to help you. I've tried a fix, so you may still want to download the new version.

sojusnik commented 6 years ago

It still doesn't work:

$ ./html2markdown.py --zim test.html
Traceback (most recent call last):
  File "./html2markdown.py", line 248, in <module>
    Html2Markdown(def_file, args.file, ext)
  File "./html2markdown.py", line 181, in __init__
    self.loopEl(el)
  File "./html2markdown.py", line 86, in loopEl
    self.prevEl.sout += el.sout
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
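
The TypeError means the left-hand side of += (here self.prevEl.sout) was None while the right-hand side was a string. A minimal sketch of one way to tolerate that, assuming a missing fragment should count as empty text, is shown below. The helper name append_output is hypothetical and is not part of the script.

```python
def append_output(previous, addition):
    """Concatenate two output fragments, treating None as an empty string."""
    return (previous or "") + (addition or "")


# None on the left no longer raises; the result is just the new fragment.
print(append_output(None, "new text"))
print(append_output("old ", "new"))
```

Whether silently treating None as empty is the right behaviour for the converter is a design question only the script's author can answer; it hides the symptom but not the reason the previous element had no output.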

sojusnik commented 6 years ago

So does your script only work with the OneNote HTML format, or with any HTML page? I tried it with a "normal" HTML page.

e3rd commented 6 years ago

As the very first sentence of the description states, html2markdown is a utility to convert HTML code produced by OneNote. OneNote produces chaotic HTML that other converters would hardly understand. In theory, the tool is not bound to OneNote HTML only, but to convert arbitrary HTML I think it is much easier to use another tool or even an online converter.
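
For simple, well-formed HTML, even Python's standard library can cover the basics. The sketch below is not how html2markdown works and is not a replacement for a real converter: it handles only headings, paragraphs, bold, and italics, emits Markdown rather than Zim markup, and all names in it (TinyMarkdownConverter, html_to_markdown) are made up for illustration.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Convert a handful of HTML tags to Markdown; other tags are ignored."""

    INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*"}
    HEADINGS = {f"h{level}": "#" * level + " " for level in range(1, 7)}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.parts.append(self.HEADINGS[tag])

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS or tag == "p":
            # Blank line after block-level elements.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    """Return a Markdown rendering of a small HTML fragment."""
    converter = TinyMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()


print(html_to_markdown("<h1>Title</h1><p>Some <b>bold</b> and <em>italic</em> text.</p>"))
```

Lists, links, tables, and nested structures are deliberately left out, which is where dedicated converters earn their keep.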