Open fluffy-critter opened 5 years ago
html2text's output isn't great either:
[fluffy](http://beesbuzz.biz/):
[Reblob!](http://beesbuzz.biz/blog/5385-Reblob):
> [Reblob!](http://publ.beesbuzz.biz/blog/179-Reblob):
>
>> It’s been a while since I’ve worked on IndieWeb stuff, but I finally got
around to releasing an _extremely preliminary_ version of
[reblob](http://publ.beesbuzz.biz/tools/1423-reblob), a little commandline
thingus to make this stuff easier. Eventually I’ll also have a server-based
version here, at least as an example.
>
> Of course this is the first entry I’ve written actually _using_ it. Lots of
rough edges but whatever!
which renders as:
It’s been a while since I’ve worked on IndieWeb stuff, but I finally got around to releasing an extremely preliminary version of reblob, a little commandline thingus to make this stuff easier. Eventually I’ll also have a server-based version here, at least as an example.
Of course this is the first entry I’ve written actually using it. Lots of rough edges but whatever!
Found this through your tweet. There might be a way to use one of pandoc's many customization options to fix this. E.g., you could try to remove soft line-breaks by using a pandoc filter:
function SoftBreak ()
  return pandoc.Space() -- replace soft linebreak with a space
end
Use it by calling pandoc with pandoc --lua-filter=path/to/that/filter-file.lua …
Or check whether the --wrap=none option does what you want. Does this help?
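For reference, here is a minimal sketch of how those two flags could be passed through pypandoc's `extra_args` parameter. The helper names `pandoc_extra_args` and `html_to_gfm` are made up for illustration and are not part of reblob or pypandoc. The conversion itself assumes pypandoc and a pandoc binary are installed.

```python
def pandoc_extra_args(lua_filter=None, wrap="none"):
    """Build the extra pandoc flags discussed above (hypothetical helper)."""
    args = []
    if wrap:
        # --wrap=none asks pandoc not to hard-wrap the output text
        args.append(f"--wrap={wrap}")
    if lua_filter:
        # path to a Lua filter file such as the SoftBreak one above
        args.append(f"--lua-filter={lua_filter}")
    return args


def html_to_gfm(html, lua_filter=None):
    """Convert an HTML fragment to GitHub-flavored Markdown via pandoc."""
    # Imported lazily so the flag builder works without pypandoc installed.
    import pypandoc

    return pypandoc.convert_text(
        html,
        "gfm",
        format="html",
        extra_args=pandoc_extra_args(lua_filter=lua_filter),
    )
```

Keeping the flag list in its own function makes it easy to inspect or adjust before any call to pandoc happens.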
@tarleb Not particularly; the way pandoc works through pypandoc makes that incredibly unwieldy. There's also no need to do that in a Pandoc filter: see the branch https://github.com/PlaidWeb/reblob/tree/feature/5-trim-end-whitespace for a simple fix on the Python side.
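A Python-side fix of that kind might look roughly like the sketch below. This is an assumption about the general approach, not the code from the linked branch, and `trim_end_whitespace` is a hypothetical name.

```python
def trim_end_whitespace(markdown):
    """Strip trailing whitespace from every line of the converted text."""
    # Caveat: two trailing spaces mean a hard line break in Markdown,
    # so this also removes any intentional hard breaks.
    return "\n".join(line.rstrip() for line in markdown.splitlines())
```

It would run on the converter's output, for example `trim_end_whitespace(converted_text)`, before the entry is written out.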
But even with that there's a lot of stuff pypandoc does poorly that can't be easily addressed by setting markdown plugins either. The Mastodon version of the thread goes into more about that.
There are also a bunch of other reasons I want to get off pandoc: the Python bindings make a lot of assumptions about the environment that won't work for one of my intended future use cases, and it's just, like, not very well-controlled in general.
I can also think of a fairly straightforward way to convert HTML to Markdown in a way that will also allow me to put in Publ-markdown extensions. I was hoping reblob would be able to also support things like ReStructuredText for folks who use that on their blog engine though.
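To illustrate what a direct HTML-to-Markdown conversion could look like, here is a toy sketch built on the standard library's `html.parser`. It is not reblob's implementation. It handles only paragraphs, emphasis, links, and nested blockquotes, and the class and function names are invented for the example. Each tag handler is a natural place where engine-specific Markdown extensions could be emitted instead.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Toy converter: paragraphs, emphasis, links, nested blockquotes."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished paragraphs as (quote depth, text)
        self.buf = []     # inline pieces of the paragraph being built
        self.depth = 0    # current blockquote nesting level
        self.hrefs = []   # hrefs of currently open <a> tags

    def flush(self):
        # Collapse whitespace so each paragraph is one unwrapped line.
        text = " ".join("".join(self.buf).split())
        self.buf = []
        if text:
            prefix = ">" * self.depth + " " if self.depth else ""
            self.blocks.append((self.depth, prefix + text))

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.flush()
            self.depth += 1
        elif tag == "p":
            self.flush()
        elif tag in ("em", "i"):
            self.buf.append("_")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.buf.append("[")

    def handle_endtag(self, tag):
        if tag == "blockquote":
            self.flush()
            self.depth = max(0, self.depth - 1)
        elif tag == "p":
            self.flush()
        elif tag in ("em", "i"):
            self.buf.append("_")
        elif tag == "a" and self.hrefs:
            self.buf.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        self.buf.append(data)

    def result(self):
        self.flush()
        lines, prev = [], None
        for depth, text in self.blocks:
            if prev is not None:
                # Separator keeps the shallower quote level open.
                lines.append(">" * min(prev, depth))
            lines.append(text)
            prev = depth
        return "\n".join(lines)


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return parser.result()
```

Because every quoted paragraph is emitted as a single line with an explicit `>` prefix, and separator lines carry the shallower quote level, nested quotes do not depend on lazy continuation lines to render correctly.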
Pandoc's gfm backend produces markdown like:
which formats like:
(from this entry).
html2text might be better, but that loses the ability to support other output formats. There might also be some better Pandoc configurations that could be used.