Open delan opened 3 months ago
I couldn't reproduce this issue with 4.10.
[dmbaturin@alcor ~/d/t/brtest]$ soupault --version
soupault 4.10.0
Copyright 2024 Daniil Baturin et al.
soupault is free software distributed under the MIT license.
Visit https://www.soupault.app for news and documentation.
Compiled with OCaml 4.14.2
[dmbaturin@alcor ~/d/t/brtest]$ cat soupault.toml
# To learn about configuring soupault, visit https://www.soupault.app/reference-manual
[settings]
# Soupault version that the config was written/generated for
# Trying to process this config with an older version will result in an error message
soupault_version = "4.10.0"
# Stop on page processing errors?
strict = true
# Display progress?
verbose = true
# Display detailed debug output?
debug = false
# Where input files (pages and assets) are stored.
site_dir = "site"
# Where the output goes
build_dir = "build"
# Files inside the site/ directory can be treated as pages or static assets,
# depending on the extension.
#
# Files with extensions from this list are considered pages and processed.
# All other files are copied to build/ unchanged.
#
# Note that for formats other than HTML, you need to specify an external program
# for converting them to HTML (see below).
page_file_extensions = ["htm", "html", "md", "rst", "adoc"]
# By default, soupault uses "clean URLs",
# that is, $site_dir/page.html is converted to $build_dir/page/index.html
# You can make it produce $build_dir/page.html instead by changing this option to false
clean_urls = true
# If you set clean_urls=false,
# file names with ".html" and ".htm" extensions are left unchanged.
keep_extensions = ["html", "htm"]
# All other extensions (".md", ".rst"...) are replaced, by default with ".html"
default_extension = "html"
# Page files with these extensions are ignored.
ignore_extensions = ["draft"]
# Soupault can work as a website generator or an HTML processor.
#
# In the "website generator" mode, it considers files in site/ page bodies
# and inserts them into the empty page template stored in templates/main.html
#
# Setting this option to false switches it to the "HTML processor" mode
# when it considers every file in site/ a complete page and only runs it through widgets/plugins.
generator_mode = true
# Files that contain an <html> element are considered complete pages rather than page bodies,
# even in the "website generator" mode.
# This allows you to use a unique layout for some pages and still have them processed by widgets.
complete_page_selector = "html"
# Website generator mode requires a page template (an empty page to insert a page body into).
# If you use "generator_mode = false", this file is not required.
default_template_file = "templates/main.html"
# Page content is inserted into a certain element of the page template.
# This option is a CSS selector that is used for locating that element.
# By default the content is inserted into the <body>
default_content_selector = "body"
# You can choose where exactly to insert the content in its parent element.
# The default is append_child, but there are more, including prepend_child and replace_content
default_content_action = "append_child"
# If a page already has a document type declaration, keep the declaration
keep_doctype = true
# If a page does not have a document type declaration, force it to HTML5
# With keep_doctype=false, soupault will replace existing declarations with it too
doctype = "<!DOCTYPE html>"
# Insert whitespace into HTML for better readability
# When set to false, the original whitespace (if any) will be preserved as is
pretty_print_html = true
# Plugins can be either automatically discovered or loaded explicitly.
# By default discovery is enabled and the place where soupault is looking is the plugins/ subdirectory
# in your project.
# E.g., a file at plugins/my-plugin.lua will be registered as a widget named "my-plugin".
plugin_discovery = true
plugin_dirs = ["plugins"]
# Soupault can cache outputs of external programs
# (page preprocessors and preprocess_element widget commands).
# It's disabled by default but you can enable it and configure the cache directory name/path
caching = false
cache_dir = ".soupault-cache"
# Soupault supports a variety of page source character encodings,
# the default encoding is UTF-8
page_character_encoding = "utf-8"
# It is possible to store pages in any format if you have a program
# that converts it to HTML and writes it to standard output.
# Example:
#[preprocessors]
# md = "cmark --unsafe --smart"
# adoc = "asciidoctor -o -"
# Pages can be further processed with "widgets"
# Takes the content of the first <h1> and inserts it into the <title>
[widgets.page-title]
widget = "title"
selector = "h1"
default = "My Homepage"
append = " — My Homepage"
# Insert a <title> in a page if it doesn't have one already.
# By default soupault assumes if it's missing, you don't want it.
force = false
# Inserts a generator meta tag in the page <head>
# Just for demonstration, feel free to remove
[widgets.generator-meta]
widget = "insert_html"
html = '<meta name="generator" content="soupault">'
selector = "head"
# <blink> elements are evil, delete them all
[widgets.no-blink]
widget = "delete_element"
selector = "blink"
# By default this widget deletes all elements matching the selector,
# but you can set this option to false to delete just the first one
delete_all = true
[widgets.test]
widget = "test"
[dmbaturin@alcor ~/d/t/brtest]$ cat plugins/test.lua
HTML.delete_content(page)
HTML.append_root(page, HTML.create_element("img"))
HTML.append_root(page, HTML.create_element("br"))
[dmbaturin@alcor ~/d/t/brtest]$ soupault
[INFO] Starting soupault 4.10.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/index.html
[INFO] Using the default template for page site/index.html
[INFO] Processing widget generator-meta on page site/index.html
[INFO] Processing widget page-title on page site/index.html
[INFO] Processing widget test on page site/index.html
[INFO] Processing widget no-blink on page site/index.html
[INFO] Writing generated page to build/index.html
[dmbaturin@alcor ~/d/t/brtest]$ cat build/index.html
<!DOCTYPE html>
<img><br>
Hmm, one idea: does the original page have a doctype that allows void elements, like <!DOCTYPE html>? The doctype does affect the parsing and rendering mode selection in Markup.ml/LambdaSoup.
The following yields
<!DOCTYPE html><img></img><br></br>
For the <img> this should have no negative effects, but for the <br> this parses as two <br> elements per #parsing-main-inbody of the HTML spec.
One workaround to get
<!DOCTYPE html><img><br>
is to use HTML.parse().
In some situations, the HTML.parse() call needs to be wrapped in an HTML.select_one() call.
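The snippets that accompanied the workaround did not survive in this copy of the thread, so here is a minimal sketch of what it might look like. It is an assumption based on the plugin shown earlier (plugins/test.lua), not the reporter's exact code. It only runs inside soupault, which supplies the `page` and `HTML` globals to plugins.

```lua
-- plugins/test.lua (hypothetical reconstruction of the workaround)

-- Start from an empty page, as in the reproduction above.
HTML.delete_content(page)

-- Workaround: parse the void element from markup
-- instead of building it with HTML.create_element("img").
HTML.append_root(page, HTML.parse("<img>"))

-- Variant: where the parsed fragment itself is not accepted,
-- select the element out of it first, then append that element.
local br = HTML.select_one(HTML.parse("<br>"), "br")
HTML.append_root(page, br)
```

Per the report, this approach is expected to serialize the void elements without closing tags. Which of the two forms is needed in a given situation is not stated in the thread.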