TriliumNext / Notes

Build your personal knowledge base with TriliumNext Notes
https://triliumnext.github.io/Docs/
GNU Affero General Public License v3.0

User configurable whitelist of html tags stripped on import #567

Open maphew opened 2 weeks ago

maphew commented 2 weeks ago

Discussed in https://github.com/orgs/TriliumNext/discussions/566

Originally posted by **maphew** November 8, 2024

When importing HTML files, a lot of tags are stripped. For the most part, that's a feature! However, there are some I'd like to retain. Top of mind for me at the moment is ``; I use and rely on that a lot. Where in the import machinery is the logic for which tags are kept and which are discarded? Can we "whitelist" more tags to be retained? (I'm happy for styles to continue to be dropped.)

Found https://github.com/TriliumNext/Notes/blob/develop/src/services/html_sanitizer.ts#L28. The whitelist looks straightforward; the next step is an enhancement to make these tags user configurable (or at least a portion of them, e.g. it should not be easy to whitelist script).
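
To make the idea concrete, here is a minimal sketch of merging a user-supplied tag list into a built-in whitelist while refusing dangerous tags. Every name in it (`FORBIDDEN_TAGS`, `DEFAULT_ALLOWED_TAGS`, `buildAllowedTags`) is hypothetical, and the default list is an abbreviated stand-in, not the actual contents of html_sanitizer.ts.

```typescript
// Hypothetical sketch; not Trilium's actual implementation.

// Tags that can never be whitelisted by the user (assumed set, for illustration).
const FORBIDDEN_TAGS: ReadonlySet<string> = new Set(["script", "iframe", "object", "embed"]);

// Abbreviated stand-in for the built-in whitelist in html_sanitizer.ts.
const DEFAULT_ALLOWED_TAGS: readonly string[] = ["p", "a", "b", "i", "ul", "ol", "li", "code", "pre"];

// Merge user-requested tags into the defaults, skipping malformed or forbidden names.
function buildAllowedTags(
    userTags: readonly string[],
    defaults: readonly string[] = DEFAULT_ALLOWED_TAGS
): string[] {
    const merged = new Set<string>(defaults);
    for (const raw of userTags) {
        const tag = raw.trim().toLowerCase();
        // Accept only plain element names such as "sub" or "h1".
        if (!/^[a-z][a-z0-9]*$/.test(tag)) continue;
        if (FORBIDDEN_TAGS.has(tag)) continue;
        merged.add(tag);
    }
    return Array.from(merged);
}

// Example: three valid additions, one forbidden tag, one malformed entry.
const allowed = buildAllowedTags(["sub", "SUP", " tt ", "script", "<b>"]);
console.log(allowed.join(", "));
```

The resulting array could then be handed to whatever sanitizer option holds the allowed tags; how that is wired into the import path is not shown in this thread.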


Given that exposing a config of allowed tags to user settings is a heavier lift than just adding to the list, please consider an interim measure of extending to include the tags:

acronym
article
big
button
cite
col
colgroup
data
dd
fieldset
form
legend
meter
noscript
option
progress
rp
samp
small
sub
sup
template
textarea
tt

If that's too broad my personal shortlist is:

acronym
big
cite
dd
dt
small
sub
sup
tt

maphew commented 1 week ago

PR #601 succeeds in letting the whitelisted tags pass through the importer. However, that only lasts until the note is edited (by CKEditor), at which point the extra tags are stripped. All is not lost, however! I'm reading that CKEditor 5, which we use, has introduced the General HTML Support (GHS) core plugin. GHS supports both a whitelist and a blacklist of tags. From my first read of the docs, adding it is straightforward:
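
The snippet from the original comment did not survive, so what follows is a sketch rather than the author's code. It builds an object in the shape of CKEditor 5's `htmlSupport` editor option for the shortlist of tags above. The local interfaces and the `buildHtmlSupport` helper are hypothetical and only approximate the real types; how Trilium's custom CKEditor build would consume the option is an open question in this thread.

```typescript
// Simplified local types that approximate CKEditor 5's `htmlSupport` option.
interface HtmlSupportRule {
    name: string | RegExp;
    attributes?: boolean;
    classes?: boolean;
}

interface HtmlSupportConfig {
    allow: HtmlSupportRule[];
    disallow: HtmlSupportRule[];
}

// The personal shortlist from earlier in this issue.
const SHORTLIST: readonly string[] = ["acronym", "big", "cite", "dd", "dt", "small", "sub", "sup", "tt"];

// Hypothetical helper: one allow rule per tag, keeping attributes and classes.
// No `styles` key is set, since styles may continue to be dropped.
function buildHtmlSupport(tags: readonly string[]): HtmlSupportConfig {
    return {
        allow: tags.map((name) => ({ name, attributes: true, classes: true })),
        disallow: []
    };
}

const htmlSupport = buildHtmlSupport(SHORTLIST);
console.log(htmlSupport.allow.length);

// In a stock CKEditor 5 setup this would be passed roughly as:
//   import { GeneralHtmlSupport } from "@ckeditor/ckeditor5-html-support";
//   ClassicEditor.create(element, { plugins: [GeneralHtmlSupport /* , ... */], htmlSupport });
```
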

Note: this preserves the extra tags, but most of them have no corresponding UI widget or button, so you can't add them from the toolbars and must switch to source editing for that. Still a massive improvement from my point of view. I'll see if I can get this to work.

maphew commented 1 week ago

https://github.com/maphew/trilium/tree/feature/ckeditor-enable-GHS is the first baby step to adding GHS support.

Help wanted! ;-) I don't think I can figure it out, even with LLM support, because the way Trilium uses CKEditor is very different from what the CKEditor docs describe.

eliandoran commented 1 week ago

@maphew, I could help with the CKEditor build. Do you have any sample notes that can be used to test whether importing and editing work properly with General HTML Support?

maphew commented 1 week ago

Oh, that would be great @eliandoran, thanks! Here's a quickly generated one with the shortlist of tags: elvish-sample.zip