Closed nicbou closed 1 year ago
It's not invalid per se, it just requires CSS escaping in a selector to target it. If you do not want your ID to start with a number, which it will if you let it get auto-slugified, then you should specify a specific ID that you are okay with (make sure you've enabled attr_list
extension.
### 1. Create an account {#create-an-account}
I think the appropriate question here is this: Is there a spec which explicitly disallows a digit as the first character of an id? If so, we should somehow address that. If not, then @facelessuser's suggested workarounds are sufficient.
I did a bit of research, and HTML 5 does allow an ID like "1-test-title". HTML 4 had more restrictions. Like @facelessuser said, it's only a problem with CSS selectors.
document.getElementById('1-test-title')
works fine, but document.querySelector('#1-test-title')
causes a DOMException.
In hindsight, this is not an issue that Python-Markdown should deal with.
If you have the same problem as I did, you can fix it like this:
from markdown.extensions.toc import TocExtension, slugify
def patched_slugify(value, separator, keep_unicode=False):
return slugify(value.lstrip(' 0123456789'), separator, keep_unicode)
class MarkdownProcessor(EntryContextProcessor):
def __init__(self):
super().__init__()
self.markdown = markdown.Markdown(extensions=[
# ...
TocExtension(slugify=patched_slugify),
# ...
If we are talking about CSS selectors, the spec states that identifiers do not start with numbers. An ID is #<identifier>
, classes are .<identifier>
. This does not make the ID or classes invalid as classes and identifiers, but you have to use CSS escapes to specify a class or ID that starts with numbers, then it will work.
If I want to target a CSS ID that starts with zero:
document.querySelector('#\\31 -test-title')
Now, does this work directly in a CSS file? Maybe not. But there is no HTML restriction.
Interestingly, there are a lot of Markdown parsers that just pass leading numbers through during slugification. There are some that ignore starting numbers, and (GFM) that append user-content
on all IDs to avoid this issue.
On another note, using numbers at the beginning of an ID does not affect jumping to an ID in a browser, so this is only a CSS restriction. I'm personally not against stripping leading numbers or altering them in some way, but it is technically not invalid HTML, just not easily targetable in a CSS file, but it is targettable with JS `querySelector.
I think that provides enough info for an informed decision.
An alternative to stripping the number could be to simply prepend -
or _
in front of IDs that start with numbers.
Thanks for the information. As there is no such restriction in the HTML spec and most other implementations have the same/similar behavior, I an inclined to leave the default behavior as-is. And as we already provide a way for a user to provide their own slugify
function, I see no reason to take any action.
I fully agree. Thanks for taking the time. Python Markdown is a great project, and I'm grateful for your work.
The heading
### 1. Create an account
has the ID#1-create-an-account
, which is not valid because it starts with a digit. This breaks document.querySelector and document.getElementById.Thankfully, slugify is configurable, but it's something that should likely be corrected on the default function too.