I would like to thank you, @VikParuchuri, for all of your hard work on this. I am very impressed with my results so far!
I am putting this here because I unfortunately lack the skills to provide a proper PR for my proposed change, and I totally understand if you do not have the time to implement it either.
Many languages do not format headings the way English does (i.e., capitalizing each word). Instead, typically only the first letter of the heading is capitalized.
Accordingly, it would be nice if this could be set with a flag or, even better, inferred from the document language.
I have solved this issue for myself by modifying the block_surround function in marker/markdown.py as seen below. Of course, my solution cannot be implemented as such in your project, since it would not work for those needing English-style capitalization.
def block_surround(text, block_type):
    if block_type == "Section-header":
        if not text.startswith("#"):
            words = text.strip().split()
            if words:
                # Capitalize the first word (this also lowercases the rest of that word)
                words[0] = words[0].capitalize()
                # Rejoin; the remaining words keep their original casing
                text = ' '.join(words)
            text = "\n## " + text + "\n"
    elif block_type == "Title":
        if not text.startswith("#"):
            words = text.strip().split()
            if words:
                # Capitalize the first word (this also lowercases the rest of that word)
                words[0] = words[0].capitalize()
                # Rejoin; the remaining words keep their original casing
                text = ' '.join(words)
            text = "# " + text + "\n"
    elif block_type == "Table":
        text = "\n" + text + "\n"
    elif block_type == "List-item":
        pass
    elif block_type == "Code":
        text = "\n" + text + "\n"
    return text
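For what it is worth, here is a rough sketch of how the flag idea might look, so that both styles could coexist. The `sentence_case` parameter and the `format_heading` helper are names I made up for illustration; they are not part of marker. I am also assuming that `str.title()` is an acceptable stand-in for the English-style default, which may not match what the project actually does.

```python
def format_heading(text: str, sentence_case: bool = False) -> str:
    """Hypothetical helper: normalize whitespace and apply a heading style."""
    words = text.strip().split()
    if not words:
        return ""
    if sentence_case:
        # Uppercase only the first character; all other casing is left as extracted
        first = words[0]
        words[0] = first[0].upper() + first[1:]
        return " ".join(words)
    # Assumed English-style default: capitalize every word
    return " ".join(words).title()


def block_surround(text: str, block_type: str, sentence_case: bool = False) -> str:
    # sentence_case is a hypothetical flag, not an existing marker option
    if block_type == "Section-header":
        if not text.startswith("#"):
            text = "\n## " + format_heading(text, sentence_case) + "\n"
    elif block_type == "Title":
        if not text.startswith("#"):
            text = "# " + format_heading(text, sentence_case) + "\n"
    elif block_type in ("Table", "Code"):
        text = "\n" + text + "\n"
    # "List-item" and any other block types are returned unchanged
    return text
```

Unlike my version above, this one uppercases only the first character instead of calling `capitalize()`, so an acronym at the start of a heading (e.g. "PDF") is not turned into "Pdf". The flag could in principle be set from the detected document language, but I have not tried that.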