Open Smitty010 opened 11 months ago
Hmm... There were some fixes to issues with rendering HTML in code blocks, and it looks like they've had some side effects. I'll look into it.
It looks like Medium code blocks are just terrible... Even without syntax highlighting applied, they're wrapped in a `<pre><span>` (no `<code>` in sight) and use `<br>` for newlines (therefore negating the whole point of even using a `<pre>`). Also, everything has random bundle-generated class names. In short, it's just horrendous in terms of machine readability.
I couldn't follow the original example, because Medium wants me to pay, but I did find a simple example here: https://blog.medium.com/code-blocks-with-syntax-highlighting-53343df53c4f
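in which this code block (screenshot: https://github.com/deathau/markdownload/assets/1421840/55e3e9d0-bb50-4dbf-88b3-019b33d5bfcd) looks like this in the source: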
<pre class="mt mu mv mw mx my mz ni bo nj ba bj"><span id="e759" class="nk ne ew mz b bf nl nm l nn nh" data-selectable-paragraph=""><span class="hljs-comment">// highlighted code is easier to read</span><br><span class="hljs-keyword">function</span> <span class="hljs-title.function">newCodeBlock</span>() {<br> <span class="hljs-keyword">return</span> “jazzy!”;<br>}</span></pre>
I can see from the classes that it's using highlight.js, but I can't really find anything to help with this case.
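To make the structure concrete, here is a minimal sketch of flattening a block shaped like the one above into plain text. `preToPlainText` is a hypothetical helper, not part of MarkDownload or highlight.js. It assumes the block contains only tags, `<br>` line breaks and a few common entities, and the class lists in `sample` are shortened from the source above.

```javascript
// Hypothetical helper: flatten a Medium-style <pre> block into plain text.
function preToPlainText(html) {
  return html
    .replace(/<br\s*\/?>/gi, "\n") // Medium uses <br> instead of real newlines
    .replace(/<[^>]+>/g, "")       // drop <pre>, <span> and any other tags
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");       // last, so "&amp;lt;" is decoded only once
}

// Shortened copy of the markup quoted above (class lists trimmed).
const sample =
  '<pre class="mt mu"><span id="e759" class="nk ne">' +
  '<span class="hljs-comment">// highlighted code is easier to read</span><br>' +
  '<span class="hljs-keyword">function</span> ' +
  '<span class="hljs-title.function">newCodeBlock</span>() {<br>' +
  ' <span class="hljs-keyword">return</span> “jazzy!”;<br>}</span></pre>';

console.log(preToPlainText(sample));
```

The single space before `return` in the result is the literal space that follows the `<br>` in the source, so indentation survives only as long as nothing later collapses leading whitespace.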
It's too bad there isn't a way to effectively highlight the text, copy it and then paste it. That's how I fix the problems with Medium articles (highlight the code block and copy it into the markdown). I don't care about the code highlighting in the article. I can just mark the block with the code type and Obsidian does its own highlighting.
I've thought about possible solutions and I can see it's likely a difficult problem.
Sadly, Medium is one of my primary sources, so I see the code block problem a lot, since the majority of what I clip is programming articles. I wish I could say it wasn't a big deal but, for me, it is.
I took the example I gave above and simply pasted the entire article into an empty markdown file in obsidian. Here's what I got for the code block (raw text; note that there were no code block marks ``` around it.):
# THIS IS FOR THE BACKBONE ONLY
with gr.Blocks(theme='ParityError/Interstellar') as demo:
#TITLE SECTION
with gr.Row():
with gr.Column(scale=12):
gr.HTML #TITLE
gr.Markdown #TEXT and DESRIPTION
gr.Image #your image LOGO
# chat and parameters settings
with gr.Row():
with gr.Column(scale=4): #CHATBOT AND USER INPUT
chatbot = gr.Chatbot #Chatbot box
with gr.Row(): # ROW for user input and button
with gr.Column(scale=14):
msg = gr.Textbox # User input
submitBtn = gr.Button #Submit button
with gr.Column(min_width=50,scale=1): #PARAMETERS SECTION
with gr.Tab(label="Parameter Setting"):
gr.Markdown("# Parameters")
top_p = gr.Slider
temperature = gr.Slider
max_length_tokens = gr.Slider
rep_pen = gr.Slider
clear = gr.Button("🗑️ Clear All Messages", variant='secondary')
# HERE we have to create the fucntions to be called by the Push Buttons
def user(user_message, history):
def bot(history,t,p,m,r):
# Clicking the submitBtn will call the generation with Parameters in the slides
submitBtn.click #all parameters here
clear.click #actions to clear all
#MAIN CALL SECTION
demo.queue() #required to yield the streams from the text generation
demo.launch(inbrowser=True)
which renders as
Still a bit wonky (comments sometimes render as headers or lose the #), but acceptable. Even better, I was also able to select it, type three backticks, and get the correct code block (with comments, indents, etc.). I then marked the block as Python and got all of the highlighting. I could live with having to mark the code blocks if the text was there.
I have no idea where all of the HTML-to-text transformations take place (in Chrome, or in Obsidian's paste handling). I'm not a browser/UI guy.
As I said, probably not an easy fix. Sorry
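The manual fix described above (wrap the pasted text in three backticks and tag it as Python) is mechanical enough to script. The following is a minimal sketch, assuming the plain text has already been recovered. `toFencedBlock` is a hypothetical name, not an Obsidian or MarkDownload function.

```javascript
// Hypothetical helper: wrap plain text in a Markdown fenced code block.
function toFencedBlock(text, lang = "") {
  // Find the longest run of backticks in the text so the fence can be longer.
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = "`".repeat(Math.max(3, longest + 1));
  // Trim trailing newlines so the closing fence sits directly under the code.
  const body = text.replace(/\n+$/, "");
  return fence + lang + "\n" + body + "\n" + fence;
}

console.log(toFencedBlock("demo.queue()\ndemo.launch(inbrowser=True)\n", "python"));
```

Choosing a fence longer than any backtick run in the content keeps a snippet that itself contains backticks from closing the block early.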
Thanks for looking into this. From my understanding, Obsidian does the conversion on paste. It seems to be completely ignoring the `<pre>` tag and therefore not treating it as a code block at all (but it's still keeping the spacing, which is interesting).
This extension is a little bit different as it runs the document through a "readability" engine first, which strips out unnecessary stuff like headers and footers, but also strips some of the classes and styling before it converts to Markdown.
To that end, if you select all of the text inside of a code block in Medium, then right-click -> Markdownload -> Copy selection as markdown, it seems to behave similarly to Obsidian. But if you select the code block itself (like would happen if you also select text around it), it keeps all the HTML.
I'll look into it further (and if anyone else has any ideas, let me know)
Mark
I wrote the following Templater template:
<%*
// Assumes the clipped note is the active file in Obsidian.
const curFile = app.workspace.getActiveFile(); // get current file
let contents = await app.vault.read(curFile); // get file contents
contents = contents
  .replaceAll(/<span.*?>/g, "") // drop opening span tags and their attributes
  .replaceAll("</span>", "")
  .replaceAll("<br>", "\n")
  .replaceAll("<strong>", "**")
  .replaceAll("</strong>", "**")
  .replaceAll("<em>", "*")
  .replaceAll("</em>", "*")
  // decode HTML entities; "&amp;" goes last so "&amp;lt;" is not decoded twice
  .replaceAll("&lt;", "<")
  .replaceAll("&gt;", ">")
  .replaceAll("&amp;", "&");
await app.vault.modify(curFile, contents); // replace content with new content
-%>
This seems to clean up a lot of the blocks. Obviously, it has some limitations.
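One limitation that seems likely, although the thread does not spell it out, is that the replacements run over the whole note, so prose outside code blocks is rewritten too. The sketch below shows one way to scope a cleanup to `<pre>` regions only. `cleanPreBlocksOnly` and `stripSpansAndBreaks` are hypothetical names, and the cleanup chain is a shortened stand-in for the template above.

```javascript
// Hypothetical helper: apply `clean` only to the contents of <pre> blocks,
// leaving everything else in the note untouched.
function cleanPreBlocksOnly(note, clean) {
  const prePattern = /<pre\b[^>]*>([\s\S]*?)<\/pre>/gi;
  return note.replace(prePattern, (_match, inner) => clean(inner));
}

// Shortened stand-in for the template's replacement chain.
const stripSpansAndBreaks = (html) =>
  html
    .replace(/<\/?span[^>]*>/g, "") // remove opening and closing span tags
    .replace(/<br\s*\/?>/g, "\n");  // turn <br> into real newlines

const note =
  "Use <em>care</em> here.\n" +
  '<pre class="x"><span>a = 1</span><br><span>b = 2</span></pre>';

console.log(cleanPreBlocksOnly(note, stripSpansAndBreaks));
```

The `<em>` in the first line is left alone because it sits outside any `<pre>` element. The `<pre>` wrapper itself is dropped, so the caller would still need to add a code fence around the result.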
It looks like there is a new version. I think there was an attempt to fix the issue with code blocks not rendering properly. However, I would say that the new solution is worse than what it used to do. Here's an example. Look at https://medium.com/mlearning-ai/powerhouse-in-your-pocket-how-tiny-llms-are-redefining-the-ai-landscape-fdf17718bc79.
Here's the first codeblock as it appears in the story
Here's how it appears in the markdown
I suspect that the point was to remove the surrounding "```" and let the markdown editor render it. Notice that you lost any comments.
The other, bigger problem is that you lose whitespace. For example, from the same story,
gets turned into (here I've removed the block markers to let it render)
Without the page's CSS, all of the lines start in column 0. Given the importance of whitespace in Python, that's not very helpful.
I'm sure this can't be an easy problem to solve or it would have happened already. I preferred the old rendering as we didn't have all of the html tags.