BlackGlory / copycat

🌳 Copy content from web powerful than ever before.
https://chrome.google.com/webstore/detail/jdjbiojkklnaeoanimopafmnmhldejbg
MIT License

Copy as Markdown not escaping some characters #5

Closed q00u closed 6 years ago

q00u commented 6 years ago

If the text itself includes asterisks or angle brackets, they aren't escaped into \* or \< \>.

So the resulting Markdown doesn't match the original. That is to say, this italicized and censored internal thought: *F\*ck, what is going on?* becomes: *F*ck, what is going on?* The asterisk in the text isn't escaped, so it's treated as Markdown when it shouldn't be.

This looks like it might be a Turndown issue, as escaping was fixed in the recent 5.0.0 release, so if you're using a version older than that you'll have the same bugs.

BlackGlory commented 6 years ago

It's confusing to report a Markdown error on a platform that itself renders Markdown, so I'll show these in code syntax:

original HTML

<em>F*ck, what is going on?</em>

should be

*F\*ck, what is going on?*

not

*F*ck, what is going on?*

I checked, and this is indeed a problem with Turndown. Upgrading to Turndown 5.0.0 solved it. Thank you for the reminder; the fix will ship in Copycat 2.4.1.
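For readers following along, here is a minimal sketch of the escaping the example above calls for. It is not Turndown's implementation; `escapeInlineMarkdown` and `emphasis` are hypothetical names, and the sketch only handles backslashes, asterisks, and underscores.

```typescript
// Hypothetical helper, not Turndown's actual code.
// Backslash-escapes characters that could otherwise open or close Markdown emphasis.
function escapeInlineMarkdown(text: string): string {
  // Escape existing backslashes first, then the emphasis characters.
  return text.replace(/\\/g, "\\\\").replace(/([*_])/g, "\\$1");
}

// Wraps escaped text in emphasis markers, roughly what an <em> element should become.
function emphasis(text: string): string {
  return `*${escapeInlineMarkdown(text)}*`;
}

const converted = emphasis("F*ck, what is going on?");
console.log(converted); // *F\*ck, what is going on?*
```

This escapes every asterisk and underscore, which is more aggressive than strictly necessary but keeps the text from being read as markup.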

q00u commented 6 years ago

Still not escaping angle brackets.

Original: <<REBIRTH>>

Source: &lt;&lt;REBIRTH&gt;&gt;

Becomes: <\>

and

Original: <Shield of Sunset Light (Shield) (Set Equipment)>

Source: &lt;Shield of Sunset Light (Shield) (Set Equipment)&gt;

Becomes: ....nothing.

I did see a comment talking about more aggressive escaping rules (which included angle brackets) which didn't make it into the 5.0.0 release, so maybe it's possible with that custom rule?

Currently, if the formatted text includes &lt; and &gt;, it's misinterpreted as an HTML tag at some point along the way, and removed.

BlackGlory commented 6 years ago

Copycat decodes HTML character entities into ordinary characters before conversion, which causes this problem. After I removed the code that decoded HTML entities, the problem was solved; the fix will be released as version 2.4.2.
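The ordering problem can be illustrated with a small sketch. Copycat's real code is not shown in this thread, so `decodeAngleEntities` and `stripTags` are hypothetical stand-ins, and the regex-based `stripTags` is only a crude substitute for an HTML parser.

```typescript
// Hypothetical helpers for illustration only.

// Decodes just the two entities discussed in this issue.
function decodeAngleEntities(html: string): string {
  return html.replace(/&lt;/g, "<").replace(/&gt;/g, ">");
}

// Crude stand-in for HTML parsing: drops anything that looks like a tag.
function stripTags(html: string): string {
  return html.replace(/<[^<>]*>/g, "");
}

const source = "&lt;&lt;REBIRTH&gt;&gt;";

// Decoding first makes the text tag-shaped, so the tag-shaped part is removed.
const decodedFirst = stripTags(decodeAngleEntities(source));

// Parsing first leaves the entities alone; decoding afterwards recovers the text.
const parsedFirst = decodeAngleEntities(stripTags(source));

console.log(decodedFirst); // <>
console.log(parsedFirst);  // <<REBIRTH>>
```

Under the same assumptions, the second reported string, `&lt;Shield of Sunset Light (Shield) (Set Equipment)&gt;`, is reduced to an empty string when decoded first, which matches the "nothing" result described above.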