colinta / SublimeStringEncode

Converts characters from one "encoding" to another using a transformation (think HTML entities, not character encodings)

Unknown encoding results > a&#x308 #32

Closed katerlouis closed 8 years ago

katerlouis commented 8 years ago

I copy and paste text out of a pdf from my text guru. When I encode it I get a&#x308 instead of ä. The same goes for ü > u&#x308 and ö > o&#x308.

A ß encodes &szlig as expected, though.

What's going on? I want the usual ä back. This new format is killing my webfont, and replacing all the a&#x308 is a huge pain :(

colinta commented 8 years ago

Can you send me the text you're copying and trying to paste?

Does the encoding work when you paste the text (as is), and then select it and encode it (e.g. the "old fashioned way")?

colinta commented 8 years ago

I just tried it w/ über and it pasted &uuml;ber, which seems right to me... are you using the XML or HTML entities?

katerlouis commented 8 years ago

Wtf, that was fast :o I'm sure it is an issue of the Word export to pdf, or a Word setting. Your plugin isn't the problem; the default encoding function produces the same weird results. I have no other place to go! Google always deletes the a&#x308 :D – So I figured maybe it's a common issue you might know.

I do not follow your "old fashioned way", though. Here is a sample of the pdf I copy from.

paste-problem-sample.pdf

katerlouis commented 8 years ago

When I type über myself inside the editor and encode it, I get &uuml; as well. But not when I encode text pasted out of the pdf.

"Where do I use XML or HTML entities?" – Sorry for my ignorance, but the deadline is so freakin close I lose my mind! :D

colinta commented 8 years ago

There are two "entitize" commands, one for XML and one for HTML. You're using the right one, I think.

I have an idea: paste the text somewhere in github (in a comment here) and then copy it FROM your browser. On my computer, at least, that fixes these characters.

colinta commented 8 years ago

Mitten in einer grünen Oase, am südlichen Berliner Stadtrand, befindet sich unsere Senioren-Residenz Lore Lipschitz. Umgeben von einem weitläufigen, barrierefreien, geschützten Garten mit gemütlichen Pavillons, können unsere Bewohner die sorglose Ruhe der Natur oder ein gemütliches Beisammensein an der frischen Luft genießen - ganz wie sie es ihnen beliebt.

Mitten in einer grünen Oase, am südlichen Berliner Stadtrand, befindet sich
unsere Senioren-Residenz Lore Lipschitz. Umgeben von einem weitläufigen,
barrierefreien, geschützten Garten mit gemütlichen Pavillons, können
unsere Bewohner die sorglose Ruhe der Natur oder ein gemütliches Beisammensein
an der frischen Luft genießen - ganz wie sie es ihnen beliebt.

colinta commented 8 years ago

When I copy/paste directly from the pdf, I get the same garbage you're talking about:

Mitten in einer grünen Oase, am südlichen

katerlouis commented 8 years ago

Sweet, isn't it? My google skillz are too low, I can't find shit about it :( – I'm desperate and time is kicking my balls. All I know: this pdf is produced by Word from Office 360 on a Windows 10 laptop. We played around with the exporting options, but.. I'm still here ;)

colinta commented 8 years ago

Did you try my idea? Copy all the text you need to convert and paste in here in github. You don't have to submit it, just click "Preview" and then copy the text again.

When you Paste Encoded… > Html entitize that text, it will have the correct HTML entities.
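A likely explanation, inferred from the output rather than confirmed anywhere in this thread: U+0308 is COMBINING DIAERESIS, so the PDF text seems to carry each umlaut as a base letter plus a combining mark (decomposed form), and the encoder then turns only the combining mark into an entity. The sketch below is not StringEncode's code; `describe` and `html_entitize` are hypothetical helpers. They show the two-code-point form and compose it with Unicode NFC normalization before producing entities.

```python
import unicodedata
from html.entities import codepoint2name


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


def html_entitize(text):
    """Compose characters with NFC, then replace non-ASCII with HTML entities.

    Hypothetical helper for illustration only. ASCII characters, including
    & < and >, are passed through unchanged.
    """
    composed = unicodedata.normalize("NFC", text)
    out = []
    for ch in composed:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity where HTML defines one, e.g. U+00FC -> &uuml;
            out.append("&%s;" % codepoint2name[cp])
        else:
            # Fall back to a hexadecimal numeric character reference
            out.append("&#x%X;" % cp)
    return "".join(out)


if __name__ == "__main__":
    pasted = "gru\u0308nen"  # "u" followed by U+0308, as assumed from the PDF
    for row in describe(pasted):
        print(row)
    print(html_entitize(pasted))
```

With the decomposed input above, `html_entitize` returns `gr&uuml;nen`, because NFC first merges `u` and U+0308 into the single character U+00FC. If the pasted text is already composed, normalization leaves it unchanged.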

katerlouis commented 8 years ago

As you see there is this table layout in the PDF, copying all somewhere is far away from ideal. I have multiple files and stuff. Hm. There must be a setting in Word... some UTF8-encoding what ever shits :D

colinta commented 8 years ago

Yikes, you need to copy the table and have it converted into HTML? Ouch - good luck with that! This plugin doesn't do that.

Here's a video that might help, but it's just a demo of the copy/paste method I described:

http://media.colinta.com/tmp/entities.mov

katerlouis commented 8 years ago

Wait, so you are sure, that it has to do with the table?

katerlouis commented 8 years ago

WTF DID MY EYES JUST SEE? You implemented my idea with paste encoding? :D Why don't I have that already?!!

Back to topic: I'm not looking for a workaround; I guess in that case it would be faster to fix the whole project with Sublime Text's replace feature.

If you don't know how to fix the source of the problem I'll go with replacing :( Thanks! Very much.

And now tell me about the paste encoded thing :D

colinta commented 8 years ago

I have no idea what the source of the problem is - you have mentioned wordpress (which I never use), Office 360 (which I never use), and Windows (which I never use).

┐(゚~゚)┌

katerlouis commented 8 years ago

I meant Word :D and corrected it immediately. But you are just too fast. I am working on beautiful Mac and hate Windows and Word even more now ;) Thanks my friend!

But seriously. Where is this "paste encoded function" and what is the sublime-function-name so I can bind it! :D

katerlouis commented 8 years ago

I just saw your reply on my issue.. damn, what's going on with me :D My brain is mudged.

How can I update inside of Sublime Text without removing and reinstalling? And I'm curious how to see the current version of a plugin.

colinta commented 8 years ago

Are you using Package Control? It will auto-update your packages (daily, I think). To force an upgrade:

  1. Tools > Command Palette to view all the commands (you can search for "StringEncode" commands there, btw)
  2. Type PC Upgrade (to filter to "Package Control: Upgrade") and then StringEncode to upgrade
  3. or type PC List to show the version numbers