Open Gusman10000 opened 4 years ago
Interesting. I'll need a bit of time to get my own FB data to test this with. Thanks for the report and details though, it makes this a lot easier to approach.
Might be able to take a look at this. Going to try it.
Did a little reading and found this method of encoding / decoding UTF-8 in JS.
I'm not really sure what I'm doing in JS, but I tried adding the decode call in a few spots in the Facebook import function in math.js, and got the emojis to show up by changing:
'BODY': msg.content
to
'BODY': decodeURIComponent(escape(msg.content))
This seems to work, as the emojis now register (I get the emoji map and they appear in the word use difference part).
That said, I do get numbers and common symbols showing up in the word use difference, things like 6, 10, *, &, 6:30, etc. Are any of these meant to be filtered out? If not, then I think this change gets it working.
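For anyone wondering why the one-line change above works: `escape` rewrites every character below U+0100 as a `%XX` byte, and `decodeURIComponent` then reads those bytes back as UTF-8. Here is a minimal sketch of that round trip. `fixMojibake` is a hypothetical helper name, not something in the project, and falling back to the original text on failure is an assumption about what the importer would want.

```javascript
// Hypothetical helper (not part of the project): undo "UTF-8 bytes read as
// Latin-1" using the escape/decodeURIComponent trick from the comment above.
function fixMojibake(text) {
  try {
    // escape() turns each char below U+0100 into %XX;
    // decodeURIComponent() interprets those %XX bytes as UTF-8.
    return decodeURIComponent(escape(text));
  } catch (err) {
    // decodeURIComponent throws URIError when the bytes are not valid UTF-8
    // (for example, text that was never double-encoded). Assumption: in that
    // case it is better to keep the text unchanged than to fail the import.
    if (err instanceof URIError) return text;
    throw err;
  }
}

// U+2019 (right single quote) is E2 80 99 in UTF-8; read as Latin-1 it
// becomes three separate characters.
console.log(fixMojibake('It\u00e2\u0080\u0099d')); // It’d

// U+1F618 is F0 9F 98 98 in UTF-8.
console.log(fixMojibake('\u00f0\u009f\u0098\u0098') === '\u{1F618}'); // true
```

Note that `escape` is a legacy function, so a `TextDecoder`-based version may be preferable in newer code.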
Hey Gusman, I added your fix in https://github.com/BryceStevenWilley/visioning_texts/commit/167962724fe92d24c89ddac8b28eb0048ee96fab, thanks for the help! I'll double check that it works with my FB info, and close this issue when it does.
And at the moment, yeah, common numbers and symbols aren't filtered from the word difference. That's being tracked in #10.
Works for me, I've got some emojis in the emoji count!
Hi guys, this method doesn't work for all emojis. For example:
'\u00f3\u00be\u008c\u00ac\u00f3\u00be\u008c\u00ac\u00f3\u00be\u008c\u00a7'
That's why I didn't bring it up earlier.
'\u00f3\u00be\u008c\u00ac\u00f3\u00be\u008c\u00ac\u00f3\u00be\u008c\u00a7'
Took me too long to figure out that this is 😘😘😍. Sorry for closing too soon @htkcodes.
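To see why this particular string is a problem, it helps to look at the code points the decode produces. The sketch below runs the same `decodeURIComponent(escape(...))` call on the string quoted above and shows that the result lands in a Unicode private-use range rather than on the standard emoji code points. The `PUA_TO_STANDARD` table and `remapPrivateUse` are hypothetical, built only from the two mappings stated in this thread (this sequence is 😘😘😍); a real fix would presumably need a fuller table.

```javascript
const raw = '\u00f3\u00be\u008c\u00ac\u00f3\u00be\u008c\u00ac\u00f3\u00be\u008c\u00a7';

// Same decode as the existing fix: Latin-1 chars -> bytes -> UTF-8.
const decoded = decodeURIComponent(escape(raw));
const codePoints = Array.from(decoded, (ch) => ch.codePointAt(0));
console.log(codePoints.map((cp) => 'U+' + cp.toString(16).toUpperCase()).join(' '));
// U+FE32C U+FE32C U+FE327

// Supplementary Private Use Areas A and B.
function isSupplementaryPrivateUse(cp) {
  return (cp >= 0xf0000 && cp <= 0xffffd) || (cp >= 0x100000 && cp <= 0x10fffd);
}
console.log(codePoints.every(isSupplementaryPrivateUse)); // true

// Hypothetical lookup table, filled in only from what this thread states.
const PUA_TO_STANDARD = new Map([
  [0xfe32c, 0x1f618], // 😘
  [0xfe327, 0x1f60d], // 😍
]);

// Replace known private-use code points; leave everything else untouched.
function remapPrivateUse(text) {
  return Array.from(text, (ch) => {
    const mapped = PUA_TO_STANDARD.get(ch.codePointAt(0));
    return mapped === undefined ? ch : String.fromCodePoint(mapped);
  }).join('');
}

console.log(remapPrivateUse(decoded)); // 😘😘😍
```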
Describe the bug: Emojis don't appear to be imported properly when importing a FB .json Message file; they appear as other odd Unicode symbols instead.
To Reproduce
Expected behavior .json file imported completely with all symbols being properly identified
Screenshots I've never sent this odd symbol (2nd down) in Messenger in my life. "It'd" has also been converted weirdly here too:
Desktop:
Additional context: Yesterday I was writing a parser in Python for these .json files to convert them into a WhatsApp text file, and I ran into this exact problem. Initially the code would convert the first byte of an emoji and ignore the rest.
In Python I found the fix for this would be:
def fixup_string(text):
    return text.encode('latin1').decode('utf8')
I'm not well versed in JS, so I'm not sure what the translation would be. The screenshot below shows a simple example in Python, using content from a message I pulled from my .json, of both the issue and the solution:
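One possible JS translation of the Python one-liner above, offered as a sketch and not as the project's implementation: encode each character as a single Latin-1 byte, then decode the bytes as UTF-8 with `TextDecoder`. The names `fixupString` and `fixupDeep` are hypothetical, and the `messages` / `content` shape of the sample object is an assumption about the export format. Like the Python version, this throws when the input is not Latin-1 or the bytes are not valid UTF-8.

```javascript
// Rough equivalent of Python's text.encode('latin1').decode('utf8').
function fixupString(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Mirrors Python raising UnicodeEncodeError for non-Latin-1 input.
    if (code > 0xff) {
      throw new RangeError('character at index ' + i + ' is outside Latin-1');
    }
    bytes[i] = code;
  }
  // fatal: true makes invalid UTF-8 throw a TypeError instead of
  // silently producing U+FFFD replacement characters.
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

// Apply the fix to every string (keys included) in a parsed JSON value.
function fixupDeep(value) {
  if (typeof value === 'string') return fixupString(value);
  if (Array.isArray(value)) return value.map(fixupDeep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [fixupString(key), fixupDeep(inner)])
    );
  }
  return value; // numbers, booleans, null
}

// Assumed sample: a message whose content is U+1F60D stored as four
// Latin-1 escapes (F0 9F 98 8D).
const exported = JSON.parse('{"messages":[{"content":"\\u00f0\\u009f\\u0098\\u008d"}]}');
const fixed = fixupDeep(exported);
console.log(fixed.messages[0].content === '\u{1F60D}'); // true
```

Running the fix over the whole parsed object, instead of field by field, would also cover names and titles, at the cost of failing loudly if any string in the file is not double-encoded.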