OpenUserJS / OpenUserJS.org

The home of FOSS user scripts.
https://openuserjs.org/
GNU General Public License v3.0
857 stars 302 forks source link

Issue with Unicode Support in script Summary #678

Closed TimidScript closed 9 years ago

TimidScript commented 9 years ago

Some Unicode characters appear as gibberish in the "Summary" on the scripts page and listing. The summary is extracted from the script "description" metadata.

Example here: Pixiv++

If you look at the summary you get gibberish (���) both in FF and Chrome. If you extract the text from the metadata and insert it manually in the script description ("Script Info") it works fine.

Martii commented 9 years ago

This is currently a dependency of #285... will keep it in mind in my testing hopefully later this week when the code is migrated. There are some decodeURI/encodeURI issues in current node as well but I haven't rechecked that recently.

Thanks for the report.


Possible font issue as well won't know till I am at dev station to test more.

Martii commented 9 years ago

Let's see how GH with this browser handles the raw stored data of Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|ポケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク...

Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|ポケベ���|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク


See some question marks in the same place as your "gibberish".... so this may be leaning towards a font issue which is browser and operating system dependent... will check some things at dev station later.


How odd GH shows the "gibberish" exactly as you reported in Write/Preview but not on final Update of comment where it turns to those weird question marks. ;)

Martii commented 9 years ago

You know I'm not really sure what is doing this yet... I (OUJS) forked your script temporarily and it worked fine... then I temporarily removed your @description on OUJS and of course put it back in... and now it seems fixered... it could have been a node hiccup at any point.

Anyhow... I'll add the mitigation to this, which includes you following up with this issue next time you sync from GH to OUJS, and reopen if it reappears with steps to reproduce... but direct on OUJS seems to work fine... I'll still recheck it out in #285.

Doesn't appear to be a font issue either.

Martii commented 9 years ago

@TimidScript I've also forked your repo temporarily on GH and imported your script into my account here and no error with the @description Unicode ... and blanked out the source and still no error with @description Unicode... my best educated guess is a node issue that was fixed at some point... above comment still pertains for steps to reproduce if you see it again.

TimidScript commented 9 years ago

Thank you

How odd GH shows the "gibberish" exactly as you reported in Write/Preview but not on final Update of comment where it turns to those weird question marks. ;)

I rarely use preview, that's why I love the edit button ^_^

Doesn't appear to be a font issue either.

No I didn't think that was the case.

Anyhow... I'll add the mitigation to this, which includes you following up with this issue next time you sync from GH to OUJS, and reopen if it reappears with steps to reproduce...

Just updated for testing and it seems to work fine.

Martii commented 9 years ago

@TimidScript This appears to be back again... what editor are you using? I'm going to comment out your @description and uncomment it.

Martii commented 9 years ago

@TimidScript That refixed it editing it on the site itself... I would sure like to know what's going on here. :\

TimidScript commented 9 years ago

I use Visual Studio and then copy and paste into GitHub. Works fine on GreasyFork

Martii commented 9 years ago

and paste into GitHub

So you use their online editor? I'll try to track using that methodology with what's going on... if OUJS site edit works fine then it's not #285 at all. I've gone over that routine with a fine toothed comb.

Things to recheck (this section will change during testing):

TimidScript commented 9 years ago

So you use their online editor?

Yes I do. Sometimes I change the code in the editor for small updates, like I did for the icon fix, and other times I just paste the whole code from my offline editor.

My GreasyFork account now resync with GitHub but through manual update, and I update the files over there every so often. I've never noticed any character errors, if that's any help.

Martii commented 9 years ago

I've been able to reproduce this now with GH's editor... it's somewhere near the webhook... still analyzing the snapshot output... so far it appears we are receiving UTF-8 from GH but we also use buffers and then convert it to a string... this might be the issue with node but I'll need to run some more tests after analyzing. The object I'm analyzing is huge too so this might take a little bit of time.

TimidScript commented 9 years ago

Glad to hear you could reproduce it.

Martii commented 9 years ago

Dumped output

This is what the request response currently shows on production (snipped):

<   httpVersionMajor: 1,
<   httpVersionMinor: 1,
<   httpVersion: '1.1',
<   complete: false,
<   headers: 
<    { 'content-security-policy': 'default-src \'none\'',
<      'x-xss-protection': '1; mode=block',
<      'x-frame-options': 'deny',
<      'x-content-type-options': 'nosniff',
<      'strict-transport-security': 'max-age=31536000',
<      etag: '"a27bc31a7cf263b94de07bfb2e15b7f1d45b61a2"',
<      'content-type': 'text/plain; charset=utf-8',
<      'cache-control': 'max-age=300',
<      'content-length': '8087',
<      'accept-ranges': 'bytes',
<      date: 'Mon, 05 Oct 2015 18:52:23 GMT',
<      via: '1.1 varnish',
<      connection: 'close',
<      'x-served-by': 'cache-jfk1020-JFK',
<      'x-cache': 'MISS',
<      'x-cache-hits': '0',
<      vary: 'Authorization,Accept-Encoding',
<      'access-control-allow-origin': '*',
<      expires: 'Mon, 05 Oct 2015 18:57:23 GMT',
<      'source-age': '0' },

This is what parseMeta currently shows on production (snipped and notice the three "commas" in edit on GH which yields hexagon question marks on GH display):

<   "description": [
<    {
<     "value": "Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク"
<    }
sizzlemctwizzle commented 9 years ago

We might want to use https://nodejs.org/api/string_decoder.html on line 344 of scriptStorage.js On Oct 5, 2015 2:37 PM, "Marti Martz" notifications@github.com wrote:

Dumped output

This is what the request response currently shows on production (snipped):

< httpVersionMajor: 1,< httpVersionMinor: 1,< httpVersion: '1.1',< complete: false,< headers: < { 'content-security-policy': 'default-src \'none\'',< 'x-xss-protection': '1; mode=block',< 'x-frame-options': 'deny',< 'x-content-type-options': 'nosniff',< 'strict-transport-security': 'max-age=31536000',< etag: '"a27bc31a7cf263b94de07bfb2e15b7f1d45b61a2"',< 'content-type': 'text/plain; charset=utf-8',< 'cache-control': 'max-age=300',< 'content-length': '8087',< 'accept-ranges': 'bytes',< date: 'Mon, 05 Oct 2015 18:52:23 GMT',< via: '1.1 varnish',< connection: 'close',< 'x-served-by': 'cache-jfk1020-JFK',< 'x-cache': 'MISS',< 'x-cache-hits': '0',< vary: 'Authorization,Accept-Encoding',< 'access-control-allow-origin': '*',< expires: 'Mon, 05 Oct 2015 18:57:23 GMT',< 'source-age': '0' },

This is what parseMeta currently shows on production (snipped and notice the three "commas" in edit on GH which yields hexagon question marks on GH display):

< "description": [ < { < "value": "Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク" < }

— Reply to this email directly or view it on GitHub https://github.com/OpenUserJs/OpenUserJS.org/issues/678#issuecomment-145645270 .

Martii commented 9 years ago

@sizzlemctwizzle I'm thinking since node strings were detected as UTF-16 for BOM checking we can try sending the Accept-Charset and see what GH does with that (reading up on this now). The buffer might be trying to convert UTF-8 then with using toString and keeps the string in UTF-8 garbled. GH isn't sending the uppercase UTF-8 either as the output shows... that could mess things up if there isn't a toLowerCase in node and our deps.

Martii commented 9 years ago

@sizzlemctwizzle All my tests failed with that test set.

We might want to use https://nodejs.org/api/string_decoder.html on line 344 of scriptStorage.js

Do have an issue here with StringDecoder... we aren't on 4.x yet... I'll have to see if the 0.12.x has it native too... otherwise this may be a possible large breaking change... been using 4.x in dev here with local pro as well with no issues yet but haven't tested our process manager with node issue 3035.

Martii commented 9 years ago

Here we go:

Martii commented 9 years ago

@sizzlemctwizzle That didn't appear to work as a single fix _(see currently here)_... I'll see if the .concats in other areas need this.

Martii commented 9 years ago

Newest dump (snipped and separated)

< > getMeta()
< >> Chunks loop 0
< // ==UserScript==
< // @name                [TS] Pixiv++
< // @namespace           TimidScript
< // @version             3.3.80a.010 Beta
< // @description         Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|��

... already bugged here with hexagonal question marks.

Four total Chunks loop... all bugged

Final blocksContent:

< >> blocksContent
< { UserScript: '\n// @name                [TS] Pixiv++\n// @namespace           TimidScript\n// @version             3.3.80a.010 Beta\n// @description         Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク\n// @author              TimidScript\n// @homepageURL         https://openuserjs.org/users/TimidScript\n// @copyright           © 2014 TimidScript, All Rights Reserved.\n// @license             Creative Commons BY-NC-SA + Read Copyright inside the script\n// @include             http://www.pixiv.net/*\n// @exclude             http://www.pixiv.net/*mode=manga&illust_id*\n// @exclude             http://www.pixiv.net/*mode=big&illust_id*\n// @exclude             http://www.pixiv.net/*mode=manga_big*\n// @require\t\t        https://openuserjs.org/src/libs/TimidScript/TSL_-_Generic.js\n// @require             https://openuserjs.org/src/libs/TimidScript/TSL_-_GM_Update.js\n// @homeURL             https://openuserjs.org/scripts/TimidScript/[TS]_Pixiv++\n// @grant               GM_info\n// @grant               GM_getMetadata\n// @grant               GM_registerMenuCommand\n// @grant               GM_getValue\n// @grant               GM_setValue\n// @grant               GM_listValues\n// @grant               GM_deleteValue\n// @grant               GM_xmlhttpRequest\n// @icon                data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADIAAAAyCAIAAACRXR/mAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAZdEVYdFNvZnR3YXJlAEFkb2JlIEltYWdlUmVhZHlxyWU8AAAB/ElEQVRYR+2XMUpEMRCGvY0n8AKewAt4Ai/gARYruy2txM7CwlLcQsFiG0ELRQsREREbRYstLNaPTHwMs75NJrx9rJCfv1jzJsn3MpPkubI6GC2hK5bHFcvjiuVxxfK4YnncDdbG3njz4FJ7bffMxLhcjgXK8Pxh/Pg+bdHzx2R//ESY6ZjjEqzt45vb1684eRAE8J3evWF+fE6+44MgWrxwPiyAIIizTafA0bI+vDBhmDyCGOOCdk7uTcwc52Ixt84XcExsYma9dXitV+7o6sUEtDkLiyUxeaGqTEybSZ/um7lmaSxeUUbUo+djYdYsdgv6M+nGCSzNxHvLb+TCwroAclI5D4sFl4GEiRb5E3mxzIIlT7V5WHSWxFFb0iKDIi8W1jXQDNjmRBJlwZpNJ4OiAiydx2T3dMlzUjdFGkctwmrKFIFonhqnsbTjqEVYvF7svFRYOokgmqfG/WHpWyt5qPaExaaJPYOSJ2pPWDqDycLCfWBxSsVuQTl3/MKxzFWdLHZxIVbmJ0pzfYkye+FCLCTfxGRk9oKjogHSWw+58l6OZcSXKrWMdcpENC72oznOE16dmVgtvcWMZDm9QOJuSp6skU3tXv8hi1DOQinw/8fSpyIlRaZMQIfOxQIiEv2KHWdiOrQvib25YnlcsTyuWB5XLI8rVr4Hox+cjUxd3KUvOgAAAABJRU5ErkJggg==\n' }

blocks

< >> blocks
< {
<  "UserScript": {
<   "name": [
<    {
<     "value": "[TS] Pixiv++"
<    }
<   ],
<   "namespace": [
<    {
<     "value": "TimidScript"
<    }
<   ],
<   "version": [
<    {
<     "value": "3.3.80a.010 Beta"
<    }
<   ],
<   "description": [
<    {
<     "value": "Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク"
<    }
<   ],
<   "homepageURL": [
<    {
<     "value": "https://openuserjs.org/users/TimidScript"
<    }
<   ],
<   "copyright": [
<    {
<     "value": "© 2014 TimidScript, All Rights Reserved."
<    }
<   ],
<   "license": [
<    {
<     "value": "Creative Commons BY-NC-SA + Read Copyright inside the script"
<    }
<   ],
<   "include": [
<    {
<     "value": "http://www.pixiv.net/*"
<    }
<   ],
<   "exclude": [
<    {
<     "value": "http://www.pixiv.net/*mode=manga&illust_id*"
<    },
<    {
<     "value": "http://www.pixiv.net/*mode=big&illust_id*"
<    },
<    {
<     "value": "http://www.pixiv.net/*mode=manga_big*"
<    }
<   ],
<   "require": [
<    {
<     "value": "https://openuserjs.org/src/libs/TimidScript/TSL_-_Generic.js"
<    },
<    {
<     "value": "https://openuserjs.org/src/libs/TimidScript/TSL_-_GM_Update.js"
<    }
<   ],
<   "grant": [
<    {
<     "value": "GM_info"
<    },
<    {
<     "value": "GM_getMetadata"
<    },
<    {
<     "value": "GM_registerMenuCommand"
<    },
<    {
<     "value": "GM_getValue"
<    },
<    {
<     "value": "GM_setValue"
<    },
<    {
<     "value": "GM_listValues"
<    },
<    {
<     "value": "GM_deleteValue"
<    },
<    {
<     "value": "GM_xmlhttpRequest"
<    }
<   ],
<   "icon": [
<    {
<     "value": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADIAAAAyCAIAAACRXR/mAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAZdEVYdFNvZnR3YXJlAEFkb2JlIEltYWdlUmVhZHlxyWU8AAAB/ElEQVRYR+2XMUpEMRCGvY0n8AKewAt4Ai/gARYruy2txM7CwlLcQsFiG0ELRQsREREbRYstLNaPTHwMs75NJrx9rJCfv1jzJsn3MpPkubI6GC2hK5bHFcvjiuVxxfK4YnncDdbG3njz4FJ7bffMxLhcjgXK8Pxh/Pg+bdHzx2R//ESY6ZjjEqzt45vb1684eRAE8J3evWF+fE6+44MgWrxwPiyAIIizTafA0bI+vDBhmDyCGOOCdk7uTcwc52Ixt84XcExsYma9dXitV+7o6sUEtDkLiyUxeaGqTEybSZ/um7lmaSxeUUbUo+djYdYsdgv6M+nGCSzNxHvLb+TCwroAclI5D4sFl4GEiRb5E3mxzIIlT7V5WHSWxFFb0iKDIi8W1jXQDNjmRBJlwZpNJ4OiAiydx2T3dMlzUjdFGkctwmrKFIFonhqnsbTjqEVYvF7svFRYOokgmqfG/WHpWyt5qPaExaaJPYOSJ2pPWDqDycLCfWBxSsVuQTl3/MKxzFWdLHZxIVbmJ0pzfYkye+FCLCTfxGRk9oKjogHSWw+58l6OZcSXKrWMdcpENC72oznOE16dmVgtvcWMZDm9QOJuSp6skU3tXv8hi1DOQinw/8fSpyIlRaZMQIfOxQIiEv2KHWdiOrQvib25YnlcsTyuWB5XLI8rVr4Hox+cjUxd3KUvOgAAAABJRU5ErkJggg=="
<    }
<   ]
<  }
< }

... bug propagated.

I think for grins I'll try the StringDecoder with utf16 and see what that does.

Martii commented 9 years ago

heh tried utf16 direct edit on production and it didn't like that:

< > getMeta()
< string_decoder.js:24
<     throw new Error('Unknown encoding: ' + encoding);
<           ^
< Error: Unknown encoding: utf16
<     at assertEncoding (string_decoder.js:24:11)
<     at new exports.StringDecoder (string_decoder.js:38:3)
<     at Object.exports.getMeta (/home/oujs/OpenUserJS.org1/controllers/scriptStorage.js:351:17)
<     at /home/oujs/OpenUserJS.org1/libs/repoManager.js:122:23
<     at IncomingMessage.<anonymous> (/home/oujs/OpenUserJS.org1/libs/repoManager.js:55:11)
<     at IncomingMessage.emit (events.js:129:20)
<     at _stream_readable.js:908:16
<     at process._tickDomainCallback (node.js:381:11)

Will need to rewrite getMeta a bit I think to not split and concat the blocksContent with our code... this is probably why StringDecoder accepts an Array of Buffers. With Buffers I assume it is Byte transfer and maybe @TimidScript has hit it right on the nose between chunks based off the previous dumps... then String creates the last Unicode character as junk... then the next Unicode character is junk too... thus a bugged String.

Will need a bit to do this again. Apologies for a lot of noise and ups/downs.

Martii commented 9 years ago

Uggh

< >> decoded str
< // ==UserScript==
< // @name                [TS] Pixiv++
< // @namespace           TimidScript
< // @version             3.3.80a.020 Beta
< // @description         Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|��,�ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク
< // @author              TimidScript

plus the webhook bombed on this... GRR!

Martii commented 9 years ago

Holy smokes... it finally worked in all methods with just the metadata block Unit Test of the target script.

@TimidScript Mind bumping your project version to confirm please?

Martii commented 9 years ago

@sizzlemctwizzle This methodology fixes the issue however it looks like it may have busted the chunks... e.g. I only get one console message saying that it is using all 9 chunks e.g. aChunks.length... I'm a bit tired from all of this and will reexamine this a bit more later with the OpenUserJS metadata block present. If you have any ideas on this please let me know.

Martii commented 9 years ago

Some notes out loud (comment may be subject to change):

Martii commented 9 years ago

I'm relatively sure this issue can be closed now... haven't gotten external confirmation yet but tons of testing done. Reopen if needed or just leave a comment and we'll hear ya. :)

Leaving needs mitigation on for a little while as that's how I do some followup.