OpenUserJS / OpenUserJS.org

The home of FOSS user scripts.
https://openuserjs.org/
GNU General Public License v3.0
857 stars 305 forks source link

Fix UTF-8 Support #200

Open sizzlemctwizzle opened 10 years ago

sizzlemctwizzle commented 10 years ago

@OpenUserJs/admin @OpenUserJs/backend @OpenUserJs/frontend Related to #198 and a few other bugs, I just wanted to consolidate identifying and fixing a type of bug that might be common inside of OUJS. UTF-8 characters sometimes break things in production. It doesn't happen in development for some reason. It mostly has to do with encoding going wrong (or the wrong encoding function being used) during a redirect, but it might happen it other areas that have yet to be uncovered.

So far the errors I've been able to verify in production are:

@Martii Were there any others? Any other script manipulation I tried in production seems to work.

This is a blocking bug. I will not accept any unrelated PRs, except bug fixes, until this issue is resolved. Update: All issues that been discovered have been fixed, but there may be lingering undiscovered bugs so I'm leaving this issue open until we're confident this has been resolved.

Your mission (should you choose to accept it): Use UTF-8 characters in production in any way you can possibly imagine, and report back here with your results in the comments. This way we don't waste effort testing something twice and can identify the problem areas that need to be fixed.

Martii commented 10 years ago

Were there any others?

Not that I have seen yet. I could delete my affected scripts but not show the source or edit the source on production.

  • Import using GitHub (UTF-8 in GitHub path)
  • Edit script description (UTF-8 in scriptname)

Most notably on the POST routines... e.g. they do change the data store on production but come back with ISO 8859-1 string redirects on dev (which means they work as expected but maybe not correctly) and UTF-8 string redirects on production... which is very peculiar.

See also:

sizzlemctwizzle commented 10 years ago

but not show the source or edit the source on production.

Could you give me a url? I haven't been able to reproduce.

Most notably on the POST routines...

Wait a minute... Since they're redirections and not views, we must not be setting the character set correctly.

Martii commented 10 years ago

I haven't been able to reproduce.

I've subsequently deleted my affected scripts... on resync the issue of being non-editable isn't present but the 400 is... I didn't want orphaned scripts on OUJS currently if a fix shows up... I can infer that we were/are storing in ISO 8859-1 instead of UTF-8 in the DB I think. If you pick my Hello World script (only one in UserScripts repo) it should do it as of testing earlier this morning. I haven't retried it with your recent commits and possible deployment yet.

EDIT: You might try in jesus2099's account with his scripts _(specifically this one that had source before DB Migration)_ that have Unicode in them too. I can't view Source Code on a few of them which is how I detected which ones of mine to delete.

EDIT: CONFIRMED pro importing my Hello World script causes the 400 with new tab on current, as of just a moment ago, still.

Martii commented 10 years ago

btw production appears to be goofed up right now... with CSS at least. :\ This is challenging to find where things should be vs where they are. ;)

Zren commented 10 years ago

btw production appears to be goofed up right now... with CSS at least. :\

Getting

Resource interpreted as Stylesheet but transferred with MIME type text/plain: "https://openuserjs.org/css/common.css". openuserjs.org/:15 Resource interpreted as Stylesheet but transferred with MIME type text/plain: "https://openuserjs.org/css/bootstrap-custom.css". openuserjs.org/:14 Resource interpreted as Stylesheet but transferred with MIME type text/plain: "https://openuserjs.org/css/github.css".

I left out type="text/css" in the <link> tags, but it was working without them so I've got no idea why it just broke.

sizzlemctwizzle commented 10 years ago

so I've got no idea why it just broke.

I was trying something out in production. Reverted. Should be fixed now.

Martii commented 10 years ago

I left out type="text/css" in the <link> tags

Later those should probably go back in from where they left. MIME types are important to maintain with client side.

Reverted. Should be fixed now.

It is. :) Thanks.

sizzlemctwizzle commented 10 years ago

I've identified 7 scripts that are currently in production but can't be installed because of a fuckup in the AWS code I used to get rid of url namespaces.

This issue will not occur going forward. It is isolated to these scripts. I should be able to fix this soon.

Martii commented 10 years ago

There's also may be some on dev that I saw last night in my account too. Those are totally orphaned from what I can tell... e.g. can't delete them because no homepage access. I haven't check today with current HEAD yet.

from my dev script listings page. These were orphaned during some of the past commit fixes including dev DB migration from your discussion.

sizzlemctwizzle commented 10 years ago

You're pretty much on your own in dev. I might just wipe out all existing script data.

Martii commented 10 years ago

I might just wipe out all existing script data.

LOL well that works too. +1

Until I get a little more involved with how to tweak the dev DB any help is always appreciated. :P :) Btw I sort of expected this to happen and was wondering if pro would do the same thing but didn't try it during migration assuming that was possible. ;)

sizzlemctwizzle commented 10 years ago

Well manually fixed these above scripts I identified along with the first three the migration broke. Please let me know if anyone runs across any others.

LOL well that works too. +1

Okay, I'll do that soon.

if pro would do the same thing but didn't try it during migration assuming that was possible.

It was. There was a small amount of window between the data migration (preformed locally) and the deployment of the new code that could handle it, where everything would have been broken. But I was fast, and it happened late at night if I remember correctly.

Martii commented 10 years ago

Now this is unusual:

currently resolves to:

Martii commented 10 years ago

Use UTF-8 characters in production in any way you can possibly imagine, and report back here with your results in the comments.

and "needs testing" label:

The redirect issue still present for Unicode of course.

sizzlemctwizzle commented 10 years ago

Okay my latest commit seems to have fixed the redirect issue, as far as I can tell.

currently resolves to:

I can't confirm. The link works correctly for me.

Martii commented 10 years ago

I can't confirm.

  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_%28DE%29%28no_longer_PayPal%29 is the "real" URI (encoded from a Fx addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofortÜberweisung_(DE)(no_longer_PayPal) (decoded)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofortÜberweisung_(DE)(no_longer_PayPal) is what you posted above as a plain URL.
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_(DE)(no_longer_PayPal) Firebug grab of the GH href .... appears that GH may have some issues too with their automatic linking or perhaps Fx via address bar. hmmm.

Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 - Firefox Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 - Chromium Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.16 - Opera Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26 - SeaMonkey


However Win7 does appear to work with your link... reconfirming on that machine directly... CONFIRMED however works with mine too... UTF-8 and encodings can be so much fun! :\

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0 - Firefox Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 - Chrome Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko - IE Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 OPR/22.0.1471.70 - Opera Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26 - SeaMonkey


redirect issue

Will check momentarily... Seems to be working. Awesome!

Since you are using encodeURI and I have submitted at least one pr using encodeURIComponent we might want to recheck these... encodeURI has these earmarked as reserved ; , / ? : @ & = + $ according to MDN. I'm fairly confident with usernames it should be as much as possible for @collaborator... the autolinking for @supportURL and the @homepageURL I'm only decoding with decodeURI on the text value and url value remains unchanged from script source.

Martii commented 10 years ago

Okay @sizzlemctwizzle the full set of my browsers is in for address bar copying testing... there doesn't seem to be any pattern to the browser madness other than symmetry with who doesn't do what... Think I need to look up the ES specs (vague as usual) and see what it is supposed to be. (See also IETF) There's a definite difference on address bar copying between all of them... and some issues between Nix browsers with accessing and not accessing your originally linked url... I think this one might be a "wait on all of them to sort it out"...kind of thing.

EDIT: And now my url nor yours doesn't work with Fx here (Linux), even on a paste, even though I just tested this out on 9 different browsers ... GRR! I don't like bugs like this... impedes testing here.

EDIT: Tried in a squeeky clean Fx profile and now my link works again and so does yours... bleh Add-On compatibility issue... scratch that report for now. I'll retest after a while and a few updates including from other key AO devs including a potential candidate of NoScript since it is saying it is filtering an XSS attack with these now.

EDIT: Looking at the percent encoding spec, under "sub-delims" on IETF, parenthesis are reserved so they should always be encoded when used in a filename... so Mozilla is the only one that seems to have got it right. :) The rest including GH's current autolinking is incorrect according to those specs.

Martii commented 10 years ago

I'm going to start doing the uppercase UTF-8 values in all files... currently all PR READY shouldn't involve those lines... currently ignoring ./public/js since those appear to be locally hosted node packages.

Martii commented 10 years ago

Everything appears to be okay but we should keep this issue open but you remove the blocking unless someone has an objection. You tested out comments and those appear to read okay. I'll notate anything if I find anything else but I don't think I will. hopefully

sizzlemctwizzle commented 10 years ago

Updated issue description and removed the blocking label.

cvzi commented 10 years ago

I found an error with a UTF-8 file (and I hope this is the right place to report it) I tried to update one of my script via Upload Script on https://openuserjs.org/user/add/scripts After the file was upload I got a 302 redirect back to https://openuserjs.org/user/add/scripts instead of the source code being updated. I think that the problem was the BOM which was added automatically by Windows' notepad.exe when I edited the file.

Martii commented 10 years ago

Tested BOM at https://github.com/Martii/UserScripts/blob/master/src/RFC%202606%C2%A73/BOM%20test/RFC%202606%C2%A73%20-%20BOM%20test.user.js

Notepad isn't known for adding BOMs but it's also gone through a lot of changes. Notepad++ on the other hand is how this one was crafted by changing to Encode with UTF-8 (which includes the BOM and verified in a hex editor starting file with EF BB BF)

Any chance that you can upload that to a GitHub (GH) repository so I can test that file instead of one that I created? Don't do a gist as I'm not sure that will maintain the BOM at all.

And yes this is the correct place. :)

cvzi commented 10 years ago

I tried to upload your testfile to oujs.org and it worked. Maybe my files are corrupt in another way.

I overwrote the original file, but I was able to reproduce the error. I've done this with Windows 7. Create a new file with Notepad++ and set Encoding to UTF8 without BOM. Add at least one 2-Byte character (I used a German umlaut ö) and save the file. Now open the file with notepad.exe and just save it. And now the file has the EF BB BF Bytes. And I cannot upload it to OJUS.

Here is my test file: https://github.com/cvzi/Userscripts/tree/master/UTF8BOM_test When I try to upload UTF8 BOM test.user.js on OUJS I get the behavior as mentioned before. When I try to import it via OUJS GitHub import I get this error page: 400 | Specified file does not contain a userscript header.

Martii commented 10 years ago

I overwrote the original file, but I was able to reproduce the error. I've done this with Windows 7. Create a new file with Notepad++ and set Encoding to UTF8 without BOM. Add at least one 2-Byte character (I used a German umlaut ö) and save the file. Now open the file with notepad.exe and just save it. And now the file has the EF BB BF Bytes. And I cannot upload it to OJUS.

@cvzi I've confirmed it with your file, including hex editor check, and have a PR that will probably get merged tomorrow. Needs review time for a little while then we'll get you taken care of. Thank you very much for your report. :)