angiemaunz / html5security

Automatically exported from code.google.com/p/html5security

JSON file format #4

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I'm sorry, folks, to come complaining about such a great project.

I've been trying to consume the JSON files in a project of mine (XSS 
vulns in a webmail interface), and I've hit a brick wall: your JSON 
files aren't valid JSON at all. So my Ruby script fails to parse them, 
and I've spent a day and a half trying to pre-parse them.

I know they are valid javascript, but *it's not my fault the JSON guy (and all 
the JSON fanbois out there) is such an anal guy that he cannot bear single 
quote strings, or comments.*

Anyway, if your goal is to offer interop and integration with whatever tool, 
you should move from valid javascript to valid JSON. I'd be glad to help you 
with the conversion (if I can pull it off), or even with more content if my 
current interests lead me to any.

I'm again very sorry, and feel pretty lame, to open a ticket for such a stupid 
issue, but I think it will help others use the knowledge in your project to 
secure systems.

Thanks again,
Rob'

Original issue reported on code.google.com by rdelauge...@gmail.com on 21 Jul 2011 at 8:58

GoogleCodeExporter commented 8 years ago
Haha - nice ticket, I lol'd :) I see your point, but I'm still not planning to 
change the format of the source file at the moment. What would make sense in my 
opinion are small interface files - enabling interop with PHP, Ruby or whatever 
you need, written in the language the interop is necessary for. Same could go 
for a REST API we could host on html5sec.org (thinking html5sec.org/api/php, 
html5sec.org/api/ruby etc.). 

Please let me know if you are interested in setting up what you specifically 
need - I'll most probably be glad to host it here or give you the necessary 
commit privileges.

Cheers,
.mario

Original comment by Mario.He...@googlemail.com on 21 Jul 2011 at 10:00

GoogleCodeExporter commented 8 years ago
Yeah, but the problem is that parsing "loosely valid JSON" into JSON is a 
*FUCKING* nightmare (well, to be fair, parsing JSON is a nightmare for 
the aforementioned anal reasons). And even more so when the *things* inside 
single-quoted strings are invalid HTML that is supposed to break parsers. 
Well, I suppose you write your files directly in 
'not-JSON-but-the-thing-we-all-suppose-to-be-JSON', and not from a more 
strictly structured source where the change would be easy.
Then yes, if my work could be of some use for the community, I'd be glad to 
write those regexps from hell to make an adapter from your files to strict JSON 
format.

We'll keep in touch, I hope soon.
Rob'

Original comment by rdelauge...@gmail.com on 21 Jul 2011 at 10:29

GoogleCodeExporter commented 8 years ago
I'd just like to second this issue: the file is completely useless in its 
current format.  It needs to be rewritten before anything can even be 
attempted with it.  The good news is that I've documented all of these issues 
so they can easily be fixed.

1.) Remove the /* */ comments
2.) Remove the "var items = " at the beginning
3.) Swap the " and ', JSON uses double quotes
4.) Remove the control characters.  JSON treats anything <= 0x1f inside a 
string as a control character.  This includes things like 0x09 (tab characters)
5.) \xBC notation is not valid, it should be \u00BC.  Same for all other "\x.." 
patterns.
6.) \' is not valid in JSON.  These can safely be replaced with just a single 
quote.
7.) There are multiple places where the dictionaries have rogue commas at the 
end.  It's always the browser section and the IDs of these are 89, 99, 100, and 
102.
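
As a rough illustration of the list above, here is a minimal sketch that feeds one small sample per problem to Python's built-in parser. The helper name `is_strict_json` and all sample strings are made up for this example; none of them are taken from the project's files.

```python
import json

def is_strict_json(text):
    """Return True if Python's built-in parser accepts the text as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical one-line samples, one per problem in the list above.
samples = {
    "comment": '/* note */ ["a"]',
    "var assignment": 'var items = ["a"]',
    "single quotes": "['a']",
    "raw tab in string": '["a\tb"]',        # \t here is a literal tab character
    "\\x escape": r'["\xBC"]',
    "\\' escape": r'["it\'s"]',
    "trailing comma": '{"browsers": ["ie", "ff"],}',
    "valid": '["\\u00BC", "it\'s"]',        # \u escape and a bare single quote
}

results = {name: is_strict_json(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the last sample should be accepted, which is the shape the fixed file needs to end up in.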

I'm including a small python script which addresses all of these issues except 
the rogue commas.  After manually fixing the rogue commas, I was able to read 
in the file with the built-in JSON parser.  I'd like to stress that my script 
is not the best solution, but since I am not a committer, this is the best I 
can do.  Hopefully the maintainers can use the script below to fix up the .json 
file and maintain the fixed version.  Or, if the current version is valuable to 
someone, rename the current file to be a .js file and then use this script to 
create a .json file in the build process.  That would let people who use Ruby, 
Python, Java, C++, Perl, or any other language use the *real* JSON file, while 
anyone who wants to use JS can use either one.

#
# This script will simply fix and load the json file (Python 3)
#
import json, re
# read the raw file
with open('html5security.json') as f:
    data = f.read()
# remove comments, this is JSON, not javascript
# (DOTALL so that comments spanning several lines are matched too)
data = re.sub(r'/\*.*?\*/', '', data, flags=re.DOTALL)
# remove the newlines so the regex will work properly
data = re.sub(r'\r?\n', '', data)
# strip everything outside the actual JSON data
get_array_only = re.compile(r'.*?(\[.*\]).*')
data = get_array_only.sub(r'\1', data)
# swap ' for " and " for '
# (note: this also swaps the quote characters inside the strings themselves)
data = data.translate(str.maketrans("'\"", "\"'"))
# convert \xFF to \u00FF
data = re.sub(r'\\x([0-9a-fA-F]{2})', r'\\u00\1', data)
# remove the control characters
data = re.sub(r'[\x00-\x1f]+', '', data)
# JSON doesn't allow \' (only \"); the lookbehind keeps the preceding character
data = re.sub(r"(?<!\\)\\'", "'", data)
# Assuming the commas were fixed, we can now load the file in non-strict mode
j = json.loads(data, strict=False)
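
The script above leaves the rogue commas (item 7) to manual editing. As a hedged sketch of how that step might be automated, the function below walks the text once and drops any comma that is followed only by whitespace and a closing `}` or `]`, while leaving commas inside double-quoted strings alone. The name `strip_trailing_commas` and the sample entry are hypothetical; the real entries may be shaped differently.

```python
import json

def strip_trailing_commas(text):
    """Drop commas that are followed only by whitespace and then } or ].

    Assumes strings are already double-quoted and use backslash escapes,
    so commas inside string values are left untouched.
    """
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # copy the escaped character so \" does not end the string
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # look ahead past whitespace for a closing bracket
            j = i + 1
            while j < len(text) and text[j].isspace():
                j += 1
            if not (j < len(text) and text[j] in "}]"):
                out.append(ch)  # ordinary separator: keep it
        else:
            out.append(ch)
        i += 1
    return "".join(out)

# Hypothetical entry with rogue commas after the browser section.
sample = '[{"id": 89, "data": "a,}b", "browsers": {"ie": ["6", "7",], },}]'
fixed = strip_trailing_commas(sample)
parsed = json.loads(fixed)
print(parsed)
```

In the script above, this would have to run after the quote swap and before `json.loads`, since it only recognises double-quoted strings.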

Original comment by JoseLemm...@mail.com on 28 Sep 2011 at 11:13

GoogleCodeExporter commented 8 years ago
Well, I for one admire your courage for trying to regexp your way out of this 
problem. I tried to in Ruby, but my (nonexistent) skills failed me. For the 
record, here is how I finally did it (once I noticed that JSON, unlike XML, can 
have unicode chars in strings).

Since the js files are valid-js-but-not-valid-JSON, and since they actually 
assign variables, I just built an HTML file that loads the js, uses the 
browser's built-in JSON object to convert it, and lets me copy-paste the result 
into files. It lacks automation, but works fine for me. Here is the barebones 
html file (works in all browsers but IE; one could replace textContent with 
innerText to make it work there).

*********************BEGIN HTML FILE******************************
<html>
<head>
<title>Converter</title>
<style>
textarea{
width:800px;
height:200px;
}
</style>
 <script type="text/javascript" src="http://html5security.googlecode.com/svn/trunk/items.json"></script>
 <script type="text/javascript" src="http://html5security.googlecode.com/svn/trunk/categories.json"></script>
 <script type="text/javascript" src="http://html5security.googlecode.com/svn/trunk/payload.json"></script>
 <script type="text/javascript">
 function convert(){
 var i = JSON.stringify(items);
 var c = JSON.stringify(categories);
 var p = JSON.stringify(payloads);

 var divItems = document.getElementById("items");
 var divCategories = document.getElementById("categories");
 var divPayloads = document.getElementById("payloads");

 var d1=document.createElement("textarea");
 d1.textContent=i;
 divItems.appendChild(d1);
 var d2=document.createElement("textarea");
 d2.textContent=c;
 divCategories.appendChild(d2);
 var d3=document.createElement("textarea");
 d3.textContent=p;
 divPayloads.appendChild(d3);

 }
 </script>
 </head>

 <body>
 <div onclick="javascript:convert();">Click me!!1!one!</div>
 <div id="items"><h1>Items</h1></div>
 <div id="categories"><h1>Categories</h1></div>
 <div id="payloads"><h1>Payloads</h1></div>
 </body>
 </html>
***********************END HTML FILE*********************************
PS: Notice what I did? Safely injected a string into HTML... One wonders..

Rob'

Original comment by rdelauge...@gmail.com on 29 Sep 2011 at 5:12

GoogleCodeExporter commented 8 years ago
Format stays as it is. No further requests over the last n>6 months.

Original comment by Mario.He...@googlemail.com on 26 Jun 2012 at 7:08