PuerkitoBio / purell

tiny Go library to normalize URLs
BSD 3-Clause "New" or "Revised" License
473 stars 58 forks source link

more tests, and some bugs #1

Closed jehiah closed 12 years ago

jehiah commented 12 years ago

I've been interested in URL normalization for quite some time (as i should be since i work at bitly). As a result I've written code to do this in python, tcl, and written various incomplete bits of code that dreams of doing this right. So, i know where you are coming from.

As i've been using golang more recently, i was happy to see your project, and hope that this will be one less thing that i'll need to do. To that end, i'm contributing test cases that i've found useful in helping improve my python library.

It's not all that many test cases, but it highlights a few things I would love to see

beyond that, there are a few cases I think highlight bugs.

Additionally, while i appreciate the configurability of some of the flags, It would be nice to se one function that just does the "right" thing up until some of the more editorial items (like dropping www, trailing slash, etc). It might just be me, but as a user that's what I'd like to see.

It might be odd to open a pull request that just adds failing tests, but it's all i have time for tonight. when i get around to needing this in a project, i'll undoubtedly open up additional pull requests with some fixes.

(also, i'm lazy, so i added a .travis.yml file and a badge to the readme; you should be able to go to travis-ci.org and turn on automated tests)

jehiah commented 12 years ago

here is the output the new tests highlight

$ go test
2012/09/19 22:49:07 expected: http://s.qé.de/ got: http://s.xn--q-bga.de/
2012/09/19 22:49:07 expected: http://www.thedraymin.co.uk/main/?p=308 got: http://www.thedraymin.co.uk:/main/?p=308
2012/09/19 22:49:07 expected: http://test.example/ got: http://test.example
2012/09/19 22:49:07 expected: http://www.foo.com/foo/bar.html got: http://www.foo.com./foo/bar.html
2012/09/19 22:49:07 expected: http://test.domain/Iñtërnâtiôn�lizætiøn got: http://test.domain/I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%EF%BF%BDliz%C3%A6ti%C3%B8n
2012/09/19 22:49:07 expected: http://66.102.7.147/ got: http://1113982867/
2012/09/19 22:49:07 expected: http://test.example/キ got: http://test.example/%E3%82%AD
2012/09/19 22:49:07 expected: http://www.foo.com:81/foo got: http://www.foo.com.:81/foo
2012/09/19 22:49:07 expected: http://test.example/?a=も%26 got: http://test.example/?a=%e3%82%82%26
2012/09/19 22:49:07 expected: https://[::ffff:192.168.1.1]/test got: https://[::ffff:192.168.1.1]:443/test
2012/09/19 22:49:07 expected: http://www.example.com/ got: http://www.example.com./
2012/09/19 22:49:07 expected: http://xblaのxbox.com/ got: http://xblaのxbox.com
2012/09/19 22:49:07 expected: http://ja.wikipedia.org/wiki/キャタピラージャパン got: http://ja.wikipedia.org/wiki/%E3%82%AD%E3%83%A3%E3%82%BF%E3%83%94%E3%83%A9%E3%83%BC%E3%82%B8%E3%83%A3%E3%83%91%E3%83%B3
2012/09/19 22:49:07 expected: http://qé.xblaのxbox.com got: http://xn--q-bga.xblaのxbox.com
--- FAIL: TestUrlnorm (0.00 seconds)
2012/09/19 22:49:07 expected: http://test.example/foo/ got: http://test.example/foo//
2012/09/19 22:49:07 expected: http://test.example/foo/bar/ got: http://test.example/foo///bar//
2012/09/19 22:49:07 expected: http://test.example/foo/ got: http://test.example/foo
2012/09/19 22:49:07 expected: http://test.example/foo/ got: http://test.example/foo
2012/09/19 22:49:07 expected: http://test.example/foo/bar/ got: http://test.example/foo/bar
--- FAIL: TestSlashes (0.00 seconds)
mna commented 12 years ago

Thanks, looks interesting, I'll give it a go tomorrow. I must admit my tests so far are pretty conservative (compared to yours!).

mna commented 12 years ago

Just as a FYI, I'll be working a bit on this today, so if you plan to send another pull request, make sure to refresh your fork. Thanks again, definitely some good test cases in your PR!