dbrgn / pyxtra

A small commandline utility written in Python to access the (now dead) Swisscom Xtrazone SMS service
https://pypi.python.org/pypi/pyxtra
GNU General Public License v3.0

Add automatic captcha verification #1

Closed mhutter closed 13 years ago

mhutter commented 13 years ago

Add automatic captcha verification so one can use this tool in automated jobs etc.

Possible 3rd-party tools: Tesseract-OCR

(Yes, I thought about contributing (provided I can spare some time) but I'm a Python-noob ;-) )

petermanser commented 13 years ago

Good point, and yes, we discussed that. We're trying to find a solution, but it's not that easy. :)

dbrgn commented 13 years ago

I messed around a lot with Tesseract, but could not reach a 33% success ratio. Maybe I did something wrong. I think the current solution is quite OK, but if someone wants to do captcha recognition, go on :)

And you can learn Python very quickly, especially in case you know Ruby :)

dbrgn commented 13 years ago

This looks quite interesting. Combined with some image processing, it might be possible to crack the captcha. http://code.google.com/p/pytesser/

mhutter commented 13 years ago

The idea, as far as I understood it:

  1. Convert the image to a "simple", monochrome format to increase "readability" for OCR software (e.g. with ImageMagick, or whatever tools are common in Python)
  2. OCR-Magicz, win!
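Step 1 could be sketched in plain Python roughly like this. It assumes the image is already loaded as a list of rows of 0-255 grayscale values; the function name `to_monochrome` and the threshold of 128 are made up for illustration, and the result would still have to be handed to an OCR tool such as Tesseract for step 2.

```python
def to_monochrome(gray, threshold=128):
    """Return a copy of `gray` where each pixel is 0 (ink) or 255 (background)."""
    # Hypothetical fixed threshold; a real captcha may need a tuned value.
    return [[0 if px < threshold else 255 for px in row] for row in gray]


# Tiny made-up 3x4 grayscale "image" (not a real Xtrazone captcha).
image = [
    [250, 30, 240, 200],
    [245, 10, 100, 220],
    [255, 20, 235, 130],
]
mono = to_monochrome(image)
for row in mono:
    print(row)
```

A fixed threshold is the simplest choice; noisy backgrounds would probably need something smarter.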

BTW: Started dabbling in a Python tutorial today... looks pretty easy/cool!

dbrgn commented 13 years ago

mhutter, exactly, that's what i did with commandline tools (imagemagick and tesseract). and i never really succeeded.

i'll probably try again next week with pytesser. might work.

what also could help is a training file for tesseract. but i never really understood how to make those, because i never invested enough time in it. could also help to solve the problem.

luxflux commented 13 years ago

what about asciiart? so at least it's not another window (not that cool with a tiling wm like awesome).

mhutter commented 13 years ago

luxflux, not an option if you want to send messages automatically ;-)

dbrgn commented 13 years ago

mhutter, if you want reliable auto-sent messages, e.g. server warnings, you should not rely on xtrazone and use a sms service instead.

luxflux, i tried it, it only works if the window is large enough. maybe we could make it an option, but shouldn't be the default imo.

mhutter commented 13 years ago

gwrtheyrn I agree, but it would be a nice feature ;-)

luxflux commented 13 years ago

gwrtheyrn, why does it need a big window? i mean how big is your console window? i use 139x42 for pyxtra...

or make it optional?

petermanser commented 13 years ago

luxflux: 1 char != 1 pixel.. you need to scale in order to make the ascii captcha readable :)

luxflux commented 13 years ago

petermanser: this was columns x lines, not pixels :) or the other way around :D

dbrgn commented 13 years ago

luxflux: this is a pretty large ascii-captcha, easily readable:

http://imgur.com/Y96UN

and this is the same captcha at a smaller size:

http://imgur.com/AOIVe

as you can see, as the screen/window size gets smaller, it's not readable anymore. especially if the captcha image is larger than the one i used as an example.
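a rough sketch of why that happens, assuming the ascii version is made by sampling pixels (i don't claim this is exactly how the conversion is done; `to_ascii`, `step` and the sample image are made up for illustration):

```python
def to_ascii(gray, step=1, threshold=128):
    """Render every `step`-th pixel as ' ' (ink) or '#' (background)."""
    return [
        "".join(" " if px < threshold else "#" for px in row[::step])
        for row in gray[::step]
    ]


# Made-up image with two vertical strokes, each one pixel wide.
image = [
    [255, 0, 255, 0, 255],
    [255, 0, 255, 0, 255],
    [255, 0, 255, 0, 255],
]
full = to_ascii(image)           # one character per pixel
small = to_ascii(image, step=2)  # half the size in both directions
print("\n".join(full))
print("\n".join(small))
```

at half size the thin strokes fall between the sampled pixels and disappear completely.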

boardend commented 13 years ago

char[][] ?

http://i.imgur.com/7N5rC.png

good n8 :)

dbrgn commented 13 years ago

@boardend: hm?

boardend commented 13 years ago

If you can put the ASCII art „image“ into a multidimensional char array, it should be easy to find the columns that are empty (filled with #). Then you can separate the chars and let the user solve them one by one. This solves the problem with the width of the shell.
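A minimal sketch of that idea, assuming the ASCII art is a list of strings where `#` is background and anything else is ink (the name `split_chars` and the sample art are made up):

```python
def split_chars(rows, background="#"):
    """Split ASCII art into per-character blocks at all-background columns."""
    width = max(len(r) for r in rows)
    rows = [r.ljust(width, background) for r in rows]  # pad ragged lines
    # A column is "empty" if every row holds the background char there.
    empty = [all(r[x] == background for r in rows) for x in range(width)]
    blocks, start = [], None
    for x, is_empty in enumerate(empty + [True]):  # sentinel closes last block
        if not is_empty and start is None:
            start = x  # a new character begins here
        elif is_empty and start is not None:
            blocks.append([r[start:x] for r in rows])
            start = None
    return blocks


art = [
    "#oo#o#",
    "#o##o#",
    "#oo#o#",
]
blocks = split_chars(art)
for block in blocks:
    print("\n".join(block), end="\n\n")
```

Each block could then be shown to the user on its own, so the shell only has to be as wide as the widest character.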

Was just an idea, but could be a workaround, if the creation of the captcha-window fails?!

But finally, OCR ftw :-)

mhutter commented 13 years ago

Hm but keep in mind that sometimes letters in the Captcha-Image may be overlapping...

dbrgn commented 13 years ago

Ah, sounds like an interesting idea. But then it would be better to separate the letters before the ASCII-conversion.

I think either we should show the entire CAPTCHA as ASCII and simply require a sufficiently big resolution, or not do it at all.

But we'll give OCR another try next week.

Btw, http://en.wikipedia.org/wiki/Connected_component_labeling
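A rough sketch of what that could look like on a binarized image, assuming a grid of 0/1 values where 1 is ink. This is a simple flood-fill variant with 4-connectivity, not the two-pass algorithm from the article, and `label_components` is a made-up name:

```python
from collections import deque


def label_components(grid):
    """Label 4-connected regions of truthy cells; returns (labels, count)."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]  # 0 means "no component"
    count = 0
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or labels[y][x]:
                continue  # background, or already labeled
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


grid = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
labels, count = label_components(grid)
for row in labels:
    print(row)
```

Each label would ideally correspond to one letter, although overlapping letters would end up merged into a single component.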

dbrgn commented 13 years ago

@mhutter: They only overlap in about 5-10% of the cases. That's bearable. I think everything over a 33% success ratio is OK.

boardend commented 13 years ago

I think the user should see when there are two chars at once?

petermanser commented 13 years ago

I'm not sure whether it would be better to put this effort into automatic reading of the captcha. :)

petermanser commented 13 years ago

automatically cracking captcha \o/, rewritten the login mechanism (closed by 5937edbcd9f265521810799256752f08ae3493fb)

luxflux commented 13 years ago

works for me, thx!

petermanser commented 13 years ago

awesome :)