dirtyfilthy / freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
GNU Affero General Public License v3.0

automatic login and register #19

Open zaranmd opened 6 years ago

zaranmd commented 6 years ago

Hi all. As we know, many onion domains have a login/register page that must be passed before their pages can be crawled. Such domains require a user name, a password and a captcha. I have run freshonions-torscraper, and it seems that it doesn't crawl the contents of such domains; we see just the index page. I would like to know how I can crawl such domains by logging into them. I have searched a bit and saw something like Scrapy's FormRequest object, but I don't know what I can do with it in this project. Do you have any ideas about my issue? Please guide me.

[Image: screenshot from 2018-03-04 13-18-18]

L3houx commented 6 years ago

I agree with your point. I also want to implement login detection. I have already built a proof of concept, but I have since started on another project. When I finish it, I would like to build a bigger proof of concept, but that will probably only be in 1-2 months. If you develop something, let me know ;)

L3houx commented 6 years ago

Hi @zaranmd,

This is a gist link to my proof of concept: https://gist.github.com/MrL3X/7b3580087cc18e90ddcb34b7bc52efe7

I built my proof of concept with the Scrapy framework; the script that I sent you is only the main script of the project. To test it, you will need to create a new Scrapy project and copy and paste the code into it. To test your own links, add the URLs to start_urls. The proof of concept isn't set up with Tor (not yet). I normally use this command to launch the Scrapy project: scrapy crawl login_detection -o test.json. After the scrape, you will have a file named test.json that shows whether a login form was detected for each link. I use the presence of an input of type password to detect whether the page has a login form; I found that this was the easiest way to detect it. If you start something, let me know, because it interests me and we could do it together.

We can discuss it here in the issue; that will be easier, and other people can join us to develop it.
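The detection heuristic described above (a page counts as having a login form if it contains an input of type password) can be sketched as follows. This is a minimal illustration using only Python's standard library, not the code from the gist; the names `PasswordInputFinder` and `has_login_form` are hypothetical.

```python
from html.parser import HTMLParser


class PasswordInputFinder(HTMLParser):
    """Hypothetical helper: records whether an <input type="password"> was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values
        # can be None when the attribute is written without a value.
        if tag == "input":
            input_type = dict(attrs).get("type") or ""
            if input_type.lower() == "password":
                self.found = True


def has_login_form(html: str) -> bool:
    """Return True if the HTML contains at least one password input."""
    finder = PasswordInputFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Inside a Scrapy callback, the same check could presumably be applied to `response.text`, or expressed directly as an XPath query such as `response.xpath('//input[@type="password"]')`. Note that this only flags pages that expose a password field; it does not log in or handle captchas.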

davisbra commented 6 years ago

Hi again @MrL3X, do you have any idea how to add another existing Python package to the Fresh Onions project? I found the links below, but I don't know how to add them to this project or how to run them: https://github.com/TeamHG-Memex/autoregister https://github.com/TeamHG-Memex/autologin https://github.com/TeamHG-Memex/autologin-middleware

L3houx commented 6 years ago

Sorry for the delay. I know how to add another Python package, and I found the links really interesting. I would like to add this one to the project: https://github.com/TeamHG-Memex/autologin-middleware. I think it was developed specifically for Scrapy.

This is a Scrapy middleware that uses the autologin http-api to maintain a logged-in state for a Scrapy spider.

L3houx commented 6 years ago

This is an interesting link that combines login detection and captcha bypass (#21): http://berlusp44zaqyg2e.onion/?c=users&a=login