zaranmd opened this issue 6 years ago
I agree with your point. I also want to implement login detection. I have already built a proof of concept, but I have started on another project. When I finish it, I would like to build a bigger proof of concept, but that will probably be in 1-2 months. If you develop something, let me know ;)
Hi @zaranmd,
This is a gist link to my proof of concept: https://gist.github.com/MrL3X/7b3580087cc18e90ddcb34b7bc52efe7
I built my proof of concept with the Scrapy framework; the script I sent you is only the main script of the project. To test it, you will need to create a new Scrapy project and copy-paste the code I sent you. To test your own links, add their URLs to start_urls. The proof of concept isn't set up with Tor yet. I normally launch the Scrapy project with `scrapy crawl login_detection -o test.json`.
After the scraping, you will have a file named test.json in which you can see which links had a login form detected. I use the password input type (`input type="password"`) to detect whether a page has a login form; I found that this was the easiest way to detect it. If you start something, let me know, because it interests me and we could do it together.
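The detection rule above (a page counts as a login page if it contains an `<input type="password">`) can be sketched with only the standard library, so it can be tried without a Scrapy project. This is not the code from the gist; the names `PasswordInputFinder` and `page_has_login_form` are hypothetical. Inside a Scrapy spider the equivalent check would be something like `bool(response.css('input[type="password"]'))`.

```python
from html.parser import HTMLParser


class PasswordInputFinder(HTMLParser):
    """Counts <input type="password"> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.password_inputs = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values,
        # so the type value is normalized here. A bare attribute has value None.
        if tag == "input":
            input_type = (dict(attrs).get("type") or "").strip().lower()
            if input_type == "password":
                self.password_inputs += 1

    # Self-closing tags such as <input ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def page_has_login_form(html):
    """Heuristic from this thread: a page with a password input is a login page."""
    finder = PasswordInputFinder()
    finder.feed(html)
    finder.close()
    return finder.password_inputs > 0


if __name__ == "__main__":
    login_page = '<form action="/login"><input name="user"><input type="password" name="pw"/></form>'
    search_page = '<form action="/search"><input type="text" name="q"></form>'
    print(page_has_login_form(login_page))   # True
    print(page_has_login_form(search_page))  # False
```

The heuristic will miss login flows that use JavaScript-rendered forms or multi-step logins where the password field appears on a second page, which is one reason a bigger proof of concept is worth building.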
We can discuss it here in the issue; it will be easier, and other people can join us in developing it.
Hi again @MrL3X, do you have any idea how to add another existing Python package to the freshonions-torscraper project? I found the links below, but I don't know how to add them to this project or how to run them:
https://github.com/TeamHG-Memex/autoregister
https://github.com/TeamHG-Memex/autologin
https://github.com/TeamHG-Memex/autologin-middleware
Sorry for the delay. I know how to add another Python package, and I found the links really interesting. I would like to integrate https://github.com/TeamHG-Memex/autologin-middleware into the project; I think it was developed specifically for Scrapy.
This is a Scrapy middleware that uses the autologin HTTP API to maintain a logged-in state for a Scrapy spider.
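Since autologin-middleware is a Scrapy downloader middleware, adding it should mostly be a matter of installing it into the same environment (for example `pip install git+https://github.com/TeamHG-Memex/autologin-middleware`), running the autologin HTTP API as a separate service, and enabling the middleware in the Scrapy project's settings.py. The fragment below is only a sketch based on my reading of that README: the setting names, the order value 605 and the autologin address are assumptions to verify there before use. The `DOWNLOADER_MIDDLEWARES` dict itself is Scrapy's standard way to enable a middleware.

```python
# settings.py (fragment): a sketch, not verified against the current README.

# Assumed: address where the autologin HTTP API service is already running.
AUTOLOGIN_URL = "http://127.0.0.1:8089"
# Assumed: switch that turns the autologin middleware on for this project.
AUTOLOGIN_ENABLED = True

DOWNLOADER_MIDDLEWARES = {
    # Standard Scrapy mechanism: map the middleware's import path to an order value.
    # The import path and the value 605 are taken from memory of the README (assumption).
    "autologin_middleware.AutologinMiddleware": 605,
}
```

Check the README for any additional settings it requires (for example around cookie handling) and copy those from there rather than from this sketch.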
Here is an interesting link that combines login detection and CAPTCHA bypass (see #21): http://berlusp44zaqyg2e.onion/?c=users&a=login
Hi all. As we know, many onion domains have a login/register page that must be passed before their pages can be entered and crawled. Such domains require a username, a password and a CAPTCHA. I have run freshonions-torscraper, and it seems that it doesn't crawl the contents of such domains; we see just the index page. I would like to know how I can crawl such domains by logging into them. I searched a bit and found something like Scrapy's FormRequest object, but I don't know how to use it in this project. Do you have any idea about my issue? Please guide me.
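For forms that only need a username and password, Scrapy's `FormRequest.from_response()` is the usual tool: given the login page response, it pre-fills the form's existing fields (including hidden tokens) and lets you override some of them via `formdata`, for example `FormRequest.from_response(response, formxpath='//form[.//input[@type="password"]]', formdata={...}, callback=self.after_login)`. To show what that call does without requiring Scrapy, here is a standard-library sketch. The names `LoginFormParser` and `build_login_request`, the "first text/email input is the username" rule and the sample page are all hypothetical. It does not solve CAPTCHAs; a form with a CAPTCHA field still needs a human or a separate solver.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

SKIPPED_TYPES = ("submit", "button", "image", "reset", "file")


class LoginFormParser(HTMLParser):
    """Collects every <form> with its action, method and <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append({
                "name": attrs.get("name"),
                "type": (attrs.get("type") or "text").lower(),
                "value": attrs.get("value") or "",
                "checked": "checked" in attrs,
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_login_request(page_url, html, username, password):
    """Return (url, method, body) for the first form with a password field, else None.

    Hidden fields such as CSRF tokens keep their pre-filled values. Buttons are
    skipped for simplicity (FormRequest.from_response clicks the first one by default).
    """
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    for form in parser.forms:
        if not any(f["type"] == "password" for f in form["inputs"]):
            continue
        fields = []
        user_filled = False
        for field in form["inputs"]:
            if not field["name"] or field["type"] in SKIPPED_TYPES:
                continue
            if field["type"] in ("checkbox", "radio") and not field["checked"]:
                continue  # unchecked boxes are not submitted by browsers
            value = field["value"]
            if field["type"] == "password":
                value = password
            elif field["type"] in ("text", "email") and not user_filled:
                value = username  # hypothetical rule: first text/email input = username
                user_filled = True
            fields.append((field["name"], value))
        return urljoin(page_url, form["action"]), form["method"], urlencode(fields)
    return None
```

After a successful login POST, Scrapy's default cookie middleware keeps the session cookie for the spider's subsequent requests, which is what lets the crawl go past the index page.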