dirtyfilthy / freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
GNU Affero General Public License v3.0
508 stars 147 forks

a problem in running some scripts #7

Open zaranmd opened 7 years ago

zaranmd commented 7 years ago

Hi there. I ran some of this project's scripts successfully, but I have problems with others. I disabled Elasticsearch and pushed a .onion domain with push.sh, and I can see the results in the domain and page tables. After that, however, when I run scraper-service.sh I get stuck in a loop like the one below: [screenshot from 2017-10-15 12-57-50] [screenshot from 2017-10-15 12-59-27]

As you can see from this part of my terminal output while running scraper-service.sh, Scrapy opens and closes again and again and cannot get out of this loop by itself. The second problem is with Elasticsearch: when I enable it, none of the scripts run and I get this error: [screenshot from 2017-10-15 13-48-50] [screenshot from 2017-10-15 13-51-04]

Could you please guide me? If you need any files, I can share them. Thanks.

L3houx commented 6 years ago

I think there was a problem in your initial configuration. If you want to restart your project and try without Elasticsearch, you can follow this updated README: https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md

davisbra commented 6 years ago

Hi @MrL3X, thanks for your guidance. I ran the project successfully, but some tables in my database are empty, including category, category_link, headless bot, open_port, web component and web component link. I assume this is caused by something in the code and scripts, but I don't know what I should do about it. Also, I think some .sh and .py files are missing from this clone, such as corpus.py. Please check the project's files and guide me to complete the run. Thanks.

L3houx commented 6 years ago

Hi @davisbra, if you cloned the project, you should have all the files you need to run it. I wasn't missing any files, and the project ran perfectly. I think the maintainer updated the database schema to add more functionality but then stopped working on it, so you may be seeing the beginnings of features that were never finished. That's only a theory. As I said before, my project works well without these tables, and I don't know whether they are useful or not. If you followed the README that I sent you, you should be all good. How many onions do you have? How many of them are valid (alive/green)?

davisbra commented 6 years ago

Hi @MrL3X, I checked the commits and saw some files (such as the autocategorise folder and the corpus.py in it) that I don't have in my clone. As I said, the project ran without them, but some tables are empty. There are about 30,000 domains, of which about 7,000 are alive. Is this result OK? I have another question: do you have any ideas about forum spidering? This crawler works well, but it seems that it cannot enter private forums that require login or registration.

L3houx commented 6 years ago

Hi @davisbra,

We created an issue about forum spidering: https://github.com/dirtyfilthy/freshonions-torscraper/issues/19

I think the alive/down ratio is fine. I didn't find the other files that might be missing. Also, I didn't look into the empty tables, since I didn't consider them the most important thing, but it could be a good point to check in the future.
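
If anyone wants to check the empty tables and the alive ratio themselves, here is a rough sketch rather than anything taken from the project. It assumes a Python DB-API connection. The demo uses an in-memory SQLite database as a stand-in, so you would swap in a connection to your own database. The helper names (find_empty_tables, alive_ratio, demo) are made up for illustration, and the table names are just the ones mentioned in this thread.

```python
import sqlite3


def find_empty_tables(conn, tables):
    """Return the names in `tables` that currently hold zero rows."""
    empty = []
    cur = conn.cursor()
    for name in tables:
        # Table names cannot be bound as query parameters,
        # so accept only plain identifiers before formatting them in.
        if not name.isidentifier():
            raise ValueError(f"unexpected table name: {name!r}")
        cur.execute(f"SELECT COUNT(*) FROM {name}")
        (count,) = cur.fetchone()
        if count == 0:
            empty.append(name)
    return empty


def alive_ratio(alive, total):
    """Fraction of known domains that are currently alive."""
    return alive / total if total else 0.0


def demo():
    # Stand-in database (assumption): a real setup would connect to the
    # project's own database instead of in-memory SQLite.
    conn = sqlite3.connect(":memory:")
    names = ["domain", "page", "category", "open_port"]
    for name in names:
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO domain (id) VALUES (1)")
    conn.execute("INSERT INTO page (id) VALUES (1)")
    return find_empty_tables(conn, names)


if __name__ == "__main__":
    print("empty tables:", demo())
    # Figures quoted above: about 7,000 alive out of about 30,000 domains.
    print(f"alive ratio: {alive_ratio(7000, 30000):.1%}")
```

With the numbers quoted above, the ratio comes out to roughly 23% alive.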