CIRCL / AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
https://github.com/ail-project/ail-framework
GNU Affero General Public License v3.0

Infinite onion sites getting crawled #514

Closed: annetteshajan closed this issue 3 years ago

annetteshajan commented 4 years ago

I manually added a few onion sites, and I think they recursively kept pulling in many more sites, so now there are too many sites in the queue. Is there any way to remove sites from the queue? @Terrtia

Terrtia commented 4 years ago

Hi @annetteshajan!

This is how AIL discovers new onions. An onion site (if not crawled manually or periodically) can only be crawled/checked once a month. This discovery mode has a lower priority than the manual or periodic crawlers.

The size of the queue isn't an issue if you want to crawl a website or an onion address manually or periodically, because those requests take priority over discovered onions.
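To illustrate the idea, here is a minimal sketch of a priority queue where manual and periodic requests are always served before discovered onions, and a discovered onion is skipped if it was crawled within the last month. This is not AIL's actual code: `CrawlQueue`, the `PRIORITY` values, and the 30-day `RECRAWL_DELAY` are hypothetical names and numbers chosen only for the example.

```python
import heapq
import itertools
from datetime import datetime, timedelta

# Hypothetical priority levels (assumption): a lower number is served first.
PRIORITY = {"manual": 0, "periodic": 1, "discovery": 2}
# Assumed stand-in for "once a month".
RECRAWL_DELAY = timedelta(days=30)


class CrawlQueue:
    """Illustrative queue, not the AIL implementation."""

    def __init__(self):
        self._heap = []
        # Tie-breaker that keeps first-in, first-out order within a priority.
        self._counter = itertools.count()
        self._last_crawled = {}

    def add(self, domain, mode, now):
        """Queue a domain; return False if a discovered domain was crawled too recently."""
        last = self._last_crawled.get(domain)
        if mode == "discovery" and last is not None and now - last < RECRAWL_DELAY:
            return False
        heapq.heappush(self._heap, (PRIORITY[mode], next(self._counter), domain))
        return True

    def pop(self, now):
        """Return the next domain to crawl, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, domain = heapq.heappop(self._heap)
        self._last_crawled[domain] = now
        return domain


if __name__ == "__main__":
    queue = CrawlQueue()
    start = datetime(2021, 1, 1)
    for i in range(1000):
        queue.add(f"discovered{i}.onion", "discovery", start)
    queue.add("manual.onion", "manual", start)
    # The manual request is served first, however long the discovery backlog is.
    print(queue.pop(start))
```

With a layout like this, a large backlog of discovered onions only delays other discovered onions, which is why the queue size does not matter for manual or periodic crawls.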

Does that answer your question?

annetteshajan commented 4 years ago

So there's no way to control what's in the queue, I guess. @Terrtia

I had another question as well (I don't know whether I should raise a separate issue): how do you store the tagged content? And is there any way I can export it to ELK, for instance?

mokaddem commented 3 years ago

@annetteshajan The tags are stored in ARDB (persistent storage). The easiest way to push data to another tool would be to use the API. There are other alternatives that require a bit of programming, like streaming any new tag to a specific queue, but I'm not sure @Terrtia has the time to work on it.
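For the ELK side, a rough sketch of the second half of that pipeline: once tagged items have been pulled from the API, they can be turned into a body for Elasticsearch's Bulk API, which expects newline-delimited JSON with an action line followed by a document line. The AIL endpoints are not spelled out in this thread, so the fetch step is left out; the record shape (`id` and `tags` keys), the `ail-tags` index name, and the function name `to_bulk_ndjson` are assumptions made for the example.

```python
import json


def to_bulk_ndjson(items, index="ail-tags"):
    """Build an Elasticsearch Bulk API body from tagged items.

    `items` is assumed to be a list of dicts with hypothetical "id" and
    "tags" keys; adapt this to whatever the AIL API actually returns.
    """
    lines = []
    for item in items:
        # Action line: index the document under the item's id.
        lines.append(json.dumps({"index": {"_index": index, "_id": item["id"]}}))
        # Source line: the document itself.
        lines.append(json.dumps({"item_id": item["id"], "tags": item["tags"]}))
    # The Bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Sample data for illustration only, not real AIL output.
    sample = [
        {"id": "example/item-1", "tags": ['infoleak:automatic-detection="credential"']},
        {"id": "example/item-2", "tags": []},
    ]
    print(to_bulk_ndjson(sample), end="")
```

The resulting string can be sent to the `_bulk` endpoint of an Elasticsearch node with the `Content-Type: application/x-ndjson` header. Using the item id as `_id` means re-running the export overwrites existing documents instead of duplicating them.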