CIRCL / AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
https://github.com/ail-project/ail-framework
GNU Affero General Public License v3.0

Indexer.py #468

Closed Phil-ThePower-Pearce closed 5 months ago

Phil-ThePower-Pearce commented 4 years ago

Hi, whilst running the scripts I've noticed this in the Indexer:

Indexing - 1580809702 : archive/pastebin.com_pro/2020/02/11/FjQhcqFk.gz
Indexing - 1580809702 : archive/gist.github.com/2020/02/11/sylr_b065e1fbd3de0c2ff095d83b969e6db4.gz
Indexing - 1580809702 : archive/pastebin.com_pro/2020/02/11/UQcU4SKv.gz
Indexing - 1580809702 : archive/pastebin.com_pro/2020/02/11/PhHjsK9A.gz
Indexing - 1580809702 : archive/ideone.com/2020/02/11/qYcdfx.gz
bash: line 1: 20994 Killed /home/ubuntu/Apps/AIL-framework//AILENV/bin/python ./Indexer.py

The Indexer queue has obviously stopped. Is there a way to restart a single screen/service and reprocess the queue? The module information script doesn't work, as reported previously.

Terrtia commented 4 years ago

Hi @Phil-ThePower-Pearce !

It seems like something on your server killed this script.

You can manually relaunch it:

Phil-ThePower-Pearce commented 4 years ago

How can I tell what killed it? As far as I know, it's an AWS EC2 t2.medium Ubuntu 18.04 instance, up to date and only running AIL, the feeder and pystemon.
The Indexer queue count in the GUI just keeps increasing, and decreases very, very slowly.

http://tinyurl.com/ttum4m6

In the Indexer script:

Traceback (most recent call last):
  File "./Indexer.py", line 134, in <module>
    content=paste)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 483, in update_document
    with self.searcher() as s:
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 297, in searcher
    return Searcher(self.reader(), **kwargs)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 639, in reader
    self.generation, reuse=reuse)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in _reader
    readers = [segreader(segment) for segment in segments]
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in <listcomp>
    readers = [segreader(segment) for segment in segments]
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 524, in segreader
    generation=generation)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/reading.py", line 620, in __init__
    self._terms = self._codec.terms_reader(self._storage, segment)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/whoosh3.py", line 122, in terms_reader
    postfile = segment.open_file(storage, self.POSTS_EXT)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/base.py", line 556, in open_file
    return storage.open_file(fname, **kwargs)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/filestore.py", line 333, in open_file
    return self.a.open_file(name, *args, **kwargs)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/compound.py", line 121, in open_file
    f = BufferFile(buf, name=name)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/structfile.py", line 357, in __init__
    self.file = BytesIO(buf)
MemoryError
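
On the question of what killed `Indexer.py` in the first log: a bare `Killed` from bash usually means the process received SIGKILL, and on Linux a common source is the kernel's out-of-memory (OOM) killer, which writes a `Killed process <pid> (<name>)` line to the kernel log. A minimal sketch, assuming the kernel log is available as text (for example via `dmesg`, which may require root) and that the kernel uses that usual wording, which varies slightly between versions; `find_oom_kills` is a hypothetical helper, not part of AIL:

```python
import re
import sys

# Matches "Out of memory: Killed process 20994 (python) ..." on newer kernels
# and the separate "Killed process 20994 (python) ..." line on older ones.
# The older "Kill process ... or sacrifice child" line is deliberately not matched.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name) tuples for every OOM-killer entry in lines."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    # Usage (assumed): dmesg | python find_oom_kills.py
    for pid, name in find_oom_kills(sys.stdin):
        print("PID {0} ({1}) was killed by the OOM killer".format(pid, name))
```

If one of the reported PIDs matches the one bash printed (20994 above), that points to memory exhaustion rather than something else on the server.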

Phil-ThePower-Pearce commented 4 years ago

A new one today

Traceback (most recent call last):
  File "./Indexer.py", line 134, in <module>
    content=paste)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 483, in update_document
    with self.searcher() as s:
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 297, in searcher
    return Searcher(self.reader(), **kwargs)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/writing.py", line 639, in reader
    self.generation, reuse=reuse)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in _reader
    readers = [segreader(segment) for segment in segments]
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 535, in <listcomp>
    readers = [segreader(segment) for segment in segments]
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/index.py", line 524, in segreader
    generation=generation)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/reading.py", line 620, in __init__
    self._terms = self._codec.terms_reader(self._storage, segment)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/whoosh3.py", line 122, in terms_reader
    postfile = segment.open_file(storage, self.POSTS_EXT)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/codec/base.py", line 556, in open_file
    return storage.open_file(fname, **kwargs)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/filestore.py", line 333, in open_file
    return self.a.open_file(name, *args, **kwargs)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/compound.py", line 121, in open_file
    f = BufferFile(buf, name=name)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/whoosh/filedb/structfile.py", line 357, in __init__
    self.file = BytesIO(buf)
MemoryError

Terrtia commented 4 years ago

This seems to be a memory issue. Are you processing large files?
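
One way to answer that question is to scan the archive for pastes whose decompressed size is large. A minimal sketch, assuming the `archive/...` paths in the Indexer log are relative to a local pastes directory (its exact location is an assumption here); `find_large_pastes`, `uncompressed_size` and the 10 MiB default threshold are hypothetical choices, not AIL code:

```python
import gzip
import os


def uncompressed_size(path, chunk_size=1 << 20):
    """Decompress a .gz file in 1 MiB chunks and return its uncompressed size in bytes.

    Streaming keeps memory use bounded even when a single paste is huge.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def find_large_pastes(root, threshold_bytes=10 * 1024 * 1024):
    """Yield (path, size) for every .gz file under root that decompresses to more than threshold_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gz"):
                path = os.path.join(dirpath, name)
                size = uncompressed_size(path)
                if size > threshold_bytes:
                    yield path, size
```

For example, `sorted(find_large_pastes("PASTES"), key=lambda item: item[1], reverse=True)[:20]` would list the 20 largest pastes, where `"PASTES"` is a placeholder for wherever the archive actually lives.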

Phil-ThePower-Pearce commented 4 years ago

I'm literally just pulling the data from the CIRCL feed. The Indexer queue keeps increasing, and when it hits 3000+ the queue gets stuck. When I look in the script, I see an error like the one above.
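
To check whether the Indexer's memory footprint grows as the queue backs up, one option is to log the process's peak memory next to each `Indexing - ...` line. A minimal sketch using the standard `resource` module (Unix only); `peak_rss_mib` is a hypothetical helper, not part of AIL:

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size (RSS) in MiB.

    getrusage() reports ru_maxrss in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

Called after each document, e.g. `print("peak RSS {0:.0f} MiB".format(peak_rss_mib()))`, a steadily climbing value would support the out-of-memory explanation. Because it is a peak, the value never decreases, even if memory is later freed.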

mokaddem commented 4 years ago

Hey, what are the specs of your system? How much memory is available?

Phil-ThePower-Pearce commented 4 years ago

AWS EC2 t2.medium Ubuntu 18.04 instance: 2 vCPUs, 4 GB memory.

Only running AIL, and only importing feeds from CIRCL.
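
To see how much memory is actually available to AIL on a Linux host, `free -m` is the quickest check. The same figure can be read programmatically from `/proc/meminfo`. A minimal sketch; `parse_meminfo` and `available_memory_mib` are hypothetical helpers, and the `MemAvailable` field requires Linux 3.14 or later (Ubuntu 18.04 ships a newer kernel):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value.

    Most values are in kB (KiB); a few fields, such as HugePages_Total, have no unit.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_memory_mib(path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in MiB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / 1024
```

`MemAvailable` is generally a better indicator than `MemFree`, because it also counts page cache that the kernel can reclaim.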

Terrtia commented 4 years ago

It seems like the Indexer ran out of memory. The minimum configuration is 2 CPUs and 8 GB of memory.

Phil-ThePower-Pearce commented 4 years ago

I'm rebuilding the EC2 instance with the above settings. Will retry.

Phil-ThePower-Pearce commented 4 years ago

On your advice, I created an EC2 instance with 2 CPUs and 8 GB of memory.

TermTracker:

```
Traceback (most recent call last):
  File "./TermTrackerMod.py", line 79, in <module>
    dict_words_freq = Term.get_text_word_frequency(item_content)
  File "/home/ubuntu/Apps/AIL-framework/bin/packages/Term.py", line 96, in get_text_word_frequency
    words_dict[word] += 1
  File "./TermTrackerMod.py", line 34, in timeout_handler
    raise TimeoutException
__main__.TimeoutException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./TermTrackerMod.py", line 81, in <module>
    print ("{0} processing timeout".format(paste.p_rel_path))
NameError: name 'paste' is not defined
```
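
The second traceback is a separate bug: the timeout branch at line 81 of `TermTrackerMod.py` prints `paste.p_rel_path`, but no `paste` variable exists at that point, so a timeout turns into a crash. A minimal sketch of a `SIGALRM`-based timeout in which the identifier being logged is bound before the `try` block; `process_with_timeout`, `item_id` and `max_execution_time` are hypothetical names rather than AIL's actual code, and `signal.alarm` only works on Unix, in the main thread:

```python
import signal


class TimeoutException(Exception):
    """Raised from the SIGALRM handler when an item takes too long."""


def timeout_handler(signum, frame):
    raise TimeoutException


# SIGALRM is Unix-only, and handlers can only be installed from the main thread.
signal.signal(signal.SIGALRM, timeout_handler)


def process_with_timeout(item_id, func, content, max_execution_time=60):
    """Return func(content), or None if it runs longer than max_execution_time seconds.

    item_id is bound before the try block, so the timeout branch can always
    log it (the original handler referenced an undefined `paste` instead).
    """
    signal.alarm(max_execution_time)
    try:
        return func(content)
    except TimeoutException:
        print("{0} processing timeout".format(item_id))
        return None
    finally:
        # Cancel any pending alarm so it cannot fire while the next item is processed.
        signal.alarm(0)
```

In the module's loop this would wrap the call from the traceback, e.g. `process_with_timeout(item_id, Term.get_text_word_frequency, item_content)`, with `item_id` being whatever identifier the loop already holds for the current item.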

Keys:

```
Traceback (most recent call last):
  File "./Keys.py", line 168, in <module>
    paste = Paste.Paste(message)
  File "/home/ubuntu/Apps/AIL-framework/bin/packages/Paste.py", line 79, in __init__
    self.p_mime = magic.from_buffer(self.get_p_content(), mime=True)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 148, in from_buffer
    return m.from_buffer(buffer)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 82, in from_buffer
    return self._handle509Bug(e)
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 101, in _handle509Bug
    raise e
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 80, in from_buffer
    return maybe_decode(magic_buffer(self.cookie, buf))
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 255, in magic_buffer
    return _magic_buffer(cookie, buf, len(buf))
  File "/home/ubuntu/Apps/AIL-framework/AILENV/lib/python3.6/site-packages/magic.py", line 188, in errorcheck_null
    raise MagicException(err)
magic.MagicException: b'cannot allocate 172513878 bytes (Cannot allocate memory)'
```
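
The `Keys` traceback shows libmagic failing to allocate about 172 MB while `Paste.py` passes the entire paste content to `magic.from_buffer`. The python-magic README suggests that the first 2048 bytes are usually enough to identify a type, so one possible mitigation is to pass only the head of the content. A minimal sketch; `sniff_mime`, `_libmagic_mime` and `head_size` are hypothetical names, and the detector is injectable so the logic can be exercised without libmagic installed. Note that detection on a truncated buffer may occasionally differ from detection on the full content:

```python
def _libmagic_mime(buf):
    """Default detector: ask libmagic (via python-magic) for the MIME type of buf."""
    import magic  # imported lazily so callers can substitute another detector
    return magic.from_buffer(buf, mime=True)


def sniff_mime(content, detector=_libmagic_mime, head_size=2048):
    """Return the MIME type of content by inspecting only its first head_size items.

    content may be bytes or str; slicing keeps the detector's input small even
    when the paste itself is hundreds of megabytes.
    """
    return detector(content[:head_size])
```

In `Paste.py` this would replace the full-buffer call with something like `self.p_mime = sniff_mime(self.get_p_content())`.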