CIRCL / AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
https://github.com/ail-project/ail-framework
GNU Affero General Public License v3.0

Feeding maxing out at 150 pastes per minute when importing #477

Closed src7 closed 5 months ago

src7 commented 4 years ago

Hello,

Why can't we feed more than 150 pastes per minute? It doesn't look like a hardware bottleneck.

v1psta commented 4 years ago

I am feeding more than 150 pastes a minute...

src7 commented 4 years ago

Thank you. We tried on different servers, and the Processed pastes counter is stuck at about 150 per minute (on a single feeder, using bin/import_dir.py).

v1psta commented 4 years ago

I will reach out to my team, and see if they had to make any special changes.

src7 commented 4 years ago

Thank you. I am currently trying some changes in configs/6382.conf.

Terrtia commented 4 years ago

Hi @src7! Do you have any stuck queues? Are you processing huge files?

It might be a disk issue. We created the Global module to save all items to disk. Can you try launching multiple Global modules?

screen -r Script_AIL

Ctrl+a c

. ./AILENV/bin/activate
cd bin
./Global.py
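
Why extra Global instances could help, if the per-paste cost were mostly waiting on disk writes, can be sketched with several consumers draining one shared queue. This is a minimal illustration under stated assumptions: time.sleep stands in for a disk write, threads stand in for the separate Global processes, and run_consumers is a hypothetical helper, not part of AIL.

```python
import queue
import threading
import time
from typing import List


def run_consumers(items: List[str], n_consumers: int, cost_s: float) -> float:
    """Drain `items` with `n_consumers` threads that share one queue.

    Each item costs `cost_s` seconds of pure waiting; time.sleep stands in
    for a (hypothetical) disk write. Returns elapsed wall-clock seconds.
    """
    work: "queue.Queue[str]" = queue.Queue()
    for item in items:
        work.put(item)

    def consumer() -> None:
        while True:
            try:
                work.get_nowait()
            except queue.Empty:
                return  # queue drained, this consumer stops
            time.sleep(cost_s)  # simulated per-item I/O wait

    start = time.perf_counter()
    threads = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    pastes = [f"paste-{i}" for i in range(20)]
    for n in (1, 2, 4):
        print(n, "consumer(s):", round(run_consumers(pastes, n, 0.02), 2), "s")
```

In AIL each extra ./Global.py is a separate process rather than a thread; the sketch only shows why adding consumers helps when the per-item cost is waiting on I/O, and it would not help if the limit came from a fixed delay elsewhere.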

src7 commented 4 years ago

Hi,

Queues are not empty, but not stuck. Files processed are regular pastes from pastebin.

The disk is NVMe or RAM-based (direct I/O is disabled for the latter).

I will try these commands.

The weird thing is that the rate stays exactly the same all the time. Could there be a sleep somewhere? (I haven't found one yet.)

We build the latest version of AIL directly from the master branch on a fresh install of Ubuntu Bionic.

src7 commented 4 years ago

Imported pastes are not compressed (in case that helps).

mokaddem commented 4 years ago

@src7 Are you using the import_dir.py script by any chance? If so, you might want to reduce the sleep time. See: https://github.com/CIRCL/AIL-framework/blob/5ae22ec2168a55e89279228de7c4cdbbd36baa44/bin/import_dir.py#L114

src7 commented 4 years ago

Yes, and that is not the problem: I see far more than 150 pastes per minute being imported in the console (with no sleep set).

mokaddem commented 4 years ago
  1. In which console do you see more than 150? Global, Mixer or import_dir? By default, args.seconds is set to 0.2 s.
  2. And where do you see the 150 pastes per minute?

src7 commented 4 years ago

  1. import_dir (also tested with different sleep values and even without the sleep line)
  2. In the Web Dashboard, Processed pastes
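
A rate that stays fixed regardless of hardware often points at a fixed wait per item. The arithmetic is a simple upper bound of 60 / delay items per minute. Assuming import_dir.py sleeps args.seconds once per paste, its 0.2 s default would let the feeder push up to 300 pastes per minute, so the feeder alone would not explain a dashboard stuck at 150; a 150 per minute plateau corresponds to roughly 0.4 s per paste spent waiting somewhere downstream. The two helper names below are hypothetical.

```python
def max_items_per_minute(seconds_per_item: float) -> float:
    """Upper bound on throughput when every item costs a fixed wait."""
    return 60.0 / seconds_per_item


def implied_seconds_per_item(items_per_minute: float) -> float:
    """Fixed per-item cost that would produce a given steady rate."""
    return 60.0 / items_per_minute


if __name__ == "__main__":
    # Default import_dir.py sleep mentioned in this thread: 0.2 s
    print(max_items_per_minute(0.2))      # 300.0
    # Plateau reported on the web dashboard
    print(implied_seconds_per_item(150))  # 0.4
```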

src7 commented 4 years ago

After some research, I found that tuning this delay changes everything:

https://github.com/CIRCL/AIL-framework/blob/b4a85c0e9808c1660adc36400e33238121099f4d/bin/Helper.py#L105

I am currently reaching 4000 pastes per minute and working on getting more.

Terrtia commented 4 years ago

Good catch! I made some modifications to the ZMQ feeders in commit 998f8cc8e15f81dff5a4d006509d5c58883da629. The feeder now uses a ZMQ Poller for more general non-blocking I/O.

You should now reach more than 4000 pastes per minute.
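
The difference between sleeping between polls and waiting on a poller can be sketched with Python's standard selectors module, used here as a stand-in: pyzmq's zmq.Poller (register a socket with zmq.POLLIN, then call poll() with a timeout in milliseconds) follows the same pattern for ZMQ sockets, but this is not the code from the commit above. A fixed sleep after an empty check can delay the next message by up to the full sleep duration, while select() returns as soon as a registered socket becomes readable. read_ready is a hypothetical helper, and the socket pair stands in for a feeder-to-module channel.

```python
import selectors
import socket
from typing import List


def read_ready(sel: selectors.BaseSelector, timeout: float) -> List[bytes]:
    """Wait at most `timeout` seconds until a registered socket is
    readable, then read one chunk from each ready socket.

    select() wakes up as soon as data arrives, so the wait only costs time
    when there is genuinely nothing to process.
    """
    chunks = []
    for key, _mask in sel.select(timeout):
        data = key.fileobj.recv(4096)
        if data:
            chunks.append(data)
    return chunks


if __name__ == "__main__":
    # A connected socket pair stands in for a feeder -> module channel.
    producer, consumer = socket.socketpair()
    sel = selectors.DefaultSelector()
    sel.register(consumer, selectors.EVENT_READ)

    print(read_ready(sel, timeout=0))    # [] : nothing sent yet
    producer.sendall(b"paste-1")
    print(read_ready(sel, timeout=5.0))  # [b'paste-1'], without waiting 5 s

    sel.close()
    producer.close()
    consumer.close()
```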