afilipovich / gglsbl

Python client library for Google Safe Browsing API
Apache License 2.0

Can't update DB: process is killed prematurely #33

Closed abshkd closed 6 years ago

abshkd commented 6 years ago

Python 2.7 on Ubuntu 16 Server

 gglsbl_client.py --api-key 'API_KEY' --onetime
2018-02-20 10:23:08,495 WARNING Circumventing request frequency throttling is against Safe Browsing API policy.
2018-02-20 10:23:08,649 INFO Opening SQLite DB /tmp/gsb_v4.db
2018-02-20 10:23:08,862 INFO Cleaning up full_hash entries expired more than 43200 seconds ago.
2018-02-20 10:23:22,160 INFO Storing 40807 entries of hash prefix list MALWARE/IOS/URL
2018-02-20 10:23:22,838 INFO Local cache checksum matches the server: 95ac5bd2f9429a9653d14ff5cb8673ebcd2f9c287c34cf7e5aaa0814d9e4e132
2018-02-20 10:23:24,934 INFO Storing 468600 entries of hash prefix list MALWARE/OSX/URL
2018-02-20 10:23:35,361 INFO Local cache checksum matches the server: 3876c9ae6517164006c72ef3d22fd2655f3f532a70df49447f3953a8ddbf3385
2018-02-20 10:23:37,149 INFO Storing 1090026 entries of hash prefix list SOCIAL_ENGINEERING/OSX/URL
Killed

The failure is reproducible and occurs at exactly this point every time. I am using the master branch.

abshkd commented 6 years ago

A bit more troubleshooting info. If I reduce my general OS memory usage to the bare minimum, I can proceed further. I believe this problem can be resolved by chunking. It will still fail if the available memory exceeds (about 512 MB free in this case); then the process is killed. If I can dive deeper into the code, I will submit a patch.

afilipovich commented 6 years ago

Hi @abshkd, do you mean the sync fails when there is more than 512 MB of memory available but works if memory is tight? Is there anything unusual in the dmesg output after the script gets killed?

abshkd commented 6 years ago

Hi, thanks for responding so quickly. I meant exactly the opposite: when memory is low, the sync fails on the larger threat lists. I forgot to check dmesg; I will try to reproduce on another system and get you the dmesg output. I believe that between retrieving the hash list and saving it to SQLite, the process hits the OS memory limit and is killed. Will confirm later today.
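The chunking idea raised in this thread can be sketched roughly as follows. This is not gglsbl's actual storage code: the table `hash_prefix`, its columns, and the helper `store_in_chunks` are hypothetical names chosen for illustration, and the sketch is written for Python 3 rather than the Python 2.7 used in the report. It only shows the general technique of writing rows to SQLite in fixed-size batches so that a single batch, not the whole list, is held in memory at once.

```python
import itertools
import sqlite3


def store_in_chunks(conn, rows, chunk_size=10000):
    """Insert rows from any iterable in fixed-size batches.

    Only one batch is materialized in memory at a time.
    Returns the number of rows written.
    """
    rows = iter(rows)
    total = 0
    while True:
        # Pull at most chunk_size rows from the iterator.
        chunk = list(itertools.islice(rows, chunk_size))
        if not chunk:
            break
        with conn:  # commits (or rolls back) once per batch
            conn.executemany(
                # Hypothetical schema, not the one gglsbl uses.
                "INSERT INTO hash_prefix (value, threat_type) VALUES (?, ?)",
                chunk,
            )
        total += len(chunk)
    return total


def demo():
    """Write 25000 fake prefixes to an in-memory DB in batches of 10000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hash_prefix (value BLOB, threat_type TEXT)")
    # A generator, so the fake prefixes are produced lazily.
    fake = ((i.to_bytes(4, "big"), "MALWARE") for i in range(25000))
    written = store_in_chunks(conn, fake, chunk_size=10000)
    count = conn.execute("SELECT COUNT(*) FROM hash_prefix").fetchone()[0]
    conn.close()
    return written, count
```

Batching the writes only lowers peak memory if the rows are also produced lazily upstream; if the full list has already been decoded into memory before storage, the saving is limited.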

afilipovich commented 6 years ago

In that case it must be the OOM killer. The library requires more than 500 MB to sync. Optimizing for memory usage would most likely increase CPU usage, which is a hard bottleneck because the sync process is single-threaded, while memory is relatively cheap (in fact, for improved performance I keep the SQLite file on a RAMFS partition).
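Given the memory requirement described above, one option on a small VM is to check available memory before starting a sync. The following is a minimal sketch, assuming a Linux host that exposes `MemAvailable` in `/proc/meminfo`; the helper names and the 512 MB default threshold are illustrative assumptions, not part of gglsbl.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the leading integers on each line (kilobytes for
    most fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def enough_memory(text, required_mb=512):
    """Return True if the reported available memory meets the threshold.

    The 512 MB default is an assumption based on the rough figure
    mentioned in this thread, not a documented requirement.
    """
    info = parse_meminfo(text)
    # Fall back to MemFree if MemAvailable is not reported.
    available_kb = info.get("MemAvailable", info.get("MemFree", 0))
    return available_kb >= required_mb * 1024


def host_has_enough_memory(required_mb=512, path="/proc/meminfo"):
    """Run the check against the live host (Linux only)."""
    with open(path) as f:
        return enough_memory(f.read(), required_mb)
```

A wrapper script could call `host_has_enough_memory()` and skip or postpone the sync when it returns False, rather than letting the kernel kill the process partway through.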

abshkd commented 6 years ago

After I shut down a few processes on the VM, I was able to complete the initial sync. I suppose from now on it is only differential updates, so I will give it another go with the normal processes running alongside the update. Thank you again. It works very well otherwise.

afilipovich commented 6 years ago

Ok, great, thank you for the feedback!