Open bernhardreiter opened 3 years ago
Running the ripe importer uses quite a bit of memory (~8GB as of today). Can this be reduced?
Downloading the ripe files for 2021-02-12 gives us the raw size we want to process:
    2021-02-12> gzip -l *
    gzip: delegated-ripencc-latest: not in gzip format
      compressed  uncompressed  ratio  uncompressed_name
         8206950      78779140  89.6%  ripe.db.aut-num
        25903102     507574226  94.9%  ripe.db.inet6num
       242928983    3628042984  93.3%  ripe.db.inetnum
         5843624      95550239  93.9%  ripe.db.organisation
         4413327      77870566  94.3%  ripe.db.role
       287295986    4387817155  93.5%  (totals)
So that is about 4.4 GB of uncompressed data (4387817155 bytes).
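The totals row can be cross-checked with a few lines of Python. This is only a sketch: the byte counts are copied from the `gzip -l` listing, and the helper names `total_bytes` and `share_of_total` are made up for illustration. It also shows that `ripe.db.inetnum` alone accounts for roughly 83% of the input.

```python
# Uncompressed sizes in bytes, copied from the `gzip -l` listing.
UNCOMPRESSED_BYTES = {
    "ripe.db.aut-num": 78779140,
    "ripe.db.inet6num": 507574226,
    "ripe.db.inetnum": 3628042984,
    "ripe.db.organisation": 95550239,
    "ripe.db.role": 77870566,
}


def total_bytes(sizes):
    """Sum the uncompressed sizes of all files."""
    return sum(sizes.values())


def share_of_total(sizes):
    """Return each file's fraction of the total uncompressed size."""
    total = total_bytes(sizes)
    return {name: size / total for name, size in sizes.items()}


if __name__ == "__main__":
    total = total_bytes(UNCOMPRESSED_BYTES)
    # Report both decimal (GB) and binary (GiB) units.
    print(f"total: {total} bytes = {total / 1e9:.2f} GB = {total / 2**30:.2f} GiB")
    for name, share in sorted(share_of_total(UNCOMPRESSED_BYTES).items(),
                              key=lambda item: item[1], reverse=True):
        print(f"{name:24s} {share:6.1%}")
```

Compared with the ~8GB the importer needs, this puts the peak memory use at a bit under twice the raw input size.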
Using https://pypi.org/project/memory-profiler/
    # Debian Buster
    apt-get install python3-memory-profiler python3-matplotlib
Decorating a few functions where the memory is likely consumed:
    --- a/intelmq_certbund_contact/ripe/ripe_data.py
    +++ b/intelmq_certbund_contact/ripe/ripe_data.py
    @@ -78,2 +78,3 @@ def add_common_args(parser):
    +@profile
     def load_ripe_files(options) -> tuple:
    @@ -205,2 +206,3 @@ def read_asn_whitelist(filename, verbose=False):
    +@profile
     def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
    @@ -298,2 +300,3 @@ def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
    +@profile
     def build_index(obj_list, index_attribute):
    @@ -441,2 +444,3 @@ def split_for_known_orgs(obj_list, organisation_index):
    +@profile
     def sanitize_split_and_modify(obj_list, index, whitelist,
    @@ -501,2 +505,3 @@ def sanitize_split_and_modify(obj_list, index, whitelist,
    +@profile
     def convert_inetnum_to_networks(inetnum_list):
    @@ -510,2 +515,3 @@ def convert_inetnum_to_networks(inetnum_list):
    +@profile
     def convert_inet6num_to_networks(inet6num_list):
    @@ -517,2 +523,3 @@ def convert_inet6num_to_networks(inet6num_list):
    +@profile
     def process_inetnum_contacts(key, inet_list, inet_list_u, restrict_country):
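For a quick check without installing memory-profiler, a similar per-function decorator can be sketched with the standard library's `tracemalloc`. This is a different technique from the `@profile` decorator used in the diff: `tracemalloc` only sees allocations made through Python's allocator, so its numbers will not match the process-level figures that mprof plots. The names `trace_peak`, `PEAKS` and `build_big_list` are hypothetical and do not exist in the importer.

```python
import functools
import tracemalloc

# Hypothetical registry: function name -> peak traced bytes seen by its last call.
PEAKS = {}


def trace_peak(func):
    """Record the tracemalloc peak observed up to the end of each call.

    If tracing was already running (e.g. nested decorated calls), the peak
    is measured since tracing started, not since this call began.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started_here = not tracemalloc.is_tracing()
        if started_here:
            tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            _current, peak = tracemalloc.get_traced_memory()
            PEAKS[func.__name__] = peak
            if started_here:
                tracemalloc.stop()
    return wrapper


@trace_peak
def build_big_list(n):
    # Stand-in for a memory-hungry step such as parse_file (illustration only).
    return [str(i) for i in range(n)]


if __name__ == "__main__":
    build_big_list(1_000_000)
    for name, peak in PEAKS.items():
        print(f"{name}: peak {peak / 2**20:.1f} MiB traced")
```

Tracing slows the program down noticeably, so this is better suited to a reduced input (for example a single country restriction) than to a full import.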
We can get a plot, trying to import with a country restriction of NO:
    env PYTHONPATH=/home/bern/dev/certbund-contact-git: python3-mprof run /home/bern/dev/certbund-contact-git/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost port=5432 dbname=contactdb'
    python3-mprof plot -t "ripe_importer memory profile 2021-12-02"
Here is the data file for interactive browsing (rename to remove the .txt suffix): mprofile_20210212110015.dat.txt
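The data file can also be inspected without plotting. The sketch below assumes the mprof text layout as I remember it: `MEM <MiB> <timestamp>` lines for samples and `FUNC <name> <mem_start> <t_start> <mem_end> <t_end>` lines for decorated functions. Please verify that against the actual file before relying on it. The function names and the sample values are made up for illustration and are not taken from the attached profile.

```python
def read_mprofile(lines):
    """Parse mprof-style lines into memory samples and per-function records.

    Assumed layout (check against a real .dat file):
      MEM <mem_MiB> <timestamp>
      FUNC <name> <mem_start> <t_start> <mem_end> <t_end>
    Other lines (e.g. CMDLINE) are ignored.
    """
    samples = []  # (timestamp, mem_MiB)
    funcs = []    # (name, mem_start, t_start, mem_end, t_end)
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "MEM" and len(parts) >= 3:
            samples.append((float(parts[2]), float(parts[1])))
        elif parts[0] == "FUNC" and len(parts) >= 6:
            mem_start, t_start, mem_end, t_end = map(float, parts[2:6])
            funcs.append((parts[1], mem_start, t_start, mem_end, t_end))
    return samples, funcs


def peak_sample(samples):
    """Return the (timestamp, mem_MiB) sample with the highest memory use."""
    return max(samples, key=lambda sample: sample[1])


# Made-up sample data, only to demonstrate the parser.
SAMPLE = """CMDLINE python3 example.py
MEM 10.5 100.0
MEM 250.0 100.1
MEM 120.25 100.2
FUNC __main__.parse_file 10.5 100.0 250.0 100.1
"""

if __name__ == "__main__":
    samples, funcs = read_mprofile(SAMPLE.splitlines())
    when, mem = peak_sample(samples)
    print(f"peak {mem:.1f} MiB at t={when}")
    for name, mem_start, _t_start, mem_end, _t_end in funcs:
        print(f"{name}: {mem_end - mem_start:+.1f} MiB over the call")
```

For the attached file, pass an open file object instead of `SAMPLE.splitlines()`, e.g. `read_mprofile(open("mprofile_20210212110015.dat"))` after removing the `.txt` suffix.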