AKST / Australian-Address-Boundaries-Land-Property-Price-Database

This is a database of geographic boundaries, addresses, and land and property data (mostly NSW).
MIT License

Refactor property sales to use multiprocessing #30

Closed: AKST closed this 1 month ago

AKST commented 1 month ago

It looks like it'll take 16 hours to load everything into Postgres.

Just parsing the text and logging the rows (30,000,000+ rows) takes about 30 minutes. At the moment I'm ingesting the data into Postgres, which does the same thing except it batches the rows for Postgres to ingest. I'm at 4h 34m and I've only ingested 5,400,000+ rows. That's a lot, but it's only about 1/6 of the total.
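The batching step described above can be sketched roughly like this. It's a minimal sketch under stated assumptions, not the repository's loader: Python's built-in `sqlite3` stands in for Postgres so it runs without a server, and the `sales` table, its columns, and the `batches`/`ingest` helpers are hypothetical. The same idea applies with a Postgres driver, where `COPY` is usually faster still for bulk loads.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def batches(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def ingest(conn: sqlite3.Connection, rows: Iterable[tuple[str, int]], batch_size: int = 5_000) -> int:
    """Insert rows batch by batch into the hypothetical `sales` table; returns rows written."""
    total = 0
    for batch in batches(rows, batch_size):
        # `with conn` commits the batch on success and rolls it back on error,
        # so each batch is one executemany call and one transaction.
        with conn:
            conn.executemany(
                "INSERT INTO sales (property_id, price) VALUES (?, ?)", batch
            )
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (property_id TEXT, price INTEGER)")
    rows = ((f"P{i}", i * 1000) for i in range(12_345))
    print(ingest(conn, rows, batch_size=1_000))
```

Everything here still runs in a single process, so parsing and inserting never overlap.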

I think it's time we look at an inter-process communication approach. I guess my distributed systems course is coming in handy for once.
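A minimal sketch of a multiprocessing approach, assuming a Python loader whose bottleneck is CPU-bound parsing. This is not the code from this issue: `parse_line`, `parse_parallel`, and the `property_id;price` line format are hypothetical. It uses `multiprocessing.Pool`, which ships lines to worker processes in chunks and returns parsed rows in input order. It also picks the POSIX-only `fork` start method so workers don't need to re-import the main script; on Windows you'd use the default start method with an `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp
from typing import Iterable


def parse_line(line: str) -> tuple[str, int]:
    # Hypothetical record format "property_id;price"; the real sales files differ.
    property_id, price = line.rstrip("\n").split(";")
    return property_id, int(price)


def parse_parallel(lines: Iterable[str], workers: int = 4, chunksize: int = 10_000) -> list[tuple[str, int]]:
    # "fork" is POSIX-only: workers inherit parse_line from the parent
    # instead of re-importing the main script.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # imap sends `chunksize` lines per task to a worker, amortising the
        # pickling/IPC cost, and yields results in input order.
        return list(pool.imap(parse_line, lines, chunksize=chunksize))


if __name__ == "__main__":
    sample = [f"{i};{i * 1000}\n" for i in range(25)]
    rows = parse_parallel(sample, workers=2, chunksize=10)
    print(rows[:3])
```

A larger `chunksize` means fewer, bigger messages between processes, which matters when each individual line is cheap to parse.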

Is this an issue on the Postgres end?

I don't think so? I looked at the usage and it doesn't look anywhere near capacity.

AKST commented 1 month ago

bruh moment when multiprocessing

AKST commented 1 month ago

Okay, yeah, I got it working. What took 9 hours to process earlier took less than a minute. I'm shocked, tbh.

AKST commented 1 month ago

This is sick. And now I should be able to iterate a lot faster.