commonsense / conceptnet5

Code for building ConceptNet from raw data.

optimize postgresql queries and psycopg2 codes #303

Closed gsittyz closed 3 years ago

gsittyz commented 3 years ago

postgresql queries

The current queries take a substantial amount of time to create the database. I tried to build conceptnet5 in my environment, but the process was stuck for more than 24 hours and I finally gave up on it.

The PostgreSQL 12 documentation, section 14.4 "Populating a Database", provides some tips for importing a large amount of data.

My edits are:

With this code, the build finished in 2 hours in my environment (I also changed database settings: wal_level to minimal, archive_mode to off, and max_wal_senders to zero).
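For reference, the three database settings mentioned above could be written as a postgresql.conf fragment. This is only a minimal sketch: the helper name `render_conf` is hypothetical, and these parameters can only be set at server start, so a restart is needed after changing them.

```python
# Settings reported in this PR description for speeding up the build.
BULK_LOAD_SETTINGS = {
    "wal_level": "minimal",
    "archive_mode": "off",
    "max_wal_senders": 0,
}


def render_conf(settings):
    """Render a dict as `name = value` lines in postgresql.conf syntax."""
    return "".join(f"{name} = {value}\n" for name, value in settings.items())


if __name__ == "__main__":
    # Print the fragment so it can be pasted into postgresql.conf.
    print(render_conf(BULK_LOAD_SETTINGS), end="")
```

These values turn off WAL archiving and streaming replication, so they are probably best treated as build-time settings rather than defaults for a replicated server.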

psycopg2 code

From "The connection class" in the Psycopg 2.8.7.dev0 documentation:

Connections can be used as context managers. Note that a context wraps a transaction: if the context exits with success the transaction is committed, if it exits with an exception the transaction is rolled back. Note that the connection is not closed by the context and it can be used for several contexts.
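The transaction behaviour quoted above can be tried in a small runnable sketch. It uses the standard-library sqlite3 module as a stand-in so that it runs without a PostgreSQL server; sqlite3 connections follow the same commit-on-success, rollback-on-exception convention when used as context managers. The table `edges` and the helpers `insert_rows` and `count_rows` are hypothetical, and with psycopg2 the placeholder style would be `%s` rather than `?`.

```python
import sqlite3


def insert_rows(conn, rows):
    # `with conn:` wraps one transaction: it commits if the block succeeds
    # and rolls back if the block raises. The connection is NOT closed.
    with conn:
        cur = conn.cursor()
        cur.executemany("INSERT INTO edges (uri, weight) VALUES (?, ?)", rows)


def count_rows(conn):
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM edges")
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.cursor().execute("CREATE TABLE edges (uri TEXT PRIMARY KEY, weight REAL)")

# First context: succeeds, so both rows are committed.
insert_rows(conn, [("/a/1", 1.0), ("/a/2", 2.0)])

# Second context: the duplicate key raises, so the whole batch is rolled back.
try:
    insert_rows(conn, [("/a/3", 1.0), ("/a/1", 5.0)])
except sqlite3.IntegrityError:
    pass

# The same connection is still open and usable for further contexts.
rows_after = count_rows(conn)
```

Because the context manager already ends the transaction, no explicit commit call is needed inside or after the `with` block.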

I'm not sure why the source code calls connection.commit() when conn.autocommit is enabled. That is a separate issue, though.

rspeer commented 3 years ago

This looks great!

I'd like to test it before merging it in, and I have yet to set up a ConceptNet build environment that isn't my old work computer. This is on my priority list, but feel free to bump this PR if I forget.

rspeer commented 3 years ago

I'm almost in a position to look at this again -- but I still don't have access to a dev computer with enough disk space to actually try it out.

rspeer commented 3 years ago

I'm sorry this took so long, but I finally was able to run the build again, and this PR certainly made it much faster. Thank you!