Closed: chansooligans closed this 1 year ago
need to solve this first: https://github.com/chansooligans/oagdedupe/issues/120
(To build the inverted index in a separate function, that function will need to store it. Creating a table in parallel is okay, but I need to make sure there are no conflicts, i.e. two processes trying to create the same table. This should not happen with proper caching, but that's not the case with the current implementation of multiprocessing.)
I can just add it to the abstract repo, even if postgres simply returns query strings for "build_inverted_index".
As an example of making business logic clearer, see the get_inverted_index_stats() function below... it would be nice to separate this into "build inverted index" and "get inverted index stats" in the logic and not just in the repository. A problem is that get_inverted_index_stats() is a single query (it uses temp tables to build inverted indices, then obtains comparison pairs, then computes stats using these pairs).
I can fix this by splitting it into two functions. One function builds the inverted index and saves it in its own table instead of a temp table. And a second function computes the stats.
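A rough sketch of that split, just to make the idea concrete. It uses the stdlib sqlite3 module instead of postgres so it runs anywhere, and the table name, column names, the block_fn argument and the stats returned are all made up for illustration (they are not the actual oagdedupe schema or stats):

```python
import sqlite3


def build_inverted_index(conn, records, block_fn, table="inverted_index"):
    """Step 1: store the inverted index (signature -> record id) in its own table.

    `records` is an iterable of (record_id, value); `block_fn` maps a value to
    a block signature. Both are hypothetical stand-ins for a blocking scheme.
    """
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (signature TEXT, record_id INTEGER)")
    conn.executemany(
        f"INSERT INTO {table} (signature, record_id) VALUES (?, ?)",
        [(block_fn(value), record_id) for record_id, value in records],
    )
    conn.commit()


def get_inverted_index_stats(conn, table="inverted_index"):
    """Step 2: compute stats from the stored index, no temp tables needed.

    A block holding n records yields n * (n - 1) / 2 comparison pairs.
    """
    n_blocks, n_pairs = conn.execute(
        f"""
        SELECT COUNT(*), COALESCE(SUM(size * (size - 1) / 2), 0)
        FROM (SELECT COUNT(*) AS size FROM {table} GROUP BY signature)
        """
    ).fetchone()
    return {"n_blocks": n_blocks, "n_pairs": n_pairs}
```

With this shape the second function only reads a table that already exists, so it can be called (or tested) on its own once the index has been built.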
Can I do insert statements in parallel?
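One way to sidestep the "two processes create the same table" conflict is to create the table once before any worker starts, and let the workers only insert. A minimal sketch of that pattern, again with sqlite3 and a thread pool so it is self-contained. The table layout and function names are hypothetical. Note that sqlite serializes writers (hence the timeout), whereas postgres accepts concurrent INSERTs from separate connections, so this only illustrates the structure, not the performance:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def create_index_table(path):
    # Run once in the parent, before any worker starts, so that workers
    # never race to create the same table.
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS inverted_index "
            "(scheme TEXT, signature TEXT, record_id INTEGER)"
        )
        conn.commit()
    finally:
        conn.close()


def insert_scheme(path, scheme, rows):
    # Each worker opens its own connection; connections are not shared.
    conn = sqlite3.connect(path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO inverted_index VALUES (?, ?, ?)",
                [(scheme, signature, record_id) for signature, record_id in rows],
            )
    finally:
        conn.close()
    return len(rows)


def build_in_parallel(path, rows_by_scheme, max_workers=4):
    create_index_table(path)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(insert_scheme, path, scheme, rows)
            for scheme, rows in rows_by_scheme.items()
        ]
        return sum(f.result() for f in futures)
```

Here all schemes write into a single table keyed by a scheme column, so the only DDL statement runs exactly once.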