ahgamut / superconfigure

wrap autotools configure scripts to build with Cosmopolitan Libc
The Unlicense

Multiprocessing errors out #2

Closed ingenieroariel closed 10 months ago

ingenieroariel commented 10 months ago

I hit https://bugs.python.org/issue3770

Here is the code:

import glob
import sqlite3
import subprocess
from multiprocessing import Pool, cpu_count

def process_files(filenames):
    conn = sqlite3.connect('buildings_{}.sqlite'.format(filenames[0]))  # Unique DB for each process
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE buildings (h3_code INTEGER)''')

    for filename in filenames:
        proc = subprocess.run(["./latLngToCell", filename], capture_output=True)
        codes = [int(code, 16) for code in proc.stdout.decode().split()]

        batch_codes = [(code,) for code in codes]
        cursor.executemany("INSERT INTO buildings (h3_code) VALUES (?)", batch_codes)
        conn.commit()

    conn.close()

def main():
    pattern = "points_s2_level_4_gzip/*.csv"
    # pattern = "small/*.csv"
    # pattern = "medium/*.csv"

    filenames = glob.glob(pattern)
    num_processes = 16 # cpu_count()  # Number of CPU cores
    chunk_size = max(1, len(filenames) // num_processes)  # at least 1, so range() below never gets a zero step

    # Split filenames into batches for each process
    filename_batches = [filenames[i:i+chunk_size] for i in range(0, len(filenames), chunk_size)]

    with Pool(processes=num_processes) as pool:
        pool.map(process_files, filename_batches)

    print("Inserted all buildings")

if __name__ == "__main__":
    main()

$ ~/datasette.com process5.py
Traceback (most recent call last):
  File "Lib/multiprocessing/synchronize.py", line 28, in <module>
ImportError: cannot import name 'SemLock' from '_multiprocessing' (unknown location)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/x/src/buildings/process5.py", line 37, in <module>
    main()
  File "/home/x/src/buildings/process5.py", line 32, in main
    with Pool(processes=num_processes) as pool:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib/multiprocessing/context.py", line 119, in Pool
  File "Lib/multiprocessing/pool.py", line 191, in __init__
  File "Lib/multiprocessing/pool.py", line 346, in _setup_queues
  File "Lib/multiprocessing/context.py", line 113, in SimpleQueue
  File "Lib/multiprocessing/queues.py", line 341, in __init__
  File "Lib/multiprocessing/context.py", line 67, in Lock
  File "Lib/multiprocessing/synchronize.py", line 30, in <module>
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.
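
The traceback shows Pool failing while it creates a Lock, which depends on SemLock. Until a binary with working semaphores is available, one possible workaround is a thread pool from concurrent.futures, which does not go through multiprocessing.synchronize. The sketch below is only an illustration: map_in_threads is a hypothetical helper, not part of the original script, and it assumes the interpreter build supports threads. Threads are a reasonable fit here because most of the time is spent waiting on the latLngToCell subprocess, and each call to process_files opens its own SQLite connection.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_threads(func, batches, max_workers=16):
    """Apply func to each batch using threads instead of processes.

    Hypothetical stand-in for pool.map(); results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces evaluation, so any exception raised in a worker
        # surfaces here rather than being silently dropped.
        return list(executor.map(func, batches))
```

In main(), the `with Pool(...)` block could then be swapped for `map_in_threads(process_files, filename_batches)`.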

ahgamut commented 10 months ago

ImportError: cannot import name 'SemLock' from '_multiprocessing' (unknown location)

This error is probably caused by CPython's ./configure script failing to detect Cosmopolitan Libc's sem_open implementation. I have started a rebuild and will check whether the new binaries have the same issue.
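
A quick way to tell whether a given binary is affected is to try the same import that multiprocessing.synchronize performs. This is a minimal sketch; semlock_available is a hypothetical helper name used only for illustration.

```python
def semlock_available():
    """Return True if this interpreter exposes _multiprocessing.SemLock."""
    try:
        # Same import that Lib/multiprocessing/synchronize.py attempts.
        from _multiprocessing import SemLock  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if semlock_available():
        print("SemLock is importable; multiprocessing.Pool should be able to create locks")
    else:
        print("SemLock is missing; multiprocessing.Pool will raise ImportError")
```

Running this with both the old and the rebuilt binary would show whether the rebuild changed the result.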

ahgamut commented 10 months ago

Confirmed: the updated datasette.com binary no longer errors out when importing _multiprocessing.SemLock.

ingenieroariel commented 10 months ago

Fixed for me!