AI-sandbox / gnomix

A fast, scalable, and accurate local ancestry method.

error of "struct.error: 'i' format requires -2147483648 <= number <= 2147483647" #42

Open buyske opened 1 year ago

buyske commented 1 year ago

I am getting the error below for most, but not all chromosomes. I am fitting my own model from scratch, using a subset of samples from the 1K Genomes 30x as a reference panel, with the variants reduced to match those in my array-based query file.

I have two different query files, on different arrays, and they both throw this error on the same chromosomes, so I suppose the error is coming about from the reference panel. I've looked into the solution mentioned in issue #40; it wasn't the problem. I'm currently trying the reference panel without dropping any variants, but that runs painfully slowly so it may be a while before I know.

Thanks for any suggestions. Steve

Reading data...
Building model...
Training base models...
100%|████████████████████████████████████████| 660/660 [10:10:54<00:00, 55.54s/it]
Training smoother...
100%|████████████████████████████████████████| 660/660 [02:38<00:00, 4.17it/s]

Traceback (most recent call last):
  File "gnomix/gnomix.py", line 397, in <module>
    model = train_model(config, data_path, verbose=verbose)
  File "gnomix/gnomix.py", line 195, in train_model
    model.train(data=data, retrain_base=retrain_base, evaluate=True, verbose=verbose)
  File "gnomix/src/model.py", line 116, in train
    B_t2 = self.base.predict_proba(X_t2)
  File "gnomix/src/Base/base.py", line 139, in predict_proba
    return self.predict_proba_vectorized(X)
  File "gnomix/src/Base/base.py", line 172, in predict_proba_vectorized
    B = np.array(pool.starmap(self.predict_proba_base_model, base_args))
  File "/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

EfraMP commented 9 months ago

I'm facing the exact same error. Did you find a solution?

jamesfifer commented 6 months ago

Since the error is chromosome specific, are you getting this error on your largest chromosome? This error typically occurs when trying to pack or unpack integers that are outside the range supported by the 'i' format.
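For reference, the limit quoted in the error message can be reproduced with the standard library alone. This is only a minimal sketch; the helper name `fits_in_i_format` is made up for illustration and is not part of gnomix.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error


def fits_in_i_format(n):
    """Return True if n can be packed with the signed 32-bit 'i' format."""
    try:
        struct.pack("!i", n)  # same format string as in the traceback
        return True
    except struct.error:
        return False


print(fits_in_i_format(INT32_MAX))      # True
print(fits_in_i_format(INT32_MAX + 1))  # False
```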

I've gotten around this by splitting my problem chromosome into two smaller chunks.
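A rough sketch of the splitting idea, in case it helps: cut the variant index range of the problem chromosome into roughly equal contiguous chunks and run each chunk separately. `split_range` is a hypothetical helper, not something gnomix provides, and how you actually subset your VCF by those indices depends on your own tooling.

```python
def split_range(n_variants, n_chunks=2):
    """Split indices 0..n_variants-1 into contiguous half-open (start, stop) chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(n_variants, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional variant each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# Example: a chromosome with 11 variants split into two chunks.
print(split_range(11, 2))  # [(0, 6), (6, 11)]
```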

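If that's not a solution for you, you might need a 64-bit size field in place of the 32-bit one. Going by the traceback, the 'i' that overflows is the message-length header that Python 3.7's multiprocessing packs in connection.py when it sends a task to a worker, so the limit applies to the size of the pickled data (about 2 GiB) rather than to your positions.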
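To illustrate the 32-bit versus 64-bit point: in the standard `struct` module the 'q' format is a signed 64-bit integer, so a value just past the 'i' limit packs without error. The sketch below also includes two hypothetical helpers, `pickled_size` and `exceeds_int32`, for checking how large an object is once pickled. The traceback shows the failing value `n` being packed inside multiprocessing's `_send_bytes`, so the pickled size of what gets sent to a worker looks like the number to watch. None of this is gnomix code.

```python
import pickle
import struct

INT32_MAX = 2**31 - 1  # largest value the 'i' format accepts

# A value one past the 32-bit limit packs fine as a 64-bit 'q'.
too_big = INT32_MAX + 1
header64 = struct.pack("!q", too_big)  # 8 bytes, network byte order
assert struct.unpack("!q", header64)[0] == too_big


def pickled_size(obj):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def exceeds_int32(obj):
    """Return True if the pickled size of obj would overflow the 'i' format."""
    return pickled_size(obj) > INT32_MAX
```

Note that `pickled_size` builds the whole pickle in memory, so on very large arrays it is itself expensive; it is meant as a diagnostic, not something to leave in a pipeline.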