Closed GoingMyWay closed 6 years ago
Hey,
Can you please try the latest version of Prince (0.3.0) with copy=False? It should be more efficient.
Regards.
I'm closing this, but feel free to reopen it if it's still an issue. The MCA class now uses sparse diagonalization so it shouldn't be an issue anymore.
Same error here. Hello, I ran this code on data of shape (645000, 2) and got this error in a Jupyter notebook:
import prince
mca = prince.MCA(n_components=2, engine='sklearn', copy=False, n_iter=3)
mca = mca.fit(data_cat)
mca = mca.transform(data_cat)
Error
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-68-529e888de2c5> in <module>
1 import prince
2 mca = prince.MCA(n_components=2, engine='sklearn', copy=False, n_iter=3)
----> 3 mca = mca.fit(data_cat)
4 mca = mca.transform(data_cat)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\prince\mca.py in fit(self, X, y)
25
26 # Apply CA to the indicator matrix
---> 27 super().fit(one_hot)
28
29 # Compute the total inertia
~\AppData\Local\Continuum\anaconda3\lib\site-packages\prince\ca.py in fit(self, X, y)
43
44 # Compute the correspondence matrix which contains the relative frequencies
---> 45 X = X / np.sum(X)
46
47 # Compute row and column masses
MemoryError: Unable to allocate 4.40 GiB for an array with shape (680558, 867) and data type float64
What's the problem?
Not too sure what's going on there @abdoulsn. Would it be possible to access your dataset?
No sorry, which information do you need?
Well I need a minimum working example to reproduce the error. It would be helpful if you could generate a toy dataset with the same characteristics as yours and reproduce the error.
The cardinalities of the columns are ('reseau', 146) and ('cdapet', 721). There are no missing values, and I've used copy=False.
Something like this
> reseau cdapet
> 0 XX 7010Z
> 1 YY 2030Z
> 2 YY 4674B
> 3 XZ 6820B
> 4 YY_XX 6820A
> ... ... ...
> 680553 XX 6832A
> 680554 YY 4120A
> 680555 XX_WX 7820Z
> 680556 YZ 4941A
> 680557 WX 4669A
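The numbers in this report can be cross-checked with a little arithmetic. The sketch below assumes the indicator matrix has one column per category (146 + 721 = 867) and is stored as float64, which matches the shape and dtype in the error message. The variable names are illustrative only.

```python
# Back-of-the-envelope size of the dense indicator (one-hot) matrix.
n_rows = 680558                 # rows, taken from the shape in the MemoryError
cardinalities = {"reseau": 146, "cdapet": 721}

# One indicator column per distinct category.
n_cols = sum(cardinalities.values())

bytes_per_value = 8             # float64
size_gib = n_rows * n_cols * bytes_per_value / 2**30

print(n_cols)                   # 867
print(round(size_gib, 2))       # 4.4
```

So each dense float64 copy of the indicator matrix costs about 4.40 GiB, and any operation that creates a second copy needs that much again.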
Ok, I just tried it on my laptop and didn't get any issue. It might be because I have more RAM (16 GB) than you do. However, the line of code that raised your MemoryError is clearly not optimal, because it allocates a new array instead of modifying X in place. I have therefore changed it to X /= np.sum(X).
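To illustrate the difference described above, here is a minimal NumPy sketch with a small stand-in array (not Prince's actual code). Note that in-place true division only works when the array already has a floating-point dtype. Whether the array reaching that line in Prince is always floating-point is not something this sketch assumes.

```python
import numpy as np

# Small stand-in for the indicator matrix (float64, as in the error message).
X = np.ones((1000, 50), dtype=np.float64)
total = np.sum(X)

# Out-of-place: allocates a second array of the same size as X.
Y = X / total
allocates_new = not np.shares_memory(X, Y)

# In-place: reuses the existing buffer, so no second full-size array is needed.
address_before = X.ctypes.data
X /= total
same_buffer = X.ctypes.data == address_before

# In-place true division on an integer array is rejected by NumPy,
# because the float result cannot be stored back into an integer buffer.
ints = np.ones(4, dtype=np.uint8)
try:
    ints /= 2.0
    int_inplace_ok = True
except TypeError:
    int_inplace_ok = False
```

After the in-place division, X holds the relative frequencies and sums to 1, without the peak memory use having doubled.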
Let me clean my notebook memory. Thanks
It's ok after restarting my notebook.
Cool glad to hear it.
Hi everyone, I am trying to run an MCA on a dataset of 62649 rows x 4 columns. I get the same problem as abdoulsn. I am also using Jupyter Notebook, and my computer has 16384 MB of RAM. I received this error message: "MemoryError: Unable to allocate 3.40 GiB for an array with shape (58264, 62649) and data type uint8"
Can you help, please? Thank you in advance.
Here are my data and code:
Cust_no Risk_Rating Date _Nb_day
0 ARAR64757686100 High 1989-07-14 9.0
1 SHDH64757636547 Low 1978-06-28 23.0
2 AYZY33546757585 Medium 1999-09-15 44.0
3 QISS46575859494 Medium 2000-02-18 61.0
4 SODJ24253673838 high 2001-07-22 50.0
... ... ... ... ...
62644 DGDT28387374645 Medium 2002-10-03 61.0
62645 ARZU36464748484 High 1993-03-06 232.0
62646 ZRRF16263636353 High 1950-02-13 356.0
62647 ERER14253536373 High 1992-05-30 224.0
62648 ETRF53536353536 Medium 2002-10-14 984.0
mca = prince.MCA(n_components=3, n_iter=3, copy=False, engine='sklearn')
MemoryError Traceback (most recent call last)
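A dense one-hot encoding gets one column per distinct value, so columns with many distinct values (identifiers, dates, raw numbers) make it very wide. The sketch below is one way to estimate that size before fitting. The helper name indicator_matrix_gib and the toy DataFrame are hypothetical, and the estimate assumes one byte per cell (uint8, as in the error message).

```python
import pandas as pd


def indicator_matrix_gib(df: pd.DataFrame, itemsize: int = 1) -> float:
    """Estimate the size in GiB of a dense one-hot encoding of df.

    Hypothetical helper: assumes one indicator column per distinct
    value in each column and `itemsize` bytes per cell.
    """
    n_rows = len(df)
    n_indicator_cols = int(df.nunique().sum())
    return n_rows * n_indicator_cols * itemsize / 2**30


# Toy frame with one identifier-like column and one low-cardinality column.
toy = pd.DataFrame({
    "Cust_no": ["A1", "B2", "C3", "D4"],
    "Risk_Rating": ["High", "Low", "Medium", "High"],
})

# Distinct values per column: Cust_no has 4, Risk_Rating has 3.
per_column = toy.nunique()
toy_gib = indicator_matrix_gib(toy)

# The shape reported in the error, at one byte per cell.
reported_gib = 58264 * 62649 * 1 / 2**30
```

The reported shape works out to roughly 3.40 GiB, which matches the error message. Checking df.nunique() on the real data would show which columns contribute most of the 58264 indicator columns.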
My machine has 120 GB of memory, of which 40 GB are free for the MCA computation.
The DataFrame has a shape of (1244210, 37), and I have processed it with the get_dummies() function in Pandas. I want to get 10 components, but I got a MemoryError.
With the 40 GB of free memory I can apply PCA to the same DataFrame. How can I solve this?
I found a similar issue on this problem: https://github.com/esafak/mca/issues/15