Closed dfalbel closed 6 years ago
Hey there - that's correct. The reason for that boils down to performance: the big issue is that indexing into a sparse matrix requires Python to create a new sparse matrix, and creating them is expensive. Over the course of several trees, that added around 40-50% more training time. I originally tried to index directly into the matrices, but that caused big memory issues when training with multiple threads.
Let me know if you have any issues!
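To illustrate the point about indexing, here is a minimal sketch using only SciPy (the toy matrix and variable names are made up for the example, not taken from fastxml). It shows that every row lookup on a `csr_matrix` builds a fresh sparse matrix object, which is why slicing rows once up front can be cheaper than slicing repeatedly during training.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small toy matrix standing in for a real feature matrix (illustrative only).
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 5.0]]))

# Indexing the same row twice yields two distinct sparse matrix objects:
# SciPy constructs a new 1 x n_features matrix on every lookup.
row_a = X[1]
row_b = X[1]
print(row_a.shape)
print(row_a is row_b)

# Paying the construction cost once: slice every row a single time and
# reuse the resulting list instead of indexing X again later.
rows = [X[i] for i in range(X.shape[0])]
```

Whether the one-time slicing pays off depends on how often each row is revisited; the 40-50% figure above is the author's measurement, not something this sketch reproduces.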
Alright! Thanks for the fast answer! It's good to know that Python creates a new matrix when indexing, I never thought about that! I'll let you know if I have any more issues!
On Thu, Dec 14, 2017 at 21:35, Andrew Stanton notifications@github.com wrote:
Hey there - that's correct. The reason for that boils down to performance: the big issue is that indexing into a sparse matrix requires Python to create a new sparse matrix, and creating them is expensive. Over the course of several trees, that added around 40-50% more training time. I originally tried to index directly into the matrices, but that caused big memory issues when training with multiple threads.
Let me know if you have any issues!
It's pretty annoying how slow the SciPy sparse classes are. A good portion of the Cython code implements optimized routines to make them faster to use, such as computing dot products :)
First of all, thank you very much for your work! I still don't understand how to pass values from Python.
I have a scipy.csr_matrix with dimensions (3.000.000, 8.000) which I am passing to the fit method, but I get the message: AssertionError: Requires list of csr_matrix. Do I need to input a list of 3.000.000 elements, each one a csr_matrix?
Thanks
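For reference, a minimal sketch of what "a list of csr_matrix" could look like, assuming the assertion message means one single-row `csr_matrix` per sample. That reading is an assumption based on the error text, and `to_row_list` is a hypothetical helper written for this example, not part of fastxml.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_row_list(X):
    """Split a 2-D sparse matrix into a list of 1 x n_features csr_matrix rows.

    Hypothetical helper: assumes the consumer wants one csr_matrix per sample.
    """
    X = csr_matrix(X)  # make sure row slicing is done on CSR format
    return [X[i] for i in range(X.shape[0])]

# Tiny stand-in for the real (3.000.000, 8.000) matrix.
X = csr_matrix(np.array([[0.0, 1.0, 0.0, 2.0],
                         [3.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 4.0, 0.0]]))

rows = to_row_list(X)
print(len(rows), rows[0].shape)
```

With millions of rows this conversion creates millions of small matrix objects, so it may take noticeable time and memory; it only needs to be done once before training.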