Closed: lokeshh closed this issue 6 years ago
So, looking at this, I at first thought it was interesting, but now I'm not so sure.
I believe that the Ruby Matrix class stores its data as an array, so when you call `#to_a` you're basically just getting a copy of the internal storage. NMatrix, however, has to make a new Array — first allocating and then copying values into it.
I guess the question is this: what are you doing that requires you to call `#to_a`? Can you avoid it? By its nature, it's going to involve a full copy operation no matter what.
Before we label this bug or enhancement, please have a look through the relevant code and figure out if we're using best practices already or not. It may be there's a simple mistake, or it may be that there are actual enhancements that can be made, or it may already be optimal.
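To make the copy cost discussed above concrete, here is a hedged pure-Ruby sketch. It is not NMatrix's actual implementation; the `FlatMatrix` class and both `to_a` variants are illustrative stand-ins for a dense matrix backed by flat storage:

```ruby
# Illustrative only: a toy dense matrix stored as one flat Ruby Array,
# mimicking how a dense backend keeps its data contiguously.
class FlatMatrix
  attr_reader :rows, :cols

  def initialize(rows, cols)
    @rows = rows
    @cols = cols
    @data = Array.new(rows * cols) { |i| i } # internal flat storage
  end

  def [](i, j)
    @data[i * @cols + j] # per-element accessor (method-call overhead)
  end

  # Fast: build each row by slicing the flat storage directly.
  def to_a_fast
    Array.new(@rows) { |i| @data[i * @cols, @cols] }
  end

  # Slow: go through the element accessor for every single entry.
  def to_a_slow
    Array.new(@rows) { |i| Array.new(@cols) { |j| self[i, j] } }
  end
end

m = FlatMatrix.new(3, 4)
p m.to_a_fast == m.to_a_slow # => true: same result, very different cost
```

Either way a full copy happens, but `to_a_slow` additionally pays a Ruby method call per element; on a large matrix that per-element overhead is roughly the shape of the slowdown being discussed.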
> I guess the question is this: what are you doing that is requiring you to call #to_a? Can you avoid it? By its nature, it's going to involve a full copy operation no matter what.
Python data science libraries encourage converting NumPy arrays to plain lists once vectorized computations and linear algebra functions are no longer required, because indexing and enumerating a very large NumPy array is much slower than doing the same with a simple list. I'm not sure whether this is Lokesh's use case, but I saw it a lot while doing a course on Information Retrieval using scikit-learn. Also, the tradeoff costs are not clear in our case either.
While porting Statsample-GLM from Matrix to NMatrix I found that `#to_a` is taking far more time than it should, as the following benchmark shows:

Improvement in `#to_a` might solve the issue in https://github.com/SciRuby/statsample-glm/pull/36
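The benchmark itself is not reproduced above. As a hedged, self-contained sketch of how such a measurement could be set up (plain Ruby stdlib only, not NMatrix and not the original benchmark), one could time the two generic ways a dense matrix's `#to_a` can be assembled from flat storage:

```ruby
# Hedged sketch: timing row-slice copies vs. per-element copies out of a
# flat array standing in for a matrix's internal storage.
require 'benchmark'

rows, cols = 1_000, 1_000
data = Array.new(rows * cols, 1.0) # stand-in for internal dense storage

Benchmark.bm(14) do |x|
  x.report('row slices:') do
    Array.new(rows) { |i| data[i * cols, cols] }
  end
  x.report('per element:') do
    Array.new(rows) { |i| Array.new(cols) { |j| data[i * cols + j] } }
  end
end
```

Both loops produce the same nested Array; the per-element version is typically noticeably slower, which is the kind of gap worth checking `#to_a` for before deciding whether this is a bug or an enhancement.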