Closed Didou09 closed 3 years ago
Hi, very much appreciate the updates of the docstrings. Thanks a lot for that. However, the [None,:] stuff is not the same as reshape. It adds another dimension and leads to different results: ` In [1]: import pymrio
In [2]: import numpy as np
In [3]: t = pymrio.load_test().calc_all()
In [4]: x = t.x.values
In [5]: A = t.A.values
In [6]: Zorig = A.dot(np.diagflat(x))
In [7]: Zreshape = A*x.reshape((1,-1))
In [8]: Znone = A*x[None, :]
In [9]: np.allclose(t.Z.values, Zorig) Out[9]: True
In [10]: np.allclose(t.Z.values, Zreshape) Out[10]: True
In [11]: np.allclose(t.Z.values, Znone) Out[11]: False
`
Hi,
Hum, looks like there has been a semantic misunderstanding:
I suggest we explicitly mention the shape of the array to avoid mis-comprehensions about what we mean by 'vector'.
The syntax x[None, :]
(respectively x[:, None]
) is a fast numpy broadcasting syntax typically used on a 1D array of shape (N,), to add a dimension and make it a 2D array of shape (1, N), respectively (N, 1), to be used as a calculation intermediate, like in the operation A*x
.
From your example, I understand that x was already a 2D array (of shape (N, 1)).
So in that case indeed you do not need to add a dimension, just to exchange the dimensions.
This is what you do by usingx.reshape((1, -1))
.
But in that case, this operation amounts to computing the transpose of x.
Then I would suggest using directly the transpose operator : x.T
It is ~x2 faster on large arrays.
In [13]: x = np.random.random((100000,1))
In [14]: %timeit y = x.T
315 ns ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit y = x.reshape((1, -1))
606 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [16]: np.allclose(x.T, x.reshape((1, -1)))
Out[16]: True
Maybe, when dealing with arrays like x that have a dimension equals to unity (shape (1, N) or (N, 1)), it might be better practice to just store it as a 1D array of shape (N,) (to avoid dimensions that do not carry information) and just use broadcasting when necessary. Your choice :-)
The point of this is to give people the possibility to provide the x "vector" (and other stuff) in various formats. It is not only to use the x (or y in that regard) coming from pymrio, but also to provide a different x or to use the io_math functions independently of pymrio. Kind of duck-typing for the vector.
I updated the tests (test_math) to explicitly test these things (hash daed302).
More general: this pull request now got very mixed up. I would suggest to separate the two issues. I am quite happy to have the docstrings updates. I could also just cherry pick them. For the code changes, the best way would be to first open an issue to discuss changes.
OK, will split into 2 separate issues and PRs tomorrow, for better readability :-)
thank you
Motivations:
Main changes:
If you fell this PR is relevant, I will continue to suggest such minor changes as I go through the code. Otherwise I won't
(but feel free to keep only a part of the changes and tell why you dismissed the others so I can do more focused suggestions next time)