IndEcol / pymrio

Multi-Regional Input-Output Analysis in Python.
http://pymrio.readthedocs.io/en/latest/
Other
155 stars 71 forks source link

Minor suggestions in docstr #53

Closed Didou09 closed 3 years ago

Didou09 commented 3 years ago

Motivations:

Main changes:


If you fell this PR is relevant, I will continue to suggest such minor changes as I go through the code. Otherwise I won't

(but feel free to keep only a part of the changes and tell why you dismissed the others so I can do more focused suggestions next time)

konstantinstadler commented 3 years ago

Hi, very much appreciate the updates of the docstrings. Thanks a lot for that. However, the [None,:] stuff is not the same as reshape. It adds another dimension and leads to different results: ` In [1]: import pymrio

In [2]: import numpy as np

In [3]: t = pymrio.load_test().calc_all()

In [4]: x = t.x.values

In [5]: A = t.A.values

In [6]: Zorig = A.dot(np.diagflat(x))

In [7]: Zreshape = A*x.reshape((1,-1))

In [8]: Znone = A*x[None, :]

In [9]: np.allclose(t.Z.values, Zorig) Out[9]: True

In [10]: np.allclose(t.Z.values, Zreshape) Out[10]: True

In [11]: np.allclose(t.Z.values, Znone) Out[11]: False

`

Didou09 commented 3 years ago

Hi,

Hum, looks like there has been a semantic misunderstanding:

I suggest we explicitly mention the shape of the array to avoid mis-comprehensions about what we mean by 'vector'.

The syntax x[None, :] (respectively x[:, None]) is a fast numpy broadcasting syntax typically used on a 1D array of shape (N,), to add a dimension and make it a 2D array of shape (1, N), respectively (N, 1), to be used as a calculation intermediate, like in the operation A*x.

From your example, I understand that x was already a 2D array (of shape (N, 1)). So in that case indeed you do not need to add a dimension, just to exchange the dimensions. This is what you do by usingx.reshape((1, -1)).

But in that case, this operation amounts to computing the transpose of x. Then I would suggest using directly the transpose operator : x.T It is ~x2 faster on large arrays.

In [13]: x = np.random.random((100000,1))                                                                                                                                               

In [14]: %timeit y = x.T                                                                                                                                                                
315 ns ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit y = x.reshape((1, -1))                                                                                                                                                 
606 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: np.allclose(x.T, x.reshape((1, -1)))                                                                                                                                           
Out[16]: True

Suggestion:

Maybe, when dealing with arrays like x that have a dimension equals to unity (shape (1, N) or (N, 1)), it might be better practice to just store it as a 1D array of shape (N,) (to avoid dimensions that do not carry information) and just use broadcasting when necessary. Your choice :-)

konstantinstadler commented 3 years ago

The point of this is to give people the possibility to provide the x "vector" (and other stuff) in various formats. It is not only to use the x (or y in that regard) coming from pymrio, but also to provide a different x or to use the io_math functions independently of pymrio. Kind of duck-typing for the vector.

I updated the tests (test_math) to explicitly test these things (hash daed302).

More general: this pull request now got very mixed up. I would suggest to separate the two issues. I am quite happy to have the docstrings updates. I could also just cherry pick them. For the code changes, the best way would be to first open an issue to discuss changes.

Didou09 commented 3 years ago

OK, will split into 2 separate issues and PRs tomorrow, for better readability :-)

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 216


Totals Coverage Status
Change from base Build 196: 0.0%
Covered Lines: 1791
Relevant Lines: 2001

💛 - Coveralls
konstantinstadler commented 3 years ago

thank you