Minor suggestions in docstr

Didou09 commented 3 years ago

Motivations:

As I read through the code to better understand how it works, I figured I would include small comments or reminders that I find helpful in some docstrings, to complement the documentation for beginner users like me, feel free to dismiss if considered not useful
Also, when I see a syntax that I feel could be made either more self-explaining or more numerically efficient, I suggest a minor change in that way. Also feel free to dismiss if not useful

Main changes:

Minor changes in tools/iomath.py where the calculation functions are for missing A, Z, x, L,...
Syntax: mostly suggesting [None, :] (faster and more explicit) instead of np.replace()

If you fell this PR is relevant, I will continue to suggest such minor changes as I go through the code. Otherwise I won't

(but feel free to keep only a part of the changes and tell why you dismissed the others so I can do more focused suggestions next time)

konstantinstadler commented 3 years ago

Hi, very much appreciate the updates of the docstrings. Thanks a lot for that. However, the [None,:] stuff is not the same as reshape. It adds another dimension and leads to different results: ` In [1]: import pymrio

In [2]: import numpy as np

In [3]: t = pymrio.load_test().calc_all()

In [4]: x = t.x.values

In [5]: A = t.A.values

In [6]: Zorig = A.dot(np.diagflat(x))

In [7]: Zreshape = A*x.reshape((1,-1))

In [8]: Znone = A*x[None, :]

In [9]: np.allclose(t.Z.values, Zorig) Out[9]: True

In [10]: np.allclose(t.Z.values, Zreshape) Out[10]: True

In [11]: np.allclose(t.Z.values, Znone) Out[11]: False

`

Didou09 commented 3 years ago

Hi,

Hum, looks like there has been a semantic misunderstanding:

I was assuming, wrongly, that by 'vector' you meant a 1d array (i.e.: an array with only one dimension: shape (N,))
I now realize you mean a 2D array with shape (1, N) for a 'row vector', or (N, 1) for a 'column vector'

I suggest we explicitly mention the shape of the array to avoid mis-comprehensions about what we mean by 'vector'.

The syntax x[None, :] (respectively x[:, None]) is a fast numpy broadcasting syntax typically used on a 1D array of shape (N,), to add a dimension and make it a 2D array of shape (1, N), respectively (N, 1), to be used as a calculation intermediate, like in the operation A*x.

From your example, I understand that x was already a 2D array (of shape (N, 1)). So in that case indeed you do not need to add a dimension, just to exchange the dimensions. This is what you do by usingx.reshape((1, -1)).

But in that case, this operation amounts to computing the transpose of x. Then I would suggest using directly the transpose operator : x.T It is ~x2 faster on large arrays.

In [13]: x = np.random.random((100000,1))                                                                                                                                               

In [14]: %timeit y = x.T                                                                                                                                                                
315 ns ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit y = x.reshape((1, -1))                                                                                                                                                 
606 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]: np.allclose(x.T, x.reshape((1, -1)))                                                                                                                                           
Out[16]: True

Suggestion:

Maybe, when dealing with arrays like x that have a dimension equals to unity (shape (1, N) or (N, 1)), it might be better practice to just store it as a 1D array of shape (N,) (to avoid dimensions that do not carry information) and just use broadcasting when necessary. Your choice :-)

konstantinstadler commented 3 years ago

The point of this is to give people the possibility to provide the x "vector" (and other stuff) in various formats. It is not only to use the x (or y in that regard) coming from pymrio, but also to provide a different x or to use the io_math functions independently of pymrio. Kind of duck-typing for the vector.

I updated the tests (test_math) to explicitly test these things (hash daed302).

More general: this pull request now got very mixed up. I would suggest to separate the two issues. I am quite happy to have the docstrings updates. I could also just cherry pick them. For the code changes, the best way would be to first open an issue to discuss changes.

Didou09 commented 3 years ago

OK, will split into 2 separate issues and PRs tomorrow, for better readability :-)

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 216

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 89.505%

Totals
Change from base Build 196:	0.0%
Covered Lines:	1791
Relevant Lines:	2001

💛 - Coveralls

konstantinstadler commented 3 years ago

thank you

IndEcol / pymrio