Fix the index ordering issue with dataWindow; correct some mistakes in parameter files.

TedStudley / mc-mini

Simple Stokes+Advection/Diffusion solver.

0 stars 4 forks source link

Fix the index ordering issue with dataWindow; correct some mistakes in parameter files. #29

Closed icherkashin closed 8 years ago

icherkashin commented 8 years ago

I tested these changes with my script that showed second-order convergence, so everything should work correctly with these changes.

TedStudley commented 8 years ago

Looks good and the dense forms test passes (although I'm not certain that test even uses DataWindow...). Would you mind writing a unit test for DataWindow to ensure that the (i, j) coordinates in DataWindow refer to the correct underlying coordinate? You can check out the file test/dense_forms_test.cpp to see how the GTest library can be used for unit testing.

@egpuckett, how do these changes look to you? You're more likely to know whether or not everything is correct.

icherkashin commented 8 years ago

Would you mind writing a unit test for DataWindow to ensure that the (i, j) coordinates in DataWindow refer to the correct underlying coordinate?

I wouldn't mind, but I don't see clearly how to test this because (i,j) is simply a human convention, whereas for the computer it makes no difference as long as you don't go out of bounds in the underlying arrays. I'm also not certain what you mean by "underlying coordinate:" you should explain more precisely what you mean.

TedStudley commented 8 years ago

I'm asking for you to test that the convention is correct. This can be done by setting up two structures:

A small 2D array of doubles
A DataWindow wrapper around the array

Then, ensure that changing the (i, j) coordinate of the DataWindow changes the expected coordinate of the underlying array. If you'd like, you can also construct an Eigen::Map around the array and ensure that the expected coordinate of the Map is changed, as well.

We ran into this problem in the first place because we didn't have a solid convention for DataWindow vs. matrix vs. data coordinates, so having a unit test to provide that convention seems to be the best idea, moving forward. Since having the wrong convention was clearly noticeable, it shouldn't be too difficult to write a test that the right convention is upheld.

icherkashin commented 8 years ago

Then, ensure that changing the (i, j) coordinate of the DataWindow changes the expected coordinate of the underlying array

Thank you, now I understand what you meant! In fact, at this point, it wasn't my goal to change the underlying column-major memory layout in the arrays to row-major. For now, I simply changed the interface, leaving the underlying memory layout intact. What you are talking about I will try to do later, once other issues are addressed.

This problem of indices and memory layout turned out to be more complex and pervasive in the code than I originally thought, so I'm addressing it piece by piece. These changes here are simply the first step. It is necessary to do them gradually because it is otherwise too difficult to make sense of the impact of changes in various parts of the code.

So, regarding the unit test, I'm not certain whether it is even necessary at this point (but it may be later) since these changes concern only the interface of DataWindow but the underlying memory structure remained the same.

TedStudley commented 8 years ago

I'm not talking about changing the underlying memory layout at all. In fact, attempting to do so is most likely a bad idea, since I the underlying Eigen matrix operations are most efficient in column-major storage ordering. I did a quick benchmark with GTest with the following:

#define MAT_SIZE 1000
#define REPITITIONS 10000

#include <gtest/gtest.h>
#incude <Eigen/Dense>
class StorageOrderTest : public testing::Test {
protected:
  StorageOrderTest() :
      a((double *) malloc(sizeof(double) * MAT_SIZE * MAT_SIZE)),
      x((double *) malloc(sizeof(double) * MAT_SIZE)) {
    for (int i = 0; i < MAT_SIZE; ++i) {
      x[i] = (i + 1);
      for (int j = 0; j < MAT_SIZE; ++j) {
        a[i * MAT_SIZE + j] = (i + 1) * (j + 1);
      }
    }
  }

  ~StorageOrderTest() {
    free(a);
    free(x);
  }

  double *a;
  double *x;
};

TEST_F(StorageOrderTest, RowMajor) {
  for (int n = 0; n < REPITITIONS; ++n) {
    Eigen::Map<Eigen::VectorXd> v_x(x, MAT_SIZE);
    Eigen::Map<Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> > m_a(a, MAT_SIZE, MAT_SIZE);

    (m_a * v_x).sum();
  }
}

TEST_F(StorageOrderTest, ColMajor) {
  for (int n = 0; n < REPITITIONS; ++n) {
    Eigen::Map<Eigen::VectorXd> v_x(x, MAT_SIZE);
    Eigen::Map<Eigen::MatrixXd> m_a(a, MAT_SIZE, MAT_SIZE);

    (m_a * v_x).sum();
  }
}

Varying the value of MAT_SIZE, I get the following results:

MAT_SIZE	RowMajor Time (ms)	ColMajor Time (ms)	Timing Ratio
1	11	10	1.10
10	22	20	1.10
100	692	645	1.07
1000	66,102	59,175	1.11

ColMajor is consistently faster than RowMajor, by about 10%. In addition, using RowMajor makes the calls to Eigen significantly uglier. Unless there's a different benefit in switching to RowMajor that I'm not seeing, it doesn't seem like a very good idea.

In asking you to write a unit test for DataWindow, I'm asking you to write a test to ensure that the interface remains stable. If you don't want to test against the underlying raw data (likely not a good idea, since the underlying implementation is much more subject to change than the interface, as you've mentioned), then I'd suggest adding a method to DataWindow to return an Eigen::Map wrapper around the underlying data and testing against that. Either way, I'd like to implement a unit test for this. It's a simple task which can only benefit us in the future.

icherkashin commented 8 years ago

Ok, I need to think about this more, probably because I couldn't pay much attention to this problem. I think now I see better what you're talking about.

In essence, the expected behavior is that DataWindow test_array(i,j) == Eigen::Map test_array(i,j) for all i and j?

Regarding the column-major vs. row-major: yes, for Eigen it would be preferable to use column major storage.

TedStudley commented 8 years ago

I think I was under the impression that we should have Datawindow test_array(i, j) == Eigen::Map test_array(j, i), although given all the confusion about coordinate ordering I'm not exactly sure anymore. The purpose of DataWindow was to provide a sort of 'physical-space'-ish coordinate window onto the data, while Eigen::Map provides a 'matrix-space' coordinate window onto the data. IIRC, the two should have row/column indices switched ((x, y) in 'physical-space' vs. (rows, columns) in 'matrix-space'). Unfortunately I didn't document the original decision to create the DataWindow wrapper (or the requirements that necessitated it) so I can't be 100% on any of this.

Obviously since the coordinate-misordering issue stems from my not really being 100% on which order things should be in the first place, I could very well be wrong. If that's not true, however, then there's really no reason why we shouldn't have been using Eigen::Map in place of DataWindow this whole time, since Eigen::Map<MatrixXd>::operator()(i, j) would reference exactly the same element as DataWindow<double>::operator()(i, j)...

Maybe @egpuckett remembers better how this whole thing is supposed to work, or can find an old e-mail correspondence regarding DataWindow.

icherkashin commented 8 years ago

I would only say that for us the internal storage order is largely irrelevant, as long as the consistent storage order is used throughout the code. The fact that i refers to x-axis and j to y-axis is an abstraction, as long as the dimensions correspond to the range of indices: it could very well be the other way around, as it was originally in your code. But DataWindow and Eigen::Map must be in exact correspondence, of course. Or, as you suggested we could simply use Eigen::Map instead of DataWindow.

I'd say let me create the unit test based on the idea I explained above and then I could try to replace DataWindow with Eigen::Map and see how well it works. If it does, we may rid ourselves of many problems this way.

TedStudley commented 8 years ago

This issue has nothing to do with the underlying storage order, save for the fact that the storage order of the current implementation is decided by the Eigen ColMajor ordering. You say that the indexing systems are arbitrary, which is obviously true, but we already have conventions set for them, which we need to uphold. For any given domain, we have three different potential indexing systems, all related:

Physical Index
Matrix Index
Underlying Array Index

Physical indices are a 'soft' convention, based on the 'natural' (x, y) coordinate system we would use when describing a point in the domain. The cells in the domain are ordered starting from the bottom left-hand corner of the domain, with the first index increasing to the right (the x-axis), and the second index increasing upward (the y-axis).

Matrix indices are a 'hard' convention, based on the standard rows-columns matrix indexing system. The cells in the domain are ordered starting from the top left-hand corner of the domain, with the first index increasing downwards (rows), and the second index increasing to the right (columns).

The underlying array index is not set-in-stone, but currently related to matrix indexing via a column-major storage ordering. This is the only indexing system with only a single dimension, since we allocate storage as a 1-dimensional array. Since the current implementation of the underlying array index is related to matrix indexing by column-major storage ordering, the cells in the domain are ordered starting from the top left-hand corner of the domain, with the index increasing downwards and to the right.

If we were to lay out an MxN domain in this way for all three indexing systems (physical p, matrix m, and array a), we would get the following (abusing markdown tables to prevent having to make a terrible ASCII table):


`p(0, M-1) = m(0, 0) = a(0)`	`p(1, M-1) = m(0, 1) = a(M)`	`...`	`p(N-1, M-1) = m(0, N-1) = a(M*(N-1))`
`p(0, M-2) = m(1, 0) = a(1)`	`p(1, M-2) = m(1, 1) = a(M+1)`	`...`	`p(N-1, M-2) = m(1, N-1) = a(M*(N-1)+1)`
`...`	`...`	`...`	`...`
`p(0, 0) = m(M-1, 0) = a(M-1)`	`p(1, 0) = m(M-1, 1) = a(2*M - 1)`	`...`	`p(N-1, 0) = m(M-1, N-1) = a(M*N-1)`

From this table, we can see pretty easily that p(x, y) = m((M-1) - y, x) = a(M * x + ((M-1) - y)) in terms of 'physical' indices, or m(i, j) = p((M - 1) - j, i) = a(M * i + j) in terms of matrix indices.

This is my understanding of the correct behavior of the different indexing systems, although there's been enough confusion that I'm no longer completely certain that this is the behavior that everybody else is expecting.

icherkashin commented 8 years ago

Ted, relax! ;) Let's limit ourselves and only discuss the immediate issues directly related to the pull-request. Otherwise, I find it impossible to work on this code if we are to engage in this kind of discussions that are so broad that they don't lead to anything immediately practical. As you said, it just introduces more confusion. As you know, there are many inconsistencies and errors in the code that have to do with dimensions and indices, so I'm simply trying to address them in a manageable way, step by step. This is just a first step that is more or less possible to comprehend in its purpose and effect, but I could submit more complex changes that would be very difficult to analyze for anybody except me (and even for me!). When the concerns you are talking about became directly relevant, then let's talk about them.

If your condition for merging these changes is to create a unit test, then please confirm that you agree with my idea about the test, since you asked for it and you should know what you want. These changes are a part of my plan and I cannot continue without having them accepted.

TedStudley commented 8 years ago

The reason why I keep bringing this issue up is because it is an issue with is immediately related to the pull request. I wanted you to create the unit test is so that we can verify the correct behavior of the DataWindow wrapper in relation to Eigen::Map. The relation between DataWindow and Eigen::Map is currently wrong, according to the index conventions that I posted above. The change which you have included in this pull request does not fix it.

I'm pushing a (currently failing) unit test to master, which can be used to validate the correct indexing conventions, by changing the same locations in a DataWindow and an Eigen::Map and comparing their underlying data arrays. You can verify for yourself that the changes which you have implemented in your pull request do not cause this unit test to pass.

The concept I have been trying to communicate is that DataWindow is a wrapper which allows us to work in a different coordinate system from either the underlying data or the Eigen::Map coordinate ordering. That is, and always has been, the sole purpose for the wrapper. The Eigen::Map coordinate system is already set by convention, and the underlying data coordinate system is set by relation to Eigen::Map (via ColMajor ordering).

The entirety of this issue is regarding the fact that the DataWindow coordinate system, being fixed by 'soft' convention as I mentioned earlier, currently does not correctly map to the two other coordinate systems. A simple unit test will allow this to be seen very clearly, as I have been saying this entire time. Although this indexing convention is not set in stone by hundreds of years of mathematics, it is still not going to change, so we instead need to find a way to fix the relationship between the three indexing conventions, which has always been the issue at hand.

Please refer to the relationship between indexing systems laid out in the previous post in order to fix the behavior of DataWindow. I'm going to close this pull request until I receive a version which shows second-order convergence and passes the unit test. If you would like to get the parameter file changes merged in sooner, please separate them into a separate pull request. Like I mentioned last time, separating concerns into different pull requests, no matter how minor the changes, can prevent a single bad change from delaying the merging of a lot of unrelated good changes.

icherkashin commented 8 years ago

The matrix and the grid "coordinates," as you refer to them, do not need to be identical. In other words, there is no need for, say, the left bottom corner of the matrix to correspond to the left bottom corner of the grid. To me, I expect the behavior that you, for some reason, do not accept: the lowest row of points on the grid will be the top row in the matrix, etc.. Why do you need these "underlying coordinates" to be identical? Prof. Puckett mentioned this once that, perhaps, you were thinking in these terms when you wrote your code, but I remember clearly that we agreed that it was a misleading and unnecessary way of thinking. In short, it is not correct to think of the grid as a matrix that, therefore, must have exactly the same "underlying coordinates."

This is why your unit test is pointless, from my point of view, unless you really wish to enforce this correspondence between the grid coordinates and the matrix coordinates. But there seem not to be any arguments supporting the necessity to enforce it. Maybe you should discuss this with Prof. Puckett (he should be helping us with the code, after all) before being so categorical: perhaps, after discussing it with him, you will understand my position a bit better. Indeed, didn't you yourself say that

although there's been enough confusion that I'm no longer completely certain that this is the behavior that everybody else is expecting.

It is probably wise to postpone merging this, as you did, until it becomes

completely certain that this is the behavior that everybody else is expecting.

Plus, I asked you to explain to me what you want from the unit test, in simple language, and I didn't understand precisely what you wanted until you have written this and merged your unit test. I hope I don't appear to diminish your efforts to explain your vision to me, but our mutual misunderstanding shows there is no consensus about the goals for this code and what are the actual problems that must be solved. That's why I think there needs to be a third party's look into a controversial discussion like this one.

You're right about the separating the changes into separate pull-requests: to me, fixing the ji to ij indexing problem was a rather minor change. In fact, this was done during the summer, but offering it as a pull-request was postponed, for a reason that I cannot recall now.

TedStudley commented 8 years ago

The entire point of the unit test is to enforce some correspondence between the grid coordinates and the matrix coordinates. Without a correspondence between the two, there's no guarantee that the behavior of the code couldn't change wildly from commit to commit. Try changing DataWindow to flip the row and column of the accessed cell (effectively changing the correspondence between grid and matrix coordinates) and then tell me that nothing has changed. Enforcing that the correspondence doesn't change seems extremely important, if you don't want benchmarks to potentially come up with different results from commit to commit. The only point that I can possibly give you is that there is no absolute reason for this specific correspondence to be enforced.

You're completely correct that the underlying matrix and grid coordinates do not have to be identical. What's also correct, however, is that don't have to not be identical. As I mentioned before, DataWindow was created for the sole reason of allowing both conventions to be used effectively, while preserving behavior in the matrix coordinate system that, to me, makes logical sense (e.g., cout << some_matrix should provide a correctly-oriented representation of the domain data, without being vertically flipped.) Updating the the coordinates to be identical would require nearly a complete rewrite of MatrixForms, since it would effectively swap the 'up' and 'down' directions of the matrix indexing system. Both systems are functionally equivalent, but there are logical benefits to the system that I've proposed, while I haven't heard any logical benefits from the system which you've proposed.

The difference between our two arguments on this subject is that I have provided a very clear picture of what my expectations are, and how I expect them to be implemented. The fact that you didn't refute my understanding of the relationships between coordinate systems when I had posted it earlier makes me question whether you even read what I had written before dismissing it offhand. Working on a codebase with others is a co-operative endeavour, especially when the code doesn't belong to you.

Your attitude during the time that I've worked with you has been extremely confrontational, and you have been reluctant to even consider any evidence which doesn't directly support your own personal opinions. Because of this, I've been forced to taking the tactic of attempting to re-iterate the same information again and again until you eventually either recognize or provide your own information countering it. This is not conducive to getting _anything_ done, and will do nothing in a situation like this but aggravate others who you may be working with. You may be an excellent coder, but most organizations--both in academia and industry--have no room for somebody who is unable to work effectively as part of a team.

icherkashin commented 8 years ago

I agree with you completely, Ted! Thank you for your insight into this matter: I'll take it into account so that I could improve myself. You're right: this is your code, so it wasn't wise of me to disagree with you about what you want from it. I truly regret to have aggravated you by being argumentative: I'll do my best to control myself!