AllenDowney / ModSimPy

Text and supporting code for Modeling and Simulation in Python
https://allendowney.github.io/ModSimPy/
MIT License
823 stars 1.76k forks source link

Fixing value overflow issue when using ModSimVectors with integers #54

Closed jsieving closed 4 years ago

jsieving commented 4 years ago

This fix converts each element of a vector to a float when the Vector function is used to create a ModSimVector.

When ModSimVectors are created using the Vector function with large integer arguments, subsequent operations on that vector can result in unexpected values due to an apparent integer overflow in numpy.dot. This issue was discovered when using the vector_mag and vector_hat functions, and the problem was found to originate in np.dot.

This error did not occur if the vector values were cast to floats, presumably because np.dot defaults to using a data type for floats that allows higher values.

AllenDowney commented 4 years ago

Thanks for this suggestion. I am undecided about whether this is a good change.

On one hand, it seems like a Vector that contains integers is useful and should not be prohibited. If the user wants floats, they can create the Vector with floats.

On the other hand, the intent of this class is to represent physical quantities, which would most appropriately be floats.

Can you tell me more about the use case that causes the error?

jsieving commented 4 years ago

This is a problem that several first-year students taking ModSim have encountered while working on the orbital mechanics notebook. 1 AU expressed in meters, or 1.49e11, is large enough to cause an overflow issue when it is squared (as happens when finding the magnitude of the vector.)

My fix may not be ideal, but it seems like it would be good to address it in some way. Currently it gives unpredictable results to students who are not necessarily familiar with Python, and in the case that the result is positive, it's hard to tell where anything went wrong.

One thing which reduced the appearance of this error is that, when a vector with units is created using the syntax: r = Vector(x, y) * units Even if x and y are integers, the multiplication with units causes them to be converted to floats. So, in some cases, the conversion happens anyway. The values will stay as integers if this syntax is used: r = Vector(x*units, y*units)

If you want to allow integers to be used, I can think of a couple alternatives.

  1. Conversion to a float would happen by default, but the user could pass a 'type' parameter to the Vector function, specifying a data type recognized by numpy to be used for the vector values. Perhaps the vector would be stored as a numpy array of the specified type, with associated units. Then, at least, the user has to be explicit about what type they want, and are more likely to catch and understand any error that occurs.

  2. Vector functions which might trigger an overflow could check the initial type and value, and give a warning if the value was likely to exceed the size limit. This isn't my favorite, but at least the user will know what went wrong and might have an opportunity to fix it while testing their code.

On the other hand, the ModSimPy library is intended for modeling real-world situations, and Vectors are for real-world quantities, so it may make sense for these to be float values. If a user knows they need integer values, there are other tools they can use, such as numpy arrays.

AllenDowney commented 4 years ago

I am still confused about the use case. 1.49e11 is a float, so where are the integers coming from?

Allen

On Sat, Nov 16, 2019, at 2:56 PM, jsieving wrote:

This is a problem that several first-year students taking ModSim have encountered while working on the orbital mechanics notebook. 1 AU expressed in meters, or 1.49e11, is large enough to cause an overflow issue when it is squared (as happens when finding the magnitude of the vector.)

My fix may not be ideal, but it seems like it would be good to address it in some way. Currently it gives unpredictable results to students who are not necessarily familiar with Python, and in the case that the result is positive, it's hard to tell where anything went wrong.

One thing which reduced the appearance of this error is that, when a vector with units is created using the syntax: r = Vector(x, y) * units Even if x and y are integers, the multiplication with units causes them to be converted to floats. So, in some cases, the conversion happens anyway. The values will stay as integers if this syntax is used: r = Vector(x*units, y*units)

If you want to allow integers to be used, I can think of a couple alternatives.

  1. Conversion to a float would happen by default, but the user could pass a 'type' parameter to the Vector function, specifying a data type recognized by numpy to be used for the vector values. Perhaps the vector would be stored as a numpy array of the specified type, with associated units. Then, at least, the user has to be explicit about what type they want, and are more likely to catch and understand any error that occurs.

  2. Vector functions which might trigger an overflow could check the initial type and value, and give a warning if the value was likely to exceed the size limit. This isn't my favorite, but at least the user will know what went wrong and might have an opportunity to fix it while testing their code.

On the other hand, the ModSimPy library is intended for modeling real-world situations, and Vectors are for real-world quantities, so it may make sense for these to be float values. If a user knows they need integer values, there are other tools they can use, such as numpy arrays.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/AllenDowney/ModSimPy/pull/54?email_source=notifications&email_token=AAOLP3JORWJ7EPCTCHBHRGLQUBF5DA5CNFSM4JOCSOH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEHZHGY#issuecomment-554668955, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOLP3J74EVNZP4OL5LJLXTQUBF5DANCNFSM4JOCSOHQ.

jsieving commented 4 years ago

Sorry, I wrote it in scientific notation for brevity, which was really unclear since in Python that would be a float. When people type it out as 149000000000, then there's a problem. Some students are in the habit of typing out long numbers instead of using scientific notation. I realize it's a small use case, but I'm sure this will come up in projects involving planetary motion, or in future iterations of the course. Of course, the answer could be to tell people not to use integers, but it seems like there are ways the function could work that would save people a few hours of confusion, especially because value overflows aren't something people expect to see when debugging Python.

AllenDowney commented 4 years ago

Running into this error seems like a learning opportunity. The book doesn't talk much about integers and floating-point, but at some point it's good to know that integers are for counting and floats are for measuring. Expressing a large quantity as an integer is a bad idea, in part because it might cause overflow, but also in part because it's not the right type to express a measurement.

On Sat, Nov 16, 2019, at 10:04 PM, jsieving wrote:

Sorry, I wrote it in scientific notation for brevity, which was really unclear since in Python that would be a float. When people type it out as 149000000000, then there's a problem. Some students are in the habit of typing out long numbers instead of using scientific notation. I realize it's a small use case, but I'm sure this will come up in projects involving planetary motion, or in future iterations of the course. Of course, the answer could be to tell people not to use integers, but it seems like there are ways the function could work that would save people a few hours of confusion, especially because value overflows aren't something people expect to see when debugging Python.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/AllenDowney/ModSimPy/pull/54?email_source=notifications&email_token=AAOLP3KLB7CLKMPDDDHMVJTQUCYE3A5CNFSM4JOCSOH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEIAMCQ#issuecomment-554698250, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOLP3N3UTRP43YRGB6EMQ3QUCYE3ANCNFSM4JOCSOHQ.

jsieving commented 4 years ago

While I think that's a good point, it's hard to use it as a learning opportunity if the person encountering the problem never figures out what's wrong. The issue was originally solved by one of the professors, and I don't think they had an explanation for it. Although I'm familiar with the technical cause, even I didn't think "this issue occurs because I used an integer for a measurement, and that's not appropriate" until you put it that way. So, if you think this is a good time for students to learn that distinction, maybe there should be a small section in the book around Chapter 22, or a link in the orbital mechanics notebook to an explanation that they should read.

This is one of those things that if you don't understand the root cause, finding the solution seems as random as crossing your fingers while making seemingly trivial changes to your code. If this isn't going to be treated as a learning opportunity, then it should at least be changed so that students don't spend hours fruitlessly checking over their code and ultimately turning it in incomplete.

AllenDowney commented 4 years ago

Ok, let's do it. I'll merge first and run the test suite later, because that's the kind of software engineer I am.

AllenDowney commented 4 years ago

All tests passing.