brandon-rhodes / pyephem

Scientific-grade astronomy routines for Python
Other
781 stars 121 forks source link

Day to day variation in next_full_moon calculations #20

Closed westurner closed 11 years ago

westurner commented 11 years ago

This raises an AssertionError:

import ephem, datetime
today = datetime.datetime.now()
tomrw = today + datetime.timedelta(1)
assert ephem.next_full_moon(today) == ephem.next_full_moon(tomrw)

I'm certainly not an astronomy domain expert, but why would the next full moon calculation vary based on the date of reference?

westurner commented 11 years ago

... surely there is a better way to calculate the next year's worth of full moons than:

import datetime, ephem
now = datetime.datetime.now()
dates = [ (now+datetime.timedelta(n)) for n in xrange(366) ]
next_year_of_full_moons = set( ephem.next_full_moon(d).datetime() for d in dates)
print(len(next_year_of_full_moons))
# 278 !?
brandon-rhodes commented 11 years ago

Because of rounding errors inherent in all floating-point arithmetic, two procedures for reaching the “same” floating-point number will usually produce slightly different numbers. For example:

>>> 1.1 + 2.2
3.3000000000000003
>>> 0.3 + 3.0
3.3
>>> 1.1 + 2.2 == 0.3 + 3.0
False

Since even these two simple paths to the same number 3.3 cannot agree, you can imagine what tiny sources of difference might be present in the routine that starts at any date and time and generates several moon-sun-earth positions as it narrows down towards the exact moment of the next full moon! If you will print the two dates that are returned, instead of comparing them with ==, you will see that the “same answer” has been returned, within the limits of your processor's floating point FPU:

print ephem.next_full_moon(today)
print ephem.next_full_moon(tomrw)

2013/6/23 11:32:15
2013/6/23 11:32:15

When comparing two floating-point numbers, it is best to choose a small value that you can call close_enough (for a full moon calculation, perhaps a second would be close enough?) and then compare them this way:

>>> assert abs(ephem.next_full_moon(today) - ephem.next_full_moon(tomrw)) < close_enough
brandon-rhodes commented 11 years ago

To compute a whole year of full moons, simply use each moon's date as your starting point for finding the next one. This is a pattern you can even use elsewhere in Python; to find all of the periods in a string, for example, you need to start each search from where the last one left off:

s = 'Veni. Vidi. Vici.'
i = -1
while True:
    i = s.find('.', i + 1)
    if i == -1:
        break
    print i

The procedure for finding a series of full moons is even simpler, since you don't need the check in the middle of the loop for whether you have run off of the end of the sequence, since there is always another full moon that you could find:

import ephem

start = date = ephem.now()
full_moons = []
while date < start + 366.0:
    date = ephem.next_full_moon(date)
    full_moons.append(date)

for date in full_moons:
    print date

I think that this 3-line loop will give you the results you need; can you try it out and let me know?

westurner commented 11 years ago

Thanks!

So what you're saying is next_full_moon calculations are not vectorizable due to floating point rounding errors in Python?

Would something like SymPy work around this infinitesimal level of error?

http://docs.python.org/library/unittest.html#unittest.TestCase.assertAlmostEqual

brandon-rhodes commented 11 years ago

The general reason that they are not vectorizable is that, to compute when Full Moon #101 happens, you have to have at least a rough idea of when Full Moon #100 happens, and so it generally makes the most sense to discover each moon in sequence.

The whole issue of rounding errors does not arise until you create a situation in which you are computing each full moon more than once — in which case you are duplicating work, and degrading, to some extent, the whole point of doing a computation as a vector in the first place. But you might want to try out a scheme where you select dates, say, 20 days apart, ask for the full moon after each of them, and then use a test to eliminate near-duplicates that sit next to each other in your resulting list. Of course, PyEphem will be of no use in doing it as a vector operation, since it's bare C math code with no vectors anywhere; but doing it with Skyfield, once it has a similar full-moon operation, might be interesting!

SymPy would be of no use that I can imagine. Transcendental functions of floating-point guesses during a Newton's-method descent are going to be arbitrary floats, and keeping them suspended as formulae would not help unless there's something I'm missing. (Which, by the way, is always possible; I'm not an expert!)

westurner commented 11 years ago

Thanks! I think this describes my concerns:

http://docs.sympy.org/dev/modules/mpmath/technical.html#precision-and-representation-issues (mpmath + gmpy2)

westurner commented 11 years ago

I was not aware of https://github.com/brandon-rhodes/python-skyfield

brandon-rhodes commented 11 years ago

I was avoiding any mention of Skyfield at this point, since I have not added full moons yet. :)

You raise the issue of precision, so let's look at that in more detail. To decide when it has actually reached the moment of Full Moon, PyEphem narrows down the input date until the difference between the Sun-Moon angle and an ideal 180° is less than a value named tiny:

https://github.com/brandon-rhodes/pyephem/blob/master/src/ephem/__init__.py#L193

That value is about 0.002778 arcseconds, or 0.000000002143 of the total way around the sky:

https://github.com/brandon-rhodes/pyephem/blob/master/src/ephem/__init__.py#L19

This is a vastly larger number than the kinds of errors that your link is talking about — the error that results from the fact that PyEphem uses 64-bit floats are at around the magnitude 1e-16, which is maybe one million times smaller than the value tiny. So I believe that we can throw out floating-point precision as significant.

But what about the accuracy of the astronomical models themselves? It turns out that the solar system model used by the underlying "libastro" library, called the VSOP87 model, can only deliver the position of the Sun or Moon to without about 1 arcsecond. Which means that the tiny, tiny difference in two estimates of the Full Moon — which must necessarily lie within 2 * tiny of each other — is insignificant compared to the degree to which VSOP87 is simply likely to be wrong about the Moon's position to begin with.

Which is, it turns out, why I set tiny to be moderately, but not crazily, small: once you have gone into more detail than the solar system position estimates themselves, you are starting to waste time on precision that has no physical meaning.

Does this help you understand why SymPy is not likely to be relevant for this particular physical problem? It has been a long day, so feel free to ask further questions if I am not managing to make as much sense as I should!

westurner commented 11 years ago

I believe I understand now.

If I understand correctly, you are saying that the precision (and thereby the significant figures) of the _find_moon_phase function is limited primarily by the accuracy of the VSOP87D model, and not by the 64 bit floats utilized by PyEphem and libastro 3.7.5; so - in this context - there would be no benefit to utilizing arbitrary precision arithmetic.

(TIL Python floats are like IEEE-754 binary64 doubles, which have 53 bits of precision and that BigFloat wraps GNU MPFR in order to utilize arbitrary-precision arithmetic, while gmpy2 implements "a new mpfr type based on the [MPFR] library".)

Thank you so much for your time.

brandon-rhodes commented 11 years ago

Yes! You put it very well. I will try to get all of this written up as I document Skyfield so that, with the new library, people have straight answers to these questions to begin with.

westurner commented 11 years ago

http://en.wikipedia.org/wiki/Linked_data

http://schema.org

http://schema.org/DataType

http://schema.org/Thing

westurner commented 11 years ago

http://docs.ckan.org/en/latest/linked-data-and-rdf.html

Pandas: ENH: Linked Datasets

westurner commented 11 years ago

... Here is research regarding the use of XML, RDF, Turtle, and HTML5 (Microdata) for sharing scientific things.

I suppose I could have worked this into parenthetical form with anchor text. For translation and/or copying into documentation at some later point, this reference seemed relevant at the time.

Thing > CreativeWork > http://schema.org/Dataset

brandon-rhodes commented 11 years ago

If you will glance at the page title, this is issue 20: “Day to day variation in next_full_moon calculations” and I suspect you intended to put these references elsewhere.

westurner commented 11 years ago

Why would you suggest that?

It seems you are making some sort of a topicality argument: that these links are not relevant to the issue under discussion. I believe these links to be relevant to the issue under discussion. Here's why:

It is wasteful for us to constantly recalculate next_full_moon calculations with error propagation. It would be more efficient for us to share resources containing already-calculated data. We could spend time discussing, "Which day should one start with in order to get the correct answer", or we could memoize the calculations and share data between our tools.

The best known approach to sharing data depends on (many of) these standards.

It appears that the homepage for libastro is still down.

From https://en.wikipedia.org/wiki/XEphem#Catalogs :

Numerical routines are used in PyEphem with permission of Elwood Downey.[10]

It would be great if we could just do online queries from something like CKAN (or SPARQL), instead of wrangling variously appropriate space formats like those mentioned here http://spacepy.lanl.gov/doc/datamodel.html .

The full moon calculations may or may not be relevant to finding asteroids in space.

brandon-rhodes commented 11 years ago

I suggested it only because the links did not mention floating-point arithmetic, which is what you had been discussing in your most recent comment, so I got confused.

The idea of a table of pre-computed moon phases is not necessarily efficient for small cases — it takes me only 0.043 seconds to compute a year of full moons on this travel laptop, but 0.086 seconds to download the same data from the USNO through their web API at http://aa.usno.navy.mil/cgi-bin/aa_moonphases.pl?year=2013&ZZZ=END — but if hundreds of full moons were involved, a URL fetch could be faster. (Though I suppose it would also take a tiny bit of time for Python to parse the incoming file?)

But PyEphem users generally need their programs to be able to work without needing network access; they want their programs to keep working even on an airplane, or when out of mobile range. So the best approach would be for users who need the extra speed to cache the list of full moons locally to a file, and then they could be pulled back in from the file on each subsequent run of — well, of whatever program needed that many full moons to operate.

My guess is that Skyfield will make it so fast to compute full moons with PyPy that even thousands of full moons will be no problem, but of course I'm a few weeks out from knowing for sure.

The “libastro” site you found is not related to the XEphem or PyEphem projects; it appears to be someone re-inventing the name on their own for a separate project or effort of their own, that it appears they were going to write in C++.