Closed calispac closed 6 years ago
I tried to solve the issue, but without any success or real understanding of it. I would prefer to have it solved, but since it's been a week already, I'm OK with dropping 3.5.
I agree with Yves: we should solve the py3.5 issue and not drop it just because we cannot figure out what is going on ... maybe we can decide in the evening, when everybody has had a chance to try their luck at fixing this problem.
The problem with py3.5 occurs not only in new PRs .. but also on master.
I deselected test_dataquality.test_data_quality() ... and found that all other tests pass. So maybe we only need to "fix" this test ... looking into it.
Yes this is where it originated. I am scratching my head now :smile:
From what I understood, the problem comes from pd.to_datetime(data['time']), which triggers a timezone error when the result is plotted.
The problem seemed to be plotting data with a datetime index that is not timezone-aware. I wanted to find the minimal code that reproduces the error, so I tried this:
def test_foo():
    import os
    import tempfile

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    with tempfile.TemporaryDirectory() as tmpdirname:
        N = 100
        # fake timestamps: small random offsets around a value from the real data
        time = (
            np.random.randint(low=-1000, high=1000, size=N) +
            1509415494508984736
        )
        trigger_rate = np.random.normal(loc=77, size=N)
        data = pd.DataFrame(
            data={
                'time': time,
                'trigger_rate': trigger_rate,
            }
        )
        data['time'] = pd.to_datetime(data['time'])
        data = data.set_index('time')
        plt.figure()
        plt.plot(data['trigger_rate'] * 1E9)
        plt.ylabel('rate [Hz]')
        out_path = os.path.join(tmpdirname, 'foo.png')
        plt.savefig(out_path)
[updated 14:48] This now includes the to_datetime() call ... but it runs nicely.
Sorry .. false alarm .. I am still unable to find a small example to reproduce this error
Ah .. okay I slowly come closer ..
So in the above example, I use this to fake the time:
time = (
    np.random.randint(low=-1000, high=1000, size=N) +
    1509415494508984736
)
I took the number from the real test_data_quality test by putting a print(data) here:
data = Table.read(fits_filename, format='fits')
data = data.to_pandas()
print(data) # <----- this is to look at data before to_datetime
data['time'] = pd.to_datetime(data['time'])
Okay so far so good .. so I took the number .. and randomly added some seconds to make a couple of different times.
Now I realized the number 1509415494508984736 is not the time in seconds ... the time in seconds is more like 1539694585 ..
So the number is more like .. the time in nanoseconds ... so I was only adding nanoseconds ..
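A minimal sketch of the units point, assuming the default behaviour of pd.to_datetime (integers without a unit argument are read as nanoseconds since the epoch). The variable names are only for illustration:

```python
import pandas as pd

raw = 1509415494508984736  # value printed from the real test data

# Without a `unit` argument, pandas reads integers as nanoseconds since the epoch.
as_ns = pd.to_datetime(raw)

# The same instant truncated to whole seconds needs unit='s'.
as_s = pd.to_datetime(raw // 10**9, unit='s')

print(as_ns)  # 2017-10-31 02:04:54.508984736
print(as_s)   # 2017-10-31 02:04:54
```

So adding random integers in [-1000, 1000) to this number only shifts the timestamps by up to a microsecond.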
So I multiplied the random numbers with 1000 .. and guess what .. it fails.
time = (
    np.random.randint(low=-1000, high=1000, size=N) * 1000 +  # <--- * 1000 makes it fail
    1509415494508984736
)
So far so good: now I have a minimal example which nicely fails.
Ah and when I multiply the random ints with 1e9 .. so they are really random seconds .. it does not fail anymore.
There is a sweet spot for the multiplier, between 1000 and 100000, where it fails .. very stupid
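To see what the multiplier changes, here is a rough sketch that prints the total time span covered by the fake timestamps for each multiplier. It does not reproduce the error itself; index_span is a hypothetical helper, and the seed and integer dtype are assumptions made for repeatability:

```python
import numpy as np
import pandas as pd

BASE_NS = 1509415494508984736  # nanoseconds since the epoch, from the thread


def index_span(multiplier, n=100, seed=0):
    """Return the time span covered by n fake timestamps (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # cast to int64 before multiplying so large multipliers cannot overflow
    offsets = rng.randint(low=-1000, high=1000, size=n).astype(np.int64)
    times = pd.to_datetime(offsets * multiplier + BASE_NS)
    return times.max() - times.min()


for multiplier in (1, 1000, 100000, 10**9):
    print(multiplier, index_span(multiplier))
```

The spans are at most about 2 microseconds, 2 milliseconds, 0.2 seconds and 2000 seconds respectively, so the failing cases reported above are the ones where the x-axis covers milliseconds up to a fraction of a second.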
Nice! However, I'm still not sure whether it is a pandas, pytz or matplotlib regression.
Could you print the complete error please?
Sure ... but actually I've put the minimal code that reproduces the error into this chat, so that everybody who wants to study this problem can reproduce the error themselves and play with it. Here is the error:
15:10 $ python force_error.py
                               trigger_rate
time
2017-10-31 02:04:54.585584736     76.807937
2017-10-31 02:04:54.586784736     77.399938
2017-10-31 02:04:54.590584736     78.088984
2017-10-31 02:04:54.515484736     74.955927
Traceback (most recent call last):
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/backends/backend_qt5.py", line 519, in _draw_idle
    self.draw()
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/backends/backend_agg.py", line 402, in draw
    self.figure.draw(self.renderer)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/figure.py", line 1652, in draw
    renderer, self, artists, self.suppressComposite)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/axes/_base.py", line 2604, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/axis.py", line 1185, in draw
    ticks_to_draw = self._update_ticks(renderer)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/axis.py", line 1023, in _update_ticks
    tick_tups = list(self.iter_ticks())  # iter_ticks calls the locator
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/axis.py", line 967, in iter_ticks
    majorLocs = self.major.locator()
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/matplotlib/dates.py", line 1230, in __call__
    return self._locator()
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/pandas/plotting/_converter.py", line 473, in __call__
    freq=freq, tz=tz).astype(object)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 2749, in date_range
    closed=closed, **kwargs)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 381, in __new__
    ambiguous=ambiguous)
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 506, in _generate
    tz = timezones.maybe_get_tz(tz)
  File "pandas/_libs/tslibs/timezones.pyx", line 87, in pandas._libs.tslibs.timezones.maybe_get_tz
  File "pandas/_libs/tslibs/timezones.pyx", line 102, in pandas._libs.tslibs.timezones.maybe_get_tz
  File "/home/dneise/anaconda3/envs/digicampipe/lib/python3.5/site-packages/pytz/__init__.py", line 177, in timezone
    raise UnknownTimeZoneError(zone)
pytz.exceptions.UnknownTimeZoneError: 'UTC+00:00'
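The last frame of the traceback can be looked at in isolation. A small sketch, assuming only that pytz is installed: it shows that pytz accepts the name 'UTC' but rejects the string 'UTC+00:00'. It does not explain where that string is produced.

```python
import pytz

# 'UTC' is a name pytz knows.
print(pytz.timezone('UTC'))

# 'UTC+00:00' is not a zone name in the pytz database, so the lookup raises.
unknown_zone = None
try:
    pytz.timezone('UTC+00:00')
except pytz.exceptions.UnknownTimeZoneError as err:
    unknown_zone = err.args[0]
    print('unknown zone:', unknown_zone)
```

So whichever layer builds the string 'UTC+00:00' and passes it on as tz is the place to look.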
and here is my force_error.py:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 4
time = (
    np.random.randint(low=-1000, high=1000, size=N) * 100000 +
    1509415494508984736
)
trigger_rate = np.random.normal(loc=77, size=N)
data = pd.DataFrame(
    data={
        'time': time,
        'trigger_rate': trigger_rate,
    }
)
data['time'] = pd.to_datetime(data['time'])
data = data.set_index('time')
print(data)
plt.figure()
plt.plot(data['trigger_rate'] * 1E9)
plt.ylabel('rate [Hz]')
plt.show()
Notes:
- The error occurs no matter whether plt.show() or plt.savefig("foo.png") is used.
- The error comes from the plotting side (plt.plot()) and not from to_datetime.
The reason seems to be somewhere in the part where pandas/matplotlib want to make a date_range with equal spacing for the given dataset, so that they can plot the x-axis ...
Maybe the problem is here: https://github.com/pandas-dev/pandas/blob/8af2bea07f7864e1df8ee1c43546cad59043fa7a/pandas/plotting/_converter.py#L465-L469
tz is set independently of the timezone of the given dataset
Ah no .. that's not the problem .. I am just not fit to understand the code here.
Ah, and another remark: I tried to fix this issue by making the DatetimeIndex of the data timezone-aware using tz_localize .. it did not solve the issue.
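For reference, a minimal sketch of what that attempt looks like, assuming 'UTC' as the target zone and made-up trigger_rate values:

```python
import pandas as pd

# two fake timestamps in nanoseconds, 1 ms apart
index = pd.to_datetime([1509415494508984736, 1509415494509984736])
data = pd.DataFrame({'trigger_rate': [76.8, 77.4]}, index=index)
print(data.index.tz)  # None: the index is timezone-naive

# tz_localize attaches a timezone without shifting the wall-clock values
aware = data.tz_localize('UTC')
print(aware.index.tz)  # UTC
```

As reported above, plotting the timezone-aware version still failed in the py3.5 environment.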
Fixed by 5ff62c2, closing the issue?
In recent PRs #251 #250 #249 #248 #247 #246 #245
Link to the travis test : https://travis-ci.org/cta-sst-1m/digicampipe/jobs/437119169 Which are currently all blocked...
We identified a bug with pandas and python 3.5. However we don't know the exact source of the issue. Should we try to solve this? Or just drop python 3.5?
Like if you want to drop py3.5. Dislike if you prefer to solve the issue