PatrikHlobil / Pandas-Bokeh

Bokeh Plotting Backend for Pandas and GeoPandas
MIT License

Problem plotting data without first saving to csv and reloading csv #45

Closed CorBer closed 5 years ago

CorBer commented 5 years ago

Hi,

I am using the latest version of the library in a JupyterLab (1.1.4) environment.

The problem is that I want to plot a rather simple dataset that I retrieve from a URL. The data is retrieved and converted to a pandas DataFrame without problems. If I then try to plot the data, the library simply returns the error "0".

The full standalone code is below. Notice that near the end I save the pandas DataFrame to a CSV file, then immediately read that CSV back into another pandas DataFrame, which I can plot. So plotting dfa works as planned, while trying to plot df results in an error.

regards Cor

import requests
import csv
import io
import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

url = 'http://www.seismicportal.eu/fdsnws/event/1/query?limit=100&minmag=4.5&minlat=34&maxlat=42&minlon=26&maxlon=46&format=text'
r = requests.get(url)
content = r.content.decode('iso-8859-1')
lines=[]
for line in csv.reader(content.splitlines(),delimiter='\n'): #start by splitting the response by linebreaks
    lines+=line
df = pd.DataFrame([sub.split("|") for sub in lines])

# Retrieve HTTP meta-data
# print(r.status_code)
# print(r.headers['content-type'])
# print(r.encoding)

new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header
df.columns=df.columns.str.strip() #remove any trailing/starting spaces from columnnames
df.Longitude=df.Longitude.astype(float)
df.Latitude=df.Latitude.astype(float)
df.Magnitude=df.Magnitude.astype(float)
df.to_csv('savequakes.csv')
dfa = pd.read_csv('savequakes.csv')

dfa['size']=dfa.Magnitude*5
dfa.plot_bokeh.map(
    x="Longitude",
    y="Latitude",
    category='Magnitude',
    colormap='Magma',
    line_color='black',
    hovertool_string="<h2> @{Time} </h2> <h3> Magnitude: @{Magnitude} </h3>",
    tile_provider='CARTODBPOSITRON',
    size="size",
    figsize=(1200, 800),
    title="earthquakes")

PatrikHlobil commented 5 years ago

Hi @CorBer ,

This is a pretty interesting bug, thanks for reporting. The cause is the line df = df[1:] #take the data less the header row: after dropping the header row, the first index label of df is 1 instead of 0. When you save the DataFrame to CSV and load it again, the reloaded DataFrame gets a fresh index starting at 0, so everything works fine.
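A minimal sketch of that index behavior, using invented sample rows rather than the real seismic data. The thread does not show which internal lookup in Pandas-Bokeh failed, so the df.loc[0] call here is only an assumption about the kind of access that would produce a "0" error:

```python
import io
import pandas as pd

# Invented sample rows standing in for the split response lines.
raw = pd.DataFrame([
    ["Latitude", "Longitude", "Magnitude"],
    ["38.1", "27.2", "4.6"],
    ["39.4", "41.0", "5.1"],
])

df = raw[1:].copy()             # drop the header row; index labels stay 1, 2
df.columns = list(raw.iloc[0])  # promote the first row to column names
print(df.index.tolist())

# A label-based lookup of 0 fails on this frame because no row is labelled 0.
try:
    df.loc[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# Option 1: renumber the index in memory, no file needed.
fixed = df.reset_index(drop=True)

# Option 2: what the CSV round trip did implicitly.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(fixed.index.tolist(), reloaded.index.tolist())
```

With the version of the library from the original report, calling df = df.reset_index(drop=True) before plotting should therefore have the same effect as the save-and-reload workaround.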

I fixed the error for the next release. In the meantime, you can easily avoid it by using the awesome pd.read_csv functionality, which can also read directly from a web server like this:


import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()

df = pd.read_csv(r'http://www.seismicportal.eu/fdsnws/event/1/query?limit=100&minmag=4.5&minlat=34&maxlat=42&minlon=26&maxlon=46&format=text', sep="|")
df.columns=df.columns.str.strip()
df.Longitude=df.Longitude.astype(float)
df.Latitude=df.Latitude.astype(float)
df.Magnitude=df.Magnitude.astype(float)
df.head()
df['size']=df.Magnitude*5
df.plot_bokeh.map(
    x="Longitude",
    y="Latitude",
    category='Magnitude',
    colormap='Magma',
    line_color='black',
    hovertool_string="<h2> @{Time} </h2> <h3> Magnitude: @{Magnitude} </h3>",
    tile_provider='CARTODBPOSITRON',
    size="size",
    figsize=(1200, 800),
    title="earthquakes")

I hope this answers your question.

Best Patrik

CorBer commented 5 years ago

Hi Patrik,

I've just tested it and it works. It's a win for both of us that way: I have learned a better way to read the data and you got a bug that you could remove ;) Thanks for the fast reaction and for keeping this library updated.

regards Cor

CorBer commented 5 years ago

I am still amazed by the power that libraries like yours bring. Being able to query a web server and present its results on a map in just a few lines is great! I also just realized that the astype(float) conversions are not necessary either.
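A small sketch of why the casts become redundant, assuming pipe-separated text similar to what the service returns. The sample rows and column set are invented for illustration:

```python
import io
import pandas as pd

# Invented pipe-separated sample, not real data from the service.
sample = (
    "Time|Latitude|Longitude|Magnitude\n"
    "2019-10-01T12:00:00Z|38.1|27.2|4.6\n"
    "2019-10-02T08:30:00Z|39.4|41.0|5.1\n"
)

# pd.read_csv infers numeric column types while parsing.
parsed = pd.read_csv(io.StringIO(sample), sep="|")

# Splitting the lines by hand leaves every value as a string,
# which is why the original script needed astype(float).
rows = [line.split("|") for line in sample.splitlines()]
manual = pd.DataFrame(rows[1:], columns=rows[0])

print(parsed.dtypes)
print(manual.dtypes)
```

The numeric columns of parsed are already floats, while the same columns in manual still hold strings until they are converted explicitly.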