Use scatter instead of plot for large data sets.

lazypaddy commented 5 years ago

Ran into an issue trying to plot around 10000 points. It would not plot and eventually froze. Using scatter worked with no issue.

azogue commented 5 years ago

Hi @lazypaddy, sorry for being late to this PR (alerts didn't trigger).

I think your problem was not the plot kind, but the method implemented for adding points to the chart, which was designed for individual points, each one labelled and custom-styled. If you look at the code, you'll see that every point is added to self._handlers_annotations, so a lot of plot objects are being instantiated (one for each point). It is no surprise that it breaks with 10000 of them.
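To illustrate the point about per-point objects, here is a minimal matplotlib-only sketch. It is not psychrochart code, and the variable names are made up for the example. It compares calling `plot` once per point (one `Line2D` artist each) with a single `scatter` call for the whole series (one `PathCollection` artist):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)

fig, (ax_points, ax_series) = plt.subplots(1, 2)

# One plot call per point: every call creates its own Line2D artist.
for xi, yi in zip(x, y):
    ax_points.plot(xi, yi, marker="o", color="darkblue")

# One scatter call for the whole series: a single PathCollection artist.
ax_series.scatter(x, y, s=5, alpha=0.1, color="darkorange", marker="+")

n_line_artists = len(ax_points.lines)
n_collection_artists = len(ax_series.collections)
print(n_line_artists, n_collection_artists)
plt.close(fig)
```

The number of artists on the first axes grows with the number of points, while the second axes holds one artist regardless of series size, which is why the per-point approach struggles with large data sets.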

What I think you want is to overlay series of points on the psychrochart, each series with a lot of points and a single label and style. Is that right? Merging this PR would break the one-style-per-point feature of the library, so I can't do it.

Instead, I'm going to update and publish a new version on PyPI with changes to overlay series (that was on my forgotten 'todo' list!).

Syntax will be backwards-compatible, and the only change to make in the caller's code is passing a dict with the scatter style:

import numpy as np

# Create a lot of points as numpy arrays
num_samples = 50000
theta = np.linspace(0, 2 * np.pi, num_samples)
r = np.random.rand(num_samples)
x, y = 7 * r * np.cos(theta) + 25, 20 * r * np.sin(theta) + 50

# Define a scatter plot style
scatter_style = {'s': 5, 'alpha': .1, 'color': 'darkorange', 'marker': '+'}

# Pass points like before, but with np.arrays (or any Iterable, like a list) instead of numbers
points = {'test_series_1': (x, y)}

# Call the method with the scatter dict (`chart` is an existing PsychroChart instance)
chart.plot_points_dbt_rh(points, scatter_style=scatter_style)

Result with 50k points (~1.5s to launch, make and save as PNG**): [image: chart_overlay_test_lot_of_points_1]

Another example with two clouds (I'm adding these to the tests):

# Create a lot of points
num_samples = 100000
theta = np.linspace(0, 2 * np.pi, num_samples)
r = np.random.rand(num_samples)
x, y = 7 * r * np.cos(theta) + 25, 20 * r * np.sin(theta) + 50
x2, y2 = x + 5, y - 20

scatter_style_1 = {'s': 20, 'alpha': .05,
                   'color': 'darkblue', 'marker': 'o'}
scatter_style_2 = {'s': 10, 'alpha': .1,
                   'color': 'darkorange', 'marker': '+'}
points = {
    'test_original': {
        'label': 'Original',
        'style': scatter_style_1,
        'xy': (x, y)},
    'test_displaced': {
        'label': 'Displaced',
        'xy': (x2, y2)}
        # if no 'style' is included, the default will be used.
}
# Call the method with the scatter dict
chart.plot_points_dbt_rh(points, scatter_style=scatter_style_2)
# Include the legend:
chart.plot_legend(markerscale=1., fontsize=11, labelspacing=1.3)

Result with 2 clouds of 100k points each (<3s to launch, make and save as PNG**): [image: chart_overlay_test_lot_of_points]

(** times for a late-2013 iMac)
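The two input shapes shown in the examples (a plain `(x, y)` tuple per series, or a dict with 'label', 'style' and 'xy' keys) can be reduced to one internal form. The sketch below is only an illustration of that idea, not the library's actual code: `normalize_series` is a hypothetical helper, and falling back to the series key as the label is an assumption.

```python
from typing import Any, Dict, Optional


def normalize_series(points: Dict[str, Any],
                     default_style: Optional[dict] = None) -> Dict[str, dict]:
    """Hypothetical helper: map both input shapes to one dict per series."""
    default_style = default_style or {}
    normalized = {}
    for key, value in points.items():
        if isinstance(value, dict):
            # Long form: {'label': ..., 'style': ..., 'xy': (x, y)}
            x, y = value['xy']
            label = value.get('label', key)  # assumption: key as fallback
            style = value.get('style', default_style)
        else:
            # Short form: (x, y)
            x, y = value
            label, style = key, default_style
        # Accept single numbers as well as iterables of numbers.
        xs = [x] if isinstance(x, (int, float)) else list(x)
        ys = [y] if isinstance(y, (int, float)) else list(y)
        normalized[key] = {'label': label, 'style': dict(style), 'xy': (xs, ys)}
    return normalized


series = normalize_series(
    {'short': ([20, 25], [40, 50]),
     'long': {'label': 'Long form', 'xy': ([30], [60])},
     'single': (22, 45)},
    default_style={'s': 5, 'marker': '+'})
for name, data in series.items():
    print(name, data)
```

With a single internal form, each series can be drawn with one scatter call, and a series without its own 'style' picks up the default passed by the caller.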

azogue commented 5 years ago

I'm closing this, as it is now possible to overlay series of points as scatter plots. The new version, 0.2.3, is available on PyPI.

Thank you very much for your feedback, @lazypaddy!

lilywhiteweb commented 5 years ago

That's awesome...I haven't had a chance to test this yet, but I'm sure that will more than fit my requirements. Thanks a million for the reply and thanks for a super library. Can I be cheeky and ask if you have anything on your todo list to manage negative temperatures (in Celsius)?

azogue commented 5 years ago

Hi @lilywhiteweb, thank you very much for your appreciation, it means a lot.

About your question:

"Can I be cheeky and ask if you have anything on your todo list to manage negative temperatures (in Celsius)?"

I think you can, if the temp range is not too extreme. From the README:

The ranges of temperature, humidity and pressure where this library should provide good results are within the normal environments for people to live in. Don't expect right results if doing other type of thermodynamic calculations. Over saturated water vapor states are not implemented.

I ran some tests and found that some equations can fail when operating with below-zero temps (with scary OverflowErrors). The main cause is the saturation_pressure_water_vapor equation, which uses, by default, the worst (but fastest) approximation:

def saturation_pressure_water_vapor(dry_temp_c: float, mode=3) -> float:
    """Saturation pressure of water vapor (kPa) from dry temperature.

    3 approximations:
      - mode 1: ASHRAE formulation
      - mode 2: Simpler, values for T > 0 / T < 0, but same speed as 1
      - mode 3: More simpler, near 2x vs mode 1.
    """
    # code ...

Changing it to mode=1, which I think is the best formulation (the ASHRAE one), should give you no problems with below-zero temps. An example, with limits [-30, 10] °C, [0, 3] gr_w/kg_da:

[image: test_custom_psychrochart_3]

I think I'll change the default 'mode' the next time I upload a release...
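To show what a "values for T > 0 / T < 0" approximation can look like, here is a standalone sketch using a Magnus-type formula with separate coefficients over liquid water and over ice. It is not the library's implementation of any of the three modes: the function name is invented for this example, and the coefficients are a commonly quoted Magnus set rather than anything taken from psychrochart.

```python
import math


def magnus_saturation_pressure_kpa(dry_temp_c: float) -> float:
    """Saturation pressure of water vapor (kPa), Magnus-type approximation.

    Illustration only: coefficients are a commonly quoted Magnus set,
    not the ones used by psychrochart.
    """
    if dry_temp_c >= 0:
        a, b = 17.62, 243.12  # over liquid water
    else:
        a, b = 22.46, 272.62  # over ice
    # 0.6112 kPa is the value at 0 degC; both branches agree there.
    return 0.6112 * math.exp(a * dry_temp_c / (b + dry_temp_c))


for t in (-30, -10, 0, 10, 25):
    print(f"{t:>4} degC -> {magnus_saturation_pressure_kpa(t):.4f} kPa")
```

The exponent stays small over the range of temperatures people normally live in, so a formula of this shape does not overflow for the [-30, 10] °C limits used in the example chart.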

Out of curiosity, what use do you intend for the library? (feel no obligation to answer this)

lilywhiteweb commented 5 years ago

@azogue, I am looking to plot some data from an Air Handling Unit to see if it is operating within its correct operating envelope. Some of the values could be below zero, depending on ambient conditions, so your fix would be really cool. I'll try that out with some dummy data to see how it works.

azogue commented 5 years ago

Oh, in that case I think there wouldn't be any problem, @lilywhiteweb

The fact that the configuration examples don't include below-zero temps in their ranges is coincidental, not due to any unimplemented feature. You can use it as it is now.

Just change the chart limits to show your needed temp and humidity ranges.

azogue / psychrochart

Use scatter instead of plot for large data sets. #3