TutteInstitute / datamapplot

Creating beautiful plots of data maps
MIT License

KeyError: 'pop from an empty set' when cluster_boundary_polygons=True #34

Closed zilch42 closed 2 months ago

zilch42 commented 2 months ago

I'm getting an error when using cluster_boundary_polygons=True with an interactive plot. The plot generates fine with cluster_boundary_polygons=False, and static plots generate fine with the same data. I thought the problem might be too few points per cluster, but doubling the lists (a minimum of 6 points in each cluster) still fails. Any ideas what I'm doing wrong?

datamapplot v0.3.0

import numpy as np
import datamapplot

dummy_labels = [['T1', 'Unlabelled', 'T1', 'T1', 'T2', 'T2', 'T2', 'T2', 'Unlabelled', 'Unlabelled', 'T2', 'T2'],
                ['T3', 'Unlabelled', 'T4', 'T4', 'T3', 'T3', 'T4', 'T5', 'Unlabelled', 'Unlabelled', 'T5', 'T5']]
dummy_label_array = np.array(dummy_labels)
dummy_hover = ['Text']*12

dummy_coords = np.random.rand(12, 2)

# fixed coordinates that work (see issue #33)
dummy_coords1 = np.array([[0.36944978, 0.24840278],
       [0.05371874, 0.6797169 ],
       [0.73539839, 0.5694784 ],
       [0.37300726, 0.33794748],
       [0.09289142, 0.2358011 ],
       [0.95914631, 0.72673466],
       [0.67561212, 0.54775067],
       [0.54549458, 0.04819768],
       [0.88099899, 0.39860445],
       [0.26929064, 0.12571897],
       [0.84381725, 0.57393012],
       [0.74503466, 0.33412061]])

datamapplot.create_interactive_plot(
    dummy_coords1, 
    *dummy_label_array, 
    hover_text=dummy_hover, 
    cluster_boundary_polygons=True
    )
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[1], line 25
     11 # fixed coordinates that work (see issue #33)
     12 dummy_coords1 = np.array([[0.36944978, 0.24840278],
     13        [0.05371874, 0.6797169 ],
     14        [0.73539839, 0.5694784 ],
   (...)
     22        [0.84381725, 0.57393012],
     23        [0.74503466, 0.33412061]])
---> 25 datamapplot.create_interactive_plot(
     26     dummy_coords1, 
     27     *dummy_label_array, 
     28     hover_text=dummy_hover, 
     29     cluster_boundary_polygons=True
     30     )

File c:\Users\[path]\lib\site-packages\datamapplot\create_plots.py:442, in create_interactive_plot(data_map_coords, hover_text, inline_data, noise_label, noise_color, color_label_text, label_wrap_width, label_color_map, width, height, darkmode, palette_hue_shift, palette_hue_radius_dependence, cmap, marker_size_array, marker_color_array, use_medoids, cluster_boundary_polygons, color_cluster_boundaries, polygon_alpha, *label_layers, **render_html_kwds)
    432     label_dataframe = pd.DataFrame(
    433         {
    434             "x": [data_map_coords.T[0].mean()],
   (...)
    438         }
    439     )
    440 else:
    441     label_dataframe = pd.concat(
--> 442         [
    443             label_text_and_polygon_dataframes(
    444                 labels,
    445                 data_map_coords,
    446                 noise_label=noise_label,
    447                 use_medoids=use_medoids,
    448                 cluster_polygons=cluster_boundary_polygons,
    449                 alpha=polygon_alpha,
    450             )
    451             for labels in label_layers
    452         ]
    453     )
    455 if label_color_map is None:
    456     if cmap is None:

File c:\Users\[path]\lib\site-packages\datamapplot\create_plots.py:443, in <listcomp>(.0)
    432     label_dataframe = pd.DataFrame(
    433         {
    434             "x": [data_map_coords.T[0].mean()],
   (...)
    438         }
    439     )
    440 else:
    441     label_dataframe = pd.concat(
    442         [
--> 443             label_text_and_polygon_dataframes(
    444                 labels,
    445                 data_map_coords,
    446                 noise_label=noise_label,
    447                 use_medoids=use_medoids,
    448                 cluster_polygons=cluster_boundary_polygons,
    449                 alpha=polygon_alpha,
    450             )
    451             for labels in label_layers
    452         ]
    453     )
    455 if label_color_map is None:
    456     if cmap is None:

File c:\Users\[path]\lib\site-packages\datamapplot\interactive_rendering.py:468, in label_text_and_polygon_dataframes(labels, data_map_coords, noise_label, use_medoids, cluster_polygons, alpha)
    463     if cluster_polygons:
    464         simplices = Delaunay(cluster_points).simplices
    465         polygons.append(
    466             [
    467                 smooth_polygon(x).tolist()
--> 468                 for x in create_boundary_polygons(
    469                     cluster_points, simplices, alpha=alpha
    470                 )
    471             ]
    472         )
    474 label_locations = np.asarray(label_locations)
    476 if cluster_polygons:

File c:\Users\[path]\lib\site-packages\datamapplot\alpha_shapes.py:43, in create_boundary_polygons(points, simplices, alpha)
     41 polygons = []
     42 search_set = boundary.copy()
---> 43 sequence = list(search_set.pop())
     44 while len(search_set) > 0:
     45     to_find = sequence[-1]

KeyError: 'pop from an empty set'

zilch42 commented 2 months ago

Additional info: I have tried running the Wikipedia interactive plot example, and that runs fine with cluster_boundary_polygons=True. I have also compared my input data formats with the data in that example and changed the following line to match the format found there:

dummy_label_array = [np.array(l) for l in dummy_labels]

Even with that change, I still get the same 'pop from an empty set' error.

lmcinnes commented 2 months ago

The boundaries are computed as alpha shapes, which can be a bit finicky if the clusters aren't reasonably well formed in the 2D plot, so it is possible boundaries are not going to work out well for your data. That said, in the bad case they normally just produce bad squiggles instead of boundaries, rather than a hard failure like the one you have here.

It seems like it is not finding a reasonable boundary for the alpha shape (the search set is empty), which could definitely happen if the alpha is too small. I guess that may depend on the scale of the data. I tried to choose a reasonably robust alpha as the default, but it may be what is breaking things in your case. You can set this yourself with polygon_alpha in the call to create_interactive_plot. The default value is 0.1, but you can try much larger values (as the value tends to infinity, the boundary tends to the convex hull). Start with something like 10 and see if that stops things from breaking.
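
To illustrate the failure mode described above, here is a minimal, self-contained sketch of an alpha-shape style boundary search. It is not datamapplot's actual code: the helper names (circumradius, boundary_edges), the hand-written triangulation, and the rule of keeping only triangles whose circumradius is below alpha are all assumptions, chosen to match the behaviour described here (a larger alpha tends towards the convex hull).

```python
import math
from collections import Counter


def circumradius(a, b, c):
    """Circumradius of the 2D triangle with vertices a, b, c."""
    side_a = math.dist(b, c)
    side_b = math.dist(a, c)
    side_c = math.dist(a, b)
    # Triangle area from the 2D cross product of two edge vectors.
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
    if area == 0.0:
        return math.inf  # degenerate (collinear) triangle
    return side_a * side_b * side_c / (4.0 * area)


def boundary_edges(points, triangles, alpha):
    """Edges that belong to exactly one triangle with circumradius < alpha.

    Assumed filtering rule for illustration only.
    """
    edge_counts = Counter()
    for i, j, k in triangles:
        if circumradius(points[i], points[j], points[k]) < alpha:
            for edge in ((i, j), (j, k), (i, k)):
                edge_counts[tuple(sorted(edge))] += 1
    return {edge for edge, count in edge_counts.items() if count == 1}


# A unit square split into two triangles (a hand-written triangulation).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]

small = boundary_edges(points, triangles, alpha=0.1)  # every triangle rejected
large = boundary_edges(points, triangles, alpha=10)   # every triangle kept

# Popping from the empty boundary set reproduces the kind of error reported.
raised_key_error = False
try:
    set(small).pop()
except KeyError:
    raised_key_error = True
```

With points on a unit scale, each triangle here has a circumradius of about 0.71, so alpha=0.1 keeps no triangles and the boundary set is empty, while alpha=1 or alpha=10 keeps both and yields the four outer edges of the square. In the reproduction above, the equivalent change is passing polygon_alpha=10 alongside cluster_boundary_polygons=True.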

I'll see if I can produce a more useful error message when this happens.
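
One way a more informative error could look is sketched below. This is a hypothetical guard, not the change that was actually made in datamapplot: pop_start_edge and its message are invented for illustration, and it mirrors only the search_set.pop() step visible in the traceback.

```python
def pop_start_edge(boundary, alpha):
    """Hypothetical helper: pick a starting edge, or fail with a clear message.

    Returns the starting edge as a list plus the remaining edges to search.
    """
    search_set = set(boundary)
    if not search_set:
        # An empty boundary would otherwise surface as
        # KeyError: 'pop from an empty set'.
        raise ValueError(
            f"No cluster boundary could be found with alpha={alpha}. "
            "Try a larger polygon_alpha (for example 1 or 10), "
            "or set cluster_boundary_polygons=False."
        )
    sequence = list(search_set.pop())
    return sequence, search_set


# An empty boundary now produces an actionable message.
try:
    pop_start_edge(set(), alpha=0.1)
except ValueError as err:
    message = str(err)

# A non-empty boundary behaves like the original pop.
sequence, remaining = pop_start_edge({(0, 1)}, alpha=10)
```

Raising ValueError rather than letting the KeyError propagate is a design choice in this sketch: the problem is a parameter value that does not suit the data, and the message can name the parameter the caller should change.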

zilch42 commented 2 months ago

Thanks! Yes, alpha = 10 works. Even alpha = 1 works. For what it's worth, I've just been playing with tiny datasets because they're quick to process and play around with, and that's where I'm finding these errors. Generally, DataMapPlot isn't a great tool for tiny datasets; it's much better when the data is large. So this is probably an edge case, but more informative error messages would be very much appreciated!