elastic / ember

Elastic Malware Benchmark for Empowering Researchers

pandas groupby KeyError: 'subset' #63

Closed kou18n closed 3 years ago

kou18n commented 3 years ago

Hello, this code in ember2018-notebook.ipynb fails and the chart is not displayed. Could you help me?

    plotdf = emberdf.copy()
    gbdf = plotdf.groupby(["label", "subset"]).count().reset_index()
    alt.Chart(gbdf).mark_bar().encode(
        alt.X('subset:O', axis=alt.Axis(title='Subset')),
        alt.Y('sum(sha256):Q', axis=alt.Axis(title='Number of samples')),
        alt.Color('label:N',
                  scale=alt.Scale(range=["#00b300", "#3333ff", "#ff3333"]),
                  legend=alt.Legend(values=["unlabeled", "benign", "malicious"]))
    )

Error message:

    KeyError                                  Traceback (most recent call last)
          1 plotdf = emberdf.copy()
    ----> 2 gbdf = plotdf.groupby(["label", "subset"]).count().reset_index()
          3 alt.Chart(gbdf).mark_bar().encode(
          4     alt.X('subset:O', axis=alt.Axis(title='Subset')),
          5     alt.Y('sum(sha256):Q', axis=alt.Axis(title='Number of samples')),

    ~/miniconda3/envs/ember/lib/python3.7/site-packages/pandas/core/frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
       6523             squeeze=squeeze,
       6524             observed=observed,
    -> 6525             dropna=dropna,
       6526         )
       6527

    ~/miniconda3/envs/ember/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
        531                 observed=observed,
        532                 mutated=self.mutated,
    --> 533                 dropna=self.dropna,
        534             )
        535

    ~/miniconda3/envs/ember/lib/python3.7/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
        784                 in_axis, name, level, gpr = False, None, gpr, None
        785             else:
    --> 786                 raise KeyError(gpr)
        787         elif isinstance(gpr, Grouper) and gpr.key is not None:
        788             # Add key to exclusions

    KeyError: 'subset'

mxj-aoyun commented 3 years ago

@kou18n I have the same problem. Have you solved it?

kou18n commented 3 years ago

@mxj-aoyun Not yet. I think the problem is that the metadata.csv file has no subset column.

vietvo89 commented 3 years ago

I got the same issue: when I print `emberdf`, there is no "subset" column, so I cannot reproduce the chart shown in the authors' report. I am looking for an answer too.
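
One way to confirm this diagnosis is to inspect the CSV header before calling `groupby`. The sketch below uses only the standard library. The helper name `missing_columns` and the sample header are hypothetical and are not taken from the ember code.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows at all
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical header resembling a metadata file without a "subset" column.
sample = io.StringIO("sha256,appeared,label\nabc123,2018-01,0\n")
print(missing_columns(sample, ["label", "subset"]))  # prints ['subset']
```

With a real file, the same helper can be given an open file handle, for example `open(path, newline="")`, where `path` points at whichever metadata CSV the notebook loads.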

mrphilroth commented 3 years ago

I just pushed a change that will fix this. Tested on ember2017 and ember2018. Make sure you regenerate the metadata csv files after pulling in the new code. My newer versions of Altair didn't display the plots nicely in my old notebooks. Hopefully, you're able to save the charts or display them with an older version of Altair now that the metadata files have all the information in them.
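
If Altair still does not render the chart, the counts behind it can be checked without plotting. This is a minimal sketch using only the standard library. It assumes the regenerated CSV has `label` and `subset` columns and that labels are encoded as -1 (unlabeled), 0 (benign), and 1 (malicious). The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed label encoding; unknown values are passed through unchanged.
LABEL_NAMES = {"-1": "unlabeled", "0": "benign", "1": "malicious"}


def count_by_label_and_subset(csv_file):
    """Count rows per (label, subset) pair, similar to a groupby count."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        raw_label = row["label"].strip()
        label = LABEL_NAMES.get(raw_label, raw_label)
        counts[(label, row["subset"].strip())] += 1
    return counts


# Made-up rows standing in for a regenerated metadata file.
sample = io.StringIO(
    "sha256,label,subset\n"
    "aaa,0,train\n"
    "bbb,1,train\n"
    "ccc,-1,train\n"
    "ddd,1,test\n"
)
for (label, subset), n in sorted(count_by_label_and_subset(sample).items()):
    print(subset, label, n)
```

If the file still lacks a `subset` column, `row["subset"]` raises `KeyError: 'subset'` here as well, which indicates the metadata was not regenerated with the updated code.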