dice-group / vectograph

GNU General Public License v3.0

ValueError: Bin edges must be unique: array #10

hzahera closed this issue 2 years ago

hzahera commented 2 years ago

Hello Demir,

I am using Vectograph to construct a KG for the smart-logistic dataset, and I got the error below during execution. Could you give me a hint on how to fix it?

Traceback (most recent call last):
  File "main.py", line 32, in <module>
    num_quantile=args.num_quantile).transform(df)
  File "/home/daikiri/DAIKIRI/src/Clustering/WP 3.3/Vectograph/vectograph/quantizer.py", line 105, in transform
    new_column_name, discretized, bin_values = self.__perform_discretization(column_name=col, df=df)
  File "/home/daikiri/DAIKIRI/src/Clustering/WP 3.3/Vectograph/vectograph/quantizer.py", line 84, in __perform_discretization
    duplicates=self.duplicates)
  File "/home/daikiri/.conda/envs/daikiri/lib/python3.6/site-packages/pandas/core/reshape/tile.py", line 348, in qcut
    duplicates=duplicates,
  File "/home/daikiri/.conda/envs/daikiri/lib/python3.6/site-packages/pandas/core/reshape/tile.py", line 381, in _bins_to_cuts
    f"Bin edges must be unique: {repr(bins)}.\n"
ValueError: Bin edges must be unique: array([2381605., 5800860., 5800860.]).
You can drop duplicate edges by setting the 'duplicates' kwarg

hzahera commented 2 years ago

When I set duplicates='drop', I got a different error:

Traceback (most recent call last):
  File "smart-logistic-example.py", line 9, in <module>
    X_transformed = QCUT(min_unique_val_per_column=6, num_quantile=5).transform(smartLogistic_df)
  File "/home/daikiri/DAIKIRI/src/Clustering/WP 3.3/Vectograph/vectograph/quantizer.py", line 105, in transform
    new_column_name, discretized, bin_values = self.__perform_discretization(column_name=col, df=df)
  File "/home/daikiri/DAIKIRI/src/Clustering/WP 3.3/Vectograph/vectograph/quantizer.py", line 84, in __perform_discretization
    duplicates=self.duplicates)
  File "/home/daikiri/.conda/envs/daikiri/lib/python3.6/site-packages/pandas/core/reshape/tile.py", line 348, in qcut
    duplicates=duplicates,
  File "/home/daikiri/.conda/envs/daikiri/lib/python3.6/site-packages/pandas/core/reshape/tile.py", line 411, in _bins_to_cuts
    "Bin labels must be one fewer than the number of bin edges"
ValueError: Bin labels must be one fewer than the number of bin edges

Demirrr commented 2 years ago

Dear Hamada,

The error "ValueError: Bin labels must be one fewer than the number of bin edges" indicates that the parameters passed to QCUT(min_unique_val_per_column=..., num_quantile=...) are not compatible with your data.
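
Both errors in this thread can be reproduced with pandas alone. The sketch below is only an illustration: the column values and the helper `try_qcut` are made up, and it assumes that one explicit label per requested quantile is passed to `pandas.qcut`, which this thread does not confirm for QCUT. When one value dominates a column, several quantile edges coincide. With the default `duplicates='raise'` that gives the first error. With `duplicates='drop'`, the duplicate edges are removed, so fewer bins remain than labels were supplied, which gives the second error.

```python
import pandas as pd

# Hypothetical column in which one value dominates, so the median equals the maximum.
values = pd.Series([1, 5, 5, 5, 5, 5, 5, 5, 5, 5], dtype=float)


def try_qcut(series, q, **kwargs):
    """Run pd.qcut and return 'ok' or the first line of the ValueError message."""
    try:
        pd.qcut(series, q, **kwargs)
        return "ok"
    except ValueError as err:
        return str(err).splitlines()[0]


# Quantile edges are [1, 5, 5]: not unique, so the default duplicates='raise' fails.
msg_raise = try_qcut(values, 2)

# Dropping duplicates leaves edges [1, 5], i.e. one bin, but two labels are supplied.
msg_drop_labels = try_qcut(values, 2, labels=["low", "high"], duplicates="drop")

# Without explicit labels pandas generates one label per remaining bin.
msg_drop_nolabels = try_qcut(values, 2, duplicates="drop")

print(msg_raise)
print(msg_drop_labels)
print(msg_drop_nolabels)
```

If this assumption holds for QCUT, the mismatch disappears only when the number of labels follows the number of bins that survive after duplicate edges are dropped, or when columns with too few distinct values are left out of the discretization.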

To narrow down the issue, could you answer the following questions?

  1. How many unique values does each column of your data have?
  2. How did you set min_unique_val_per_column?
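
One possible way to collect the numbers asked for above is sketched here. The helper `quantile_report` and the toy DataFrame are hypothetical and not part of Vectograph. The idea is to count the unique values in each numeric column and the distinct quantile edges that a given `num_quantile` would produce. A column with fewer than `num_quantile + 1` distinct edges is a likely source of the errors above.

```python
import numpy as np
import pandas as pd


def quantile_report(df, num_quantile):
    """Per numeric column: number of unique values and of distinct quantile edges."""
    probs = np.linspace(0, 1, num_quantile + 1)
    rows = []
    for col in df.select_dtypes(include="number").columns:
        edges = df[col].dropna().quantile(probs)
        rows.append({
            "column": col,
            "n_unique": df[col].nunique(),
            "n_distinct_edges": edges.nunique(),
        })
    return pd.DataFrame(rows).set_index("column")


# Toy data standing in for the real dataset (made-up column names and values).
df = pd.DataFrame({
    "weight": [1.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "distance": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

report = quantile_report(df, num_quantile=2)
# Columns whose quantile edges collapse, i.e. fewer than num_quantile + 1 distinct edges.
suspicious = report[report["n_distinct_edges"] < 2 + 1]
print(report)
print(suspicious.index.tolist())
```

In this toy example only `weight` is flagged, because its median and maximum coincide.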