diffix / syndiffix

Python implementation of the SynDiffix synthetic data generation mechanism.

Crash in microdata.py, _map_interval() #137

Open yoid2000 opened 4 months ago

yoid2000 commented 4 months ago

We are getting a crash at line 135 in the following:

https://github.com/diffix/syndiffix/blob/4dcdbfa375d7c8bb707cf17335a8741845a60c80/syndiffix/microdata.py#L127-L142

At the crash, we have min_value=1 and max_value=0. This is not a valid setting for rng.randint().

We also have:

interval.max = 1.25
interval.min = 1.0
value_map = ['a', 'b', 'c', 'z']

where a, b, c, and z are the set of values in the column.

The problem is that interval.max gets rounded down to 1.0, and then 1 is subtracted from it, which gives max_value=0.
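The sketch below makes the arithmetic concrete by reproducing the failing call with the values above. It is not the code from microdata.py. Two details are assumptions: the exact rounding used for min_value, and the use of Python's `random.Random` as `rng`. They were chosen only so that the sketch produces the reported min_value=1 and max_value=0.

```python
import math
import random

# Values reported in this issue.
interval_min = 1.0
interval_max = 1.25
value_map = ['a', 'b', 'c', 'z']

# Hypothetical reconstruction of the index computation described above.
# The lower bound is rounded down to an integer index. The upper bound is
# rounded down and then 1 is subtracted to turn it into an inclusive index.
min_value = int(math.floor(interval_min))      # 1
max_value = int(math.floor(interval_max)) - 1  # floor(1.25) - 1 == 0

rng = random.Random(0)
try:
    # randint(a, b) requires a <= b; randint(1, 0) is an empty range.
    index = rng.randint(min_value, max_value)
    crashed = False
except ValueError:
    crashed = True

print(min_value, max_value, crashed)
```

For any interval of width at least 1 whose bounds fall on integer slot boundaries, the same computation yields min_value <= max_value. The range only becomes empty when the interval is narrower than one value slot, as here.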

@cristianberneanu any ideas what might be going on here?

cristianberneanu commented 4 months ago

The interval sizes for string columns should never be smaller than 1 (splitting should stop when the current node is a singularity). An interval smaller than 1 is the cause of the crash. No idea why that happens, though. Can you determine the source node for this bucket?
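Below is a minimal sketch of the invariant described above. It rests on two hypothetical assumptions: each string in value_map occupies a unit-width slot [i, i + 1), and nodes are split by halving their interval. The helper names `is_string_singularity` and `split_string_interval` are illustrative only and are not part of syndiffix.

```python
from typing import Tuple

# Hypothetical representation of a node's range as a (min, max) pair.
Interval = Tuple[float, float]


def is_string_singularity(interval: Interval) -> bool:
    """Return True once a string-column interval covers at most one value slot.

    Assumes each distinct string owns a unit-width slot [i, i + 1),
    so a width of 1 or less means the node holds a single value.
    """
    lo, hi = interval
    return hi - lo <= 1.0


def split_string_interval(interval: Interval, upper: bool) -> Interval:
    """Halve an interval, refusing to split a string singularity."""
    if is_string_singularity(interval):
        raise ValueError(f"refusing to split singular string interval {interval}")
    lo, hi = interval
    mid = (lo + hi) / 2
    # Keep either the upper or the lower half of the parent interval.
    return (mid, hi) if upper else (lo, mid)
```

Usage: starting from (0.0, 4.0) for the four values, splitting to the lower half gives (0.0, 2.0), and then to the upper half gives (1.0, 2.0). That node is already a singularity, so a further split raises. Under these assumptions, the reported interval [1.0, 1.25) can only be reached by halving [1.0, 2.0) twice more after it has become a singularity. This is consistent with splitting not stopping where it should.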

yoid2000 commented 4 months ago

OK, I'll dig in.

PF
