bnowok / synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

Synthpop freezes on dataset with factor attributes #20

Open danielamartinezd02 opened 2 years ago

danielamartinezd02 commented 2 years ago

The synthesis process never finishes; it just freezes on a dataset with factor attributes when a factor has more than 2 classes. [image: screenshot of the code] My R version is 4.1.2 and the synthpop version is 1.7.0.

gillian-raab commented 2 years ago

Dear Daniela, I'm afraid the code you sent appears as an image that is unreadable, and the same was true on GitHub. Without seeing the details I can't tell what is going on, but I'm sure it is NOT because of factors with more than two levels, which are handled OK, though computational problems can happen with factors that have many levels (e.g. more than 15).
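A quick way to find which factors exceed the rough threshold of 15 levels mentioned in this reply is to count the levels of every factor column. The sketch below uses only base R; the data frame `toy` and its columns are hypothetical stand-ins for the real dataset.

```r
# Hypothetical toy data frame standing in for the real dataset
set.seed(1)
n <- 100
toy <- data.frame(
  age     = round(runif(n, 18, 65)),
  Country = factor(sample(paste0("C", 1:30), n, replace = TRUE),
                   levels = paste0("C", 1:30)),
  work    = factor(sample(c("Often", "Rarely", "Never"), n, replace = TRUE),
                   levels = c("Often", "Rarely", "Never"))
)

# Number of levels of each factor column (non-factor columns are skipped)
n_levels <- sapply(Filter(is.factor, toy), nlevels)
print(n_levels)

# Factors above the rough threshold of 15 levels
many_levels <- names(n_levels)[n_levels > 15]
print(many_levels)
```

If the data were read with `read.csv()`, note that since R 4.0.0 string columns stay character unless `stringsAsFactors = TRUE` is set, so they would not be counted by `is.factor` here.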

Best Gillian

Gillian M Raab

Emeritus Professor, Edinburgh Napier University

Part-time Research Fellow

Administrative Data Research Centre - Scotland

Edinburgh

+44 7748 678 551


From: danielamartinezd02 @.> Sent: 22 March 2022 10:08 To: bnowok/synthpop @.> Cc: Subscribed @.***> Subject: [bnowok/synthpop] Synthpop freeze on dataset with factor attributes (Issue #20)


The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

danielamartinezd02 commented 2 years ago

Dear Gillian,

There is one attribute with many levels ('Country') where it gets stuck. However, I also tried deleting it, and it is not the only one: I am also having problems with an attribute that has only 3 levels. I attach the dataset I am trying to synthesize, mental_health_train_data_all.csv (https://github.com/bnowok/synthpop/files/8330951/mental_health_train_data_all.csv), and the line of code I am using is:

synthlist <- syn(real_df, method = 'cart', visit.sequence = 1:ncol(real_df), k = nrow(real_df), seed = myseed)

Best, Daniela

gillian-raab commented 2 years ago

I can have a look at some point later, but I'm pretty busy just now. Meanwhile, things you could try are:

  1. Changing the order of synthesis (the visit sequence).
  2. Restricting the predictor matrix to exclude variables with many levels.
  3. Using the "nested" method, if you can define wider groups from your many-level variables.
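As a rough illustration of suggestions 1 and 2, here is a minimal R sketch. It assumes synthpop's `syn()` with its `method`, `visit.sequence`, `predictor.matrix`, `seed` and `print.flag` arguments; the data frame `toy` and its columns are hypothetical stand-ins for the real dataset. A plausible reason many-level factors hurt (an assumption, not confirmed in this thread): the `cart` method fits rpart trees, and when the outcome has more than two classes rpart tries all 2^(L-1) - 1 binary splits of an unordered predictor with L levels, so a 30-level `Country` used as a predictor would mean over 500 million candidate splits at a node where all 30 levels appear.

```r
library(synthpop)

# Hypothetical toy data: 'Country' stands in for a many-level factor
set.seed(1)
n <- 200
toy <- data.frame(
  age       = round(runif(n, 18, 65)),
  Country   = factor(sample(paste0("C", 1:30), n, replace = TRUE)),
  treatment = factor(sample(c("Yes", "No"), n, replace = TRUE)),
  work      = factor(sample(c("Often", "Rarely", "Never"), n, replace = TRUE))
)

# Suggestion 1: change the visit sequence so Country is synthesised last,
# which means it is never used as a predictor for another variable
vs <- c(which(names(toy) != "Country"), which(names(toy) == "Country"))
s_a <- syn(toy, method = "cart", visit.sequence = vs,
           seed = 123, print.flag = FALSE)

# Suggestion 2: keep the default order (1:ncol) but drop Country as a predictor.
# Rows are the variables being synthesised, columns are their predictors;
# in the default order each variable is predicted by all variables before it.
p <- ncol(toy)
pm <- matrix(0, p, p, dimnames = list(names(toy), names(toy)))
pm[lower.tri(pm)] <- 1
pm[, "Country"] <- 0
s_b <- syn(toy, method = "cart", predictor.matrix = pm,
           seed = 123, print.flag = FALSE)

# Synthetic data frames
head(s_a$syn)
head(s_b$syn)
```

Putting a variable last already keeps it out of every other variable's model, so the two approaches overlap; editing the predictor matrix is the more flexible option when several variables need to be excluded without reordering.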

This paper https://arxiv.org/pdf/1712.04078.pdf may give you some hints, although it is pretty old now.

Best, Gillian


From: danielamartinezd02 @.> Sent: 23 March 2022 07:58 To: bnowok/synthpop @.> Cc: RAAB Gillian @.>; Comment @.> Subject: Re: [bnowok/synthpop] Synthpop freezes on dataset with factor attributes (Issue #20)


danielamartinezd02 commented 2 years ago

Dear Gillian,

Thanks for your response. I will take a look at what you have suggested.

Best, Daniela

RebelOfDeath commented 11 months ago

Hey @danielamartinezd02, were you able to solve this problem? I've been having exactly the same problem with the UCI Adult Census dataset.

gillian-raab commented 11 months ago

Did you look at the paper I suggested in my reply on GitHub?

If so, and it did not help, perhaps send me more details of your problem.

Gillian

Gillian M Raab

Research Fellow (part-time)

Scottish Centre for Administrative Data Research

My core working days are Tuesdays and Thursdays, though I sometimes swap them for other days.

07748 678 551


From: Roham Koohestani @.> Sent: 30 November 2023 10:15 To: bnowok/synthpop @.> Cc: Gillian Raab @.>; Comment @.> Subject: Re: [bnowok/synthpop] Synthpop freezes on dataset with factor attributes (Issue #20)
