Matteo21Q / jomo

R package for Joint Modelling Imputation

Segfault error from jomo2com.MCMCchain() and question about how to improve efficiency #4

Open katherinegiorgio opened 4 months ago

katherinegiorgio commented 4 months ago

Hello,

I am using the UK Biobank phenotype data for my research. We have ~30 variables that we’ve pulled from the 502,269 subjects in the study. We want to use jomo to impute missing values, clustering by person, across the 4 visits. The imputation is taking a very long time to run. This is my function call: jomo(Y = y_1, Y2 = y_2, X = x_1, X2 = x_2, clus = clus_data, nimp = 1)

These are the steps we've tried so far to shorten the run time. Is there anything else we could try?

  1. Running each of the 25 imputations separately - We are running the script 25 times simultaneously to obtain 25 imputed datasets.
  2. Using the jomo2com.MCMCchain() function instead of jomo(), since we only need one imputed dataset from each instance - This did not work for me. (Thank you for the very helpful message that directed me to this function when I ran jomo() with nimp=1!) It ran fine on a test dataset made from a smaller chunk of the same data, but when I run it on the full dataset, I get this error regardless of how much memory I allocate to the job (tested up to 1000 GB; it seems to actually use 133 GB):

*** caught segfault ***
address 0x7f229e1a9ca8, cause 'memory not mapped'

Traceback:
 1: jomo2com.MCMCchain(Y.con = y_1_con, Y.cat = y_1_cat, Y.numcat = y_1_cat_levels,     Y2.con = NULL, Y2.cat = y_2_cat, Y2.numcat = y_2_cat_levels,     X = x_1, X2 = x_2, clus = clus_data)
An irrecoverable exception occurred. R is aborting now ...
/var/spool/slurmd/job162274625/slurm_script: line 23: 1319249 Segmentation fault      Rscript --no-save stage3_imputation_240228.R < UKBB_harmonized_long_240228.Rda

Thank you!

Matteo21Q commented 4 months ago

Hi Katherine,

Thanks for getting in touch!

Yes, unfortunately the .MCMCchain functions collect a lot of information (pretty much the whole chains), so when the data set is even moderately large they can crash with a segfault.

One way around it: use the standard jomo function with nimp=2, setting nburn to half of the iterations you wanted and nbetween to the other half. Then take only the second imputation from each run of jomo and discard the first. It means a bit more data handling, but in practice you should get the same result. Hopefully this works, as the jomo functions don't keep all the information on the chains.

Hope this helps, Matteo
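
The workaround Matteo describes could be sketched in R roughly as follows. This is a minimal sketch, not code from the thread: the helper names (keep_second_imputation, run_single_imputation), the seed argument, and the total of 2000 iterations are all hypothetical. It assumes jomo() returns its usual long-format data frame, in which the column "Imputation" is 0 for the original data and 1, 2, ... for the imputed datasets, and that jomo draws from R's random number generator, so that a different seed in each of the 25 parallel runs gives different imputations.

```r
# Sketch of the nimp = 2 workaround. Helper names, the seed argument and
# the iteration count in the example call are assumptions for illustration.

# Keep only the second imputed dataset from jomo()'s long-format output,
# where "Imputation" is 0 for the original data and 1, 2, ... for imputations.
keep_second_imputation <- function(imp) {
  second <- imp[imp$Imputation == 2, , drop = FALSE]
  rownames(second) <- NULL
  second
}

# Run jomo() with nimp = 2, splitting the intended number of iterations
# between nburn and nbetween, and return only the second imputation.
run_single_imputation <- function(total_iter, seed, ...) {
  set.seed(seed)                    # use a different seed in each parallel run
  n_burn <- total_iter %/% 2        # first half of the iterations as burn-in
  n_between <- total_iter - n_burn  # the rest between imputations 1 and 2
  imp <- jomo::jomo(..., nburn = n_burn, nbetween = n_between, nimp = 2)
  keep_second_imputation(imp)
}

# Example call with the objects from the original post (not run here);
# k would be the index of the run, 1 to 25:
# imp_k <- run_single_imputation(2000, seed = k, Y = y_1, Y2 = y_2,
#                                X = x_1, X2 = x_2, clus = clus_data)
```

The second imputation is drawn after nburn + nbetween iterations in total, which is why it can stand in for the single imputation that was originally wanted after that many iterations.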


katherinegiorgio commented 4 months ago

Hi Matteo,

Thanks for your quick response! And for the explanation on why I was getting that error in the .MCMCchain function. This sounds like a good idea, thanks for this suggestion!

Best, Katherine