Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

Odd behavior in RStudio #66

Open PrionMike opened 1 month ago

PrionMike commented 1 month ago

Dear Christian, I am having a strange experience using cutpointr 1.1.2. In case it is relevant, I am working in RStudio 2024.04.2 Build 764, running R 4.4.1 on Windows 10 version 22H2, build 19045.4894. I run a straightforward version of the cutpointr function, with Youden index maximization on a continuous x variable (mkr) and a binary (1/0) class variable (Dx), with bootstrapping. The number of bootstrap iterations does not seem to matter: I ask for 1000, but I get the same strange behavior with 100.

bcutYI_mkr <- cutpointr(data = data, x = mkr, class = Dx, method = maximize_metric, metric = youden, boot_runs = 1000)

This seems to run fine. However, when I call the result variable bcutYI_mkr by name to examine the results, or call summary() on it, the display takes about a full minute to update. I know R is still running, because the cursor keeps blinking throughout the wait, but RStudio is unresponsive until the console updates. Another incidental observation: in the Environment pane, bcutYI_mkr shows a size of 0 B, even though the object must exist because I can eventually display it. Any thoughts?
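One way to tell whether the delay comes from the print and summary methods themselves or from RStudio is to time the printing directly. The following is a minimal base-R sketch; `time_display()` is a hypothetical helper, not part of cutpointr, and it assumes that printing into `capture.output()` exercises the same print method as auto-printing at the console.

```r
# Hypothetical helper (not part of cutpointr): measures how many seconds it
# takes to run the print method of an object, without writing to the console.
time_display <- function(obj) {
  timing <- system.time(
    # capture.output() runs print() but collects the text instead of showing it
    printed <- capture.output(print(obj))
  )
  list(
    seconds = timing[["elapsed"]],  # wall-clock time spent printing
    n_lines = length(printed)       # how much text the print method produced
  )
}

# Usage with the objects from this issue (assumes cutpointr is attached):
# time_display(bcutYI_mkr)
# time_display(summary(bcutYI_mkr))
```

If the measured time is short while the console still hangs, the delay is more likely on the RStudio side than in the print method.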

Thie1e commented 1 month ago

Hi Mike,

using slightly different versions of R and RStudio I don't observe that phenomenon, at least not when running cutpointr with the example data.

Maybe it has something to do with your data. What does str(data) look like? Can you share that dataset or a sample from it?
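Besides str(data), a quick look at the two columns that go into cutpointr() can reveal missing values or an unexpected class coding. A minimal base-R sketch; `inspect_input()` is a hypothetical helper, the column names are passed as strings, and the toy values are made up for illustration.

```r
# Hypothetical helper: summarises the predictor and class columns that are
# passed to cutpointr(). `x` and `class` are column names given as strings.
inspect_input <- function(data, x, class) {
  str(data[, c(x, class), drop = FALSE])  # column types and first values
  list(
    x_is_numeric    = is.numeric(data[[x]]),
    n_missing_x     = sum(is.na(data[[x]])),
    n_missing_class = sum(is.na(data[[class]])),
    class_counts    = table(data[[class]], useNA = "ifany")
  )
}

# Example with made-up values (not the data from this issue):
toy <- data.frame(mkr = c(1.2, 3.4, NA, 0.8), Dx = c(1, 0, 1, 1))
inspect_input(toy, x = "mkr", class = "Dx")
```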

Also, on my system RStudio shows a size greater than 0 for the object returned by cutpointr. Are the other objects in your environment shown with the correct sizes, with only the cutpointr result affected?
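To check whether the 0 B value is only an Environment-pane display issue, one can ask R directly for the sizes of the objects in the global environment. A minimal sketch; `env_object_sizes()` is a hypothetical helper, and utils::object.size() returns an estimate of memory use that need not match the figure RStudio shows exactly.

```r
# Hypothetical helper: sizes in bytes (as estimated by utils::object.size())
# of every object in an environment, largest first.
env_object_sizes <- function(envir = globalenv()) {
  nms <- ls(envir = envir)
  sizes <- vapply(
    nms,
    function(n) as.numeric(utils::object.size(get(n, envir = envir))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Usage in the session where the problem occurs:
# env_object_sizes()
# format(object.size(bcutYI_mkr), units = "auto")
```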

PrionMike commented 1 month ago

Thanks Christian, and it's nice to hear from you! I have attached a data file that on my system reliably reproduces the problem. Here too is the code I am using:

system.time(
  tbcut <- cutpointr(data = tftq2k, x = TAU, class = DXA,
                     method = maximize_metric, metric = youden, boot_runs = 100)
)

tbcut
summary(tbcut)

All the best, Mike!

(In reply to Christian Thiele's comment of Fri, Sep 20, 2024, 4:57 AM.)

Thie1e commented 1 month ago

Hi Mike,

apparently, if you reply to the email directly, the attachment does not show up on GitHub. Can you try uploading the file using the GitHub UI? If the data is small, you can also paste the output from dput().
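For small datasets, dput() prints R code that recreates an object, so the output can be pasted into a comment and rebuilt on the other side. A minimal sketch using made-up data in place of the real dataset (only the column names come from this issue); sharing just the first rows via head() is an assumption about what suffices to reproduce a problem, not a requirement.

```r
# Made-up stand-in for the real data (column names taken from this issue)
set.seed(42)
example_data <- data.frame(
  TAU = round(runif(30, min = 100, max = 5000)),
  DXA = rbinom(30, size = 1, prob = 0.4)
)

# dput() prints code that recreates the object; this is what would be pasted
dput(head(example_data, 10))

# The recipient can rebuild the data frame from the pasted text
pasted <- paste(capture.output(dput(head(example_data, 10))), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
```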

PrionMike commented 1 month ago

Hi Christian. Yes, this is what I expected would happen when I sent you the data as an attachment. The data I shared with you is from real patients, is unpublished, and technically belongs to the Government of Canada, so I was reluctant to post it to a public forum even though it has been anonymized. Would it still be OK to post it here, or would that violate GitHub's terms of use? Best, Mike

(In reply to Christian Thiele's comment of Mon, Sep 23, 2024, 12:55 PM.)

Thie1e commented 1 month ago

Hi Mike,

I see. Would it be OK for you to send the data to my work e-mail instead? If yes, that's christian.thiele[at]hsbi[dot]de.

PrionMike commented 1 month ago

No problem Christian -- here it is!

As you probably already realize, there are two markers in the dataset -- "TAU" and "F3E". These are both protein markers that are frequently assayed in cerebrospinal fluid from patients suspected of having Creutzfeldt-Jakob disease. "DXA" is the variable for diagnostic status -- 1 for definite or probable CJD, and 0 for definite or probable non-CJD. Many thanks for your help!

If you have any time left after looking at this apparent technical issue, I would like your opinion on a topic raised by someone else in another issue on your GitHub repository: defining an intermediate zone of uncertainty around the point estimate of an optimal cutpoint. I am very interested in this question as well, and have been working with Hans Landsheer's package UncertainInterval to better understand our data in these terms.

All the best, Mike

(In reply to Christian Thiele's comment of Tue, Sep 24, 2024, 8:11 AM.)

Thie1e commented 1 month ago

Hi Mike,

thanks for sending the data. First, regarding the runtime: I can't reproduce the issue. Your code above finishes in a few seconds on my machine.

Maybe you could try updating your R packages if you have not done so already. If you do, could you post the result of sessionInfo() before and after updating the packages? Perhaps one of the dependencies causes the issue.
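A minimal sketch of the before-and-after comparison, assuming it is convenient to save the sessionInfo() output to text files; `save_session_info()` and the file names are hypothetical. update.packages() appears only in comments because it modifies the installed library.

```r
# Hypothetical helper: writes the printed sessionInfo() to a text file so it
# can be pasted into the issue or compared later.
save_session_info <- function(path) {
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

# Suggested sequence, run interactively:
# library(cutpointr)
# save_session_info("session_before.txt")
# update.packages(ask = FALSE)  # update installed packages without prompting
# # ...restart R, then:
# library(cutpointr)
# save_session_info("session_after.txt")
```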

Best, Christian

PrionMike commented 1 month ago

Thanks Christian, for taking the time to look at this issue for me. You can probably tell that I am an amateur at best with R and RStudio. I typically load other packages (perhaps too many!) in the RStudio sessions where I use cutpointr, and when you said you were unable to reproduce the problem, it suggested to me that something about my session environment was causing it. I don't know whether there was a cause-effect relationship, but I opened a fresh session and loaded only cutpointr, and the delayed display of results did not occur. I could not evoke the problem even after I then loaded the other packages into the session. So, even though I still don't know the original cause, you helped me solve the problem. :) Thanks again, so much! Mike
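If the problem comes back, one way to narrow it down along the lines described above is to record which packages are attached besides R's defaults, and which names they mask, after loading each one. A minimal base-R sketch; `extra_attached()` is a hypothetical helper, and masked names are only one of several ways another package could interfere.

```r
# Hypothetical helper: packages attached in this session beyond R's defaults
extra_attached <- function() {
  defaults <- c("stats", "graphics", "grDevices", "utils",
                "datasets", "methods", "base")
  setdiff(.packages(), defaults)
}

# Candidates for interfering with cutpointr's output
extra_attached()

# Names defined in more than one place on the search path, grouped by package
conflicts(detail = TRUE)
```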