PSIAIMS / CAMIS

https://psiaims.github.io/CAMIS/
Apache License 2.0

heart dataset #351

Closed jagadishkatam closed 3 weeks ago

jagadishkatam commented 4 weeks ago

heart dataset for use with chi square test

statasaurus commented 4 weeks ago

Hello, thank you for your pull request. If you are going to add a Python example for chi-square, it would be great if you could use the lung cancer dataset.

jagadishkatam commented 3 weeks ago

Hi Christina,

Thank you for your reply,

I compared the chi-squared test results between SAS and R using the sashelp.heart dataset for the status and sex categorical variables. All the statistics matched perfectly, except for the Mantel-Haenszel chi-square results. I referred to the CAMIS documentation here: R/SAS Chi-Squared and Fisher’s Exact Comparison (https://psiaims.github.io/CAMIS/Comp/r-sas_chi-sq.html).

In my analysis, I used PROC FREQ in SAS with the EXACT statement. The chi-squared results matched between R and SAS: with correct = FALSE in chisq.test(), R matches the SAS chi-square result, and without it (the default is correct = TRUE), R matches the SAS continuity-adjusted chi-square result. For Fisher’s exact test, the p-values and odds ratios (including the 95% confidence interval) that I calculated also matched between R and SAS, which contradicts the findings stated on the CAMIS website.
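
To make the role of the continuity correction concrete, here is a minimal Python sketch of the Pearson chi-square statistic for a 2x2 table, with and without the Yates adjustment. The function name `chi_square_2x2` and the counts are hypothetical and chosen for illustration only; they are not the sashelp.heart values and this is not the code used in the pull request.

```python
def chi_square_2x2(table, correct=True):
    """Pearson chi-square statistic for a 2x2 table of counts.

    correct=True applies the Yates continuity correction (the behaviour
    R's chisq.test() uses by default for 2x2 tables); correct=False
    returns the unadjusted statistic.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if correct:
                # Subtract 0.5, but never more than |O - E| itself.
                diff -= min(0.5, diff)
            stat += diff ** 2 / expected
    return stat


# Hypothetical counts, NOT taken from sashelp.heart.
counts = [[12, 5], [7, 9]]
uncorrected = chi_square_2x2(counts, correct=False)
corrected = chi_square_2x2(counts, correct=True)
print(f"chi-square (no correction):   {uncorrected:.4f}")
print(f"chi-square (Yates corrected): {corrected:.4f}")
```

The corrected statistic is always the smaller of the two, which is why a comparison can look like a mismatch if one tool applies the adjustment and the other does not.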

One key difference I noticed is that in SAS, we get all the necessary statistics, including chi-squared, Fisher’s exact test, odds ratio, and more, using a single procedure (PROC FREQ). However, in R, we need to use different functions such as chisq.test(), GTest(), fisher.test(), Phi(), ContCoef(), and CramerV() to get similar statistics.
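
As a rough illustration of how several of these measures derive from the same chi-square statistic, the sketch below computes phi, the contingency coefficient, and Cramér's V from the textbook formulas. The function names and counts are hypothetical; the code returns magnitudes only and makes no claim to reproduce the sign conventions or exact output of PROC FREQ or of the R functions named above.

```python
import math


def pearson_chi_square(table):
    """Return (chi-square, n) for an r x c table, with no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def association_measures(table):
    """Chi-square-based association measures (unsigned)."""
    chi2, n = pearson_chi_square(table)
    k = min(len(table), len(table[0])) - 1  # min(rows, cols) - 1
    return {
        "phi": math.sqrt(chi2 / n),
        "contingency_coefficient": math.sqrt(chi2 / (chi2 + n)),
        "cramers_v": math.sqrt(chi2 / (n * k)),
    }


# Hypothetical counts, NOT taken from sashelp.heart.
counts = [[12, 5], [7, 9]]
measures = association_measures(counts)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For a 2x2 table, phi and Cramér's V coincide in magnitude, since min(rows, cols) - 1 equals 1.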

Another aspect that stood out is that while I was able to get Mantel-Haenszel chi-square results in SAS with a 2-dimensional array, in R, I encountered an error saying that the data must be a 3-dimensional array for mantelhaen.test(). It's strange that SAS can handle this with a 2D array, and I wonder if there’s any reason behind this difference.
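
One possible explanation, offered as an assumption rather than something confirmed in this thread: the "Mantel-Haenszel Chi-Square" that PROC FREQ prints for a single two-way table is, as far as the SAS documentation describes it, a 1-degree-of-freedom test of linear association, Q = (n - 1) r², where r is the Pearson correlation between row and column scores. R's mantelhaen.test() instead implements the stratified Cochran-Mantel-Haenszel test, which needs a third (strata) dimension. A minimal Python sketch of the two-way statistic, with a hypothetical function name, integer scores 1, 2, ... as an assumed default, and made-up counts:

```python
def mantel_haenszel_chi_square(table, row_scores=None, col_scores=None):
    """Linear-association statistic Q = (n - 1) * r**2 for an r x c table.

    r is the Pearson correlation between the row and column scores,
    weighted by the cell counts. Scores default to 1, 2, 3, ...
    """
    row_scores = row_scores or list(range(1, len(table) + 1))
    col_scores = col_scores or list(range(1, len(table[0]) + 1))
    n = sum(sum(row) for row in table)

    # Count-weighted means of the row and column scores.
    mean_r = sum(row_scores[i] * sum(row) for i, row in enumerate(table)) / n
    mean_c = sum(col_scores[j] * sum(col) for j, col in enumerate(zip(*table))) / n

    ss_r = ss_c = ss_rc = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            dr = row_scores[i] - mean_r
            dc = col_scores[j] - mean_c
            ss_r += count * dr * dr
            ss_c += count * dc * dc
            ss_rc += count * dr * dc

    r_squared = ss_rc ** 2 / (ss_r * ss_c)
    return (n - 1) * r_squared


# Hypothetical counts, NOT taken from sashelp.heart.
counts = [[12, 5], [7, 9]]
q_mh = mantel_haenszel_chi_square(counts)
print(f"Mantel-Haenszel chi-square (1 df): {q_mh:.4f}")
```

For a 2x2 table this works out to (n - 1)/n times the uncorrected Pearson chi-square, so it can be checked against a plain chi-square result without any stratification.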

I thought these observations, if agreed upon, could be detailed in the R/SAS Chi-Squared and Fisher’s Exact Comparison documentation (https://psiaims.github.io/CAMIS/Comp/r-sas_chi-sq.html).

I'd really appreciate any inputs or insights on these observations, and please excuse my ignorance if I missed any details in my analysis.

Thanks & Regards, Jagadish Katam

statasaurus commented 3 weeks ago

Hello Jagadish, thank you for your response; this context is very helpful. To your point about the Cochran-Mantel-Haenszel information being missing, we actually store that information in a separate section (https://psiaims.github.io/CAMIS/Comp/r-sas_cmh.html); feel free to check that out. If you have any comments, please either add them to an issue or make a new pull request.

For the question around correct = TRUE, there is a section in the comparison file that talks a bit about this, but I agree with you that it isn't very obvious, so it would probably be good to improve that a bit.

For the issue around the odds ratio, our sample data does appear to show a small difference, with R giving an odds ratio of 1.630576 and SAS giving 1.6345. That said, the page doesn't go into much detail about why this is, so it would be great if you wanted to add some additional context on this.
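
A likely source of this kind of gap, stated as an assumption since the thread does not confirm it: R's fisher.test() documents its estimate as the conditional maximum likelihood estimate of the odds ratio, whereas the plain cross-product ratio ad/bc is the unconditional (sample) estimate. The Python sketch below computes both for a 2x2 table; the function names and counts are hypothetical, and the bisection solver is just one simple way to find the conditional estimate, not the algorithm either package uses.

```python
import math


def sample_odds_ratio(table):
    """Unconditional (cross-product) odds ratio ad / bc."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def noncentral_mean(r1, r2, c1, log_psi):
    """Expected top-left count given the margins and a log odds ratio
    (mean of Fisher's noncentral hypergeometric distribution)."""
    lo, hi = max(0, c1 - r2), min(r1, c1)
    terms = [
        (x, math.log(math.comb(r1, x) * math.comb(r2, c1 - x)) + x * log_psi)
        for x in range(lo, hi + 1)
    ]
    largest = max(t for _, t in terms)  # subtract the max for numerical stability
    weights = [(x, math.exp(t - largest)) for x, t in terms]
    total = sum(w for _, w in weights)
    return sum(x * w for x, w in weights) / total


def conditional_mle_odds_ratio(table, tol=1e-10):
    """Odds ratio whose noncentral mean equals the observed top-left count.

    Assumes the observed count lies strictly inside its possible range;
    otherwise the estimate is 0 or infinite and bisection does not apply.
    """
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    low, high = -50.0, 50.0  # search interval on the log odds-ratio scale
    while high - low > tol:
        mid = (low + high) / 2
        if noncentral_mean(r1, r2, c1, mid) < a:
            low = mid
        else:
            high = mid
    return math.exp((low + high) / 2)


# Hypothetical counts, NOT the CAMIS sample data or sashelp.heart.
counts = [[12, 5], [7, 9]]
sample_or = sample_odds_ratio(counts)
cmle_or = conditional_mle_odds_ratio(counts)
print(f"sample odds ratio:      {sample_or:.6f}")
print(f"conditional MLE (est.): {cmle_or:.6f}")
```

If the two tools do report different estimators, small discrepancies of the kind quoted above would be expected even when the p-values agree.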

Thank you, CAMIS team

jagadishkatam commented 1 day ago

Hi Christina,

I’ve sent a pull request based on our earlier discussion. Could you kindly review it? If it looks good, please merge it; otherwise, I’d appreciate any feedback or suggestions for improvement.

Thanks & Regards, Jagadish Katam