Closed: rmflight closed this issue 6 months ago.
What are the x and y axes of this plot?
On Thu, Apr 11, 2024 at 9:31 AM Robert M Flight wrote:
In addition to the new test of the cause of missingness, it might also be really helpful to visualize the missingness patterns across samples using the naniar package (which shows the location and percentage of missing values).
However, this is potentially more powerful if the features are ordered in some way with respect to the median value in each sample. But we can't re-order each sample independently, or we lose the sense of which features are missing in common across samples.
What if we calculate the median rank of each feature across samples, reorder the features by that median rank, and then visualize them? This should help inform whether ICI-Kt is appropriate, or whether something else might be better.
— Reply to this email directly, view it on GitHub https://github.com/MoseleyBioinformaticsLab/ICIKendallTau/issues/19, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADEP7BY4DHELVQK7CLD72VTY42GENAVCNFSM6AAAAABGCKEWKCVHI2DSMVQWIX3LMV43ASLTON2WKOZSGIZTONZTG4ZTSNQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Email: @. (work) @. (personal) Phone: 859-218-2964 (office) 859-218-2965 (lab) 859-257-7715 (fax) Web: http://bioinformatics.cesb.uky.edu/ Address: CC434 Roach Building, 800 Rose Street, Lexington, KY 40536-0093
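The median-rank ordering proposed in the issue can be sketched as follows. This is an illustrative Python sketch, not ICIKendallTau's R code. It assumes a features × samples matrix with `None` for missing values, ranks the observed values within each sample on a 0-to-1 scale (lowest = 0), and averages tied positions. These conventions and the names `normalized_ranks` and `median_rank_order` are hypothetical.

```python
import math
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical layout: rows are features, columns are samples, None marks a missing value.
Matrix = List[List[Optional[float]]]


def normalized_ranks(values: List[Optional[float]]) -> List[Optional[float]]:
    """Rank the observed values of one sample on a 0-to-1 scale.

    The lowest observed value gets 0 and the highest gets 1; tied values share
    the average of their positions; missing entries stay None.
    """
    observed = sorted((v, i) for i, v in enumerate(values) if v is not None)
    n = len(observed)
    ranks: List[Optional[float]] = [None] * len(values)
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and observed[end + 1][0] == observed[pos][0]:
            end += 1  # extend the run of tied values
        avg_pos = (pos + end) / 2  # 0-based average position of the run
        for k in range(pos, end + 1):
            ranks[observed[k][1]] = avg_pos / (n - 1) if n > 1 else 0.0
        pos = end + 1
    return ranks


def median_rank_order(matrix: Matrix) -> Tuple[List[Optional[float]], List[int]]:
    """Median normalized rank of each feature, plus feature indices sorted by it."""
    n_features, n_samples = len(matrix), len(matrix[0])
    # Rank each sample (column) on its own; the columns themselves are never reordered.
    col_ranks = [normalized_ranks([row[j] for row in matrix]) for j in range(n_samples)]
    medians: List[Optional[float]] = []
    for i in range(n_features):
        observed = [col_ranks[j][i] for j in range(n_samples) if col_ranks[j][i] is not None]
        medians.append(median(observed) if observed else None)
    # Ascending, so rank 0 comes first; features missing everywhere also go first (an assumption).
    order = sorted(range(n_features),
                   key=lambda i: -math.inf if medians[i] is None else medians[i])
    return medians, order


data = [
    [5.0, 6.0, None],  # feature 0
    [1.0, None, 2.0],  # feature 1
    [9.0, 8.0, 7.0],   # feature 2
    [3.0, 2.0, 1.0],   # feature 3
]
medians, order = median_rank_order(data)
print(order)  # [3, 1, 0, 2]
```

Because each sample is ranked separately and only the per-feature medians are sorted, whole rows move together. Features that are missing in common across samples therefore stay aligned, which addresses the constraint raised in the issue.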
Sorry Hunter, I should have given an example.
The x-axis is the samples, the y-axis is the features, and each cell is colored by whether the value is missing or not. It's essentially an overview of all of the values in the dataset.
Here is a fake one I did for the testing-left-censorship vignette in the package. For this one, I purposely started with ordered data, added a little bit of noise to the values for the replicates, and then introduced most of the missingness in the lower-ranked features to force it below the median. This acts as a visual representation of the missingness in the data; if the features are ordered by rank, it's almost a visual of the binomial test.
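The idea that missing values "sit below the median" can be turned into counts. The sketch below uses one hypothetical scheme and is not necessarily what the package's left-censorship test does. For every missing entry, each other sample where that feature is observed counts as one trial, and the trial is a success when the observed value is below that sample's median. `count_below_median` is an illustrative name.

```python
from statistics import median
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows = features, columns = samples, None = missing


def count_below_median(matrix: Matrix) -> Tuple[int, int]:
    """Hypothetical (successes, trials) count for a left-censorship check.

    For each missing entry (feature i, sample j), every other sample k where
    feature i is observed is one trial; it is a success when that value is
    below sample k's median.
    """
    n_samples = len(matrix[0])
    sample_medians = [median([row[k] for row in matrix if row[k] is not None])
                      for k in range(n_samples)]
    successes = trials = 0
    for row in matrix:
        for value in row:
            if value is not None:
                continue  # only missing entries generate trials
            for k in range(n_samples):
                if row[k] is None:
                    continue  # skips the missing sample itself and any other gaps
                trials += 1
                if row[k] < sample_medians[k]:
                    successes += 1
    return successes, trials


data = [
    [1.0, None, 1.5],  # low feature, missing in sample 1
    [2.0, 2.5, None],  # low feature, missing in sample 2
    [5.0, 6.0, 5.5],
    [8.0, 9.0, 8.5],
    [None, 7.0, 7.5],  # high feature, missing in sample 0
]
print(count_below_median(data))  # (4, 6)
```

As a consistency check only: 1,900 = 100 × 19 and 1,520 = 80 × 19. That would fit this scheme if the fake data had 20 samples and each missing feature was observed in the other 19, but this is a guess about the vignette's setup.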
There are only 100 missing values in this example, with 80 of them below the median.
trials success class
1 1900 1520 A
$binomial_test
Exact binomial test
data: total_success and total_trials
number of successes = 1520, number of trials = 1900, p-value < 2.2e-16
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
0.7843033 1.0000000
sample estimates:
probability of success
0.8
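The binomial output above can be reproduced from the two counts alone. This sketch assumes R's `binom.test` conventions for a one-sided "greater" alternative. Under those conventions, the p-value is the upper tail P(X >= 1520) under p = 0.5, and the interval is the one-sided Clopper-Pearson bound. The sketch finds that bound by bisection rather than by using a beta quantile. The function names are hypothetical.

```python
import math
from typing import Tuple


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), using log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for 0 < p < 1, summed in log space (log-sum-exp)."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)


def binomial_test_greater(successes: int, trials: int, p0: float = 0.5,
                          conf: float = 0.95) -> Tuple[float, float, Tuple[float, float]]:
    """One-sided exact binomial test, alternative: true probability > p0.

    Returns (p-value, estimate, (lower, 1.0)), where the lower bound is the
    one-sided Clopper-Pearson limit: the p at which P(X >= successes) = 1 - conf.
    """
    p_value = upper_tail(successes, trials, p0)
    alpha = 1.0 - conf
    lo, hi = 0.0, 1.0
    for _ in range(100):  # P(X >= k; p) increases with p, so bisect on p
        mid = (lo + hi) / 2
        if upper_tail(successes, trials, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return p_value, successes / trials, (lo, 1.0)


p_value, estimate, (lower, upper) = binomial_test_greater(1520, 1900)
print(p_value < 2.2e-16, estimate, round(lower, 7))
```

The estimate is simply 1520 / 1900 = 0.8. The tiny p-value reflects how far 80% is from the 50% expected if missingness were unrelated to the median.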
Is the example showing features ordered by median normalized rank across samples? The zero should be the lowest rank, which means the x-axis order should be reversed.
On Thu, Apr 11, 2024 at 1:11 PM Robert M Flight wrote:
examine-missingness-1.png: https://github.com/MoseleyBioinformaticsLab/ICIKendallTau/assets/1509626/8652934e-d5c2-407f-9d61-cf88ebf77389
OK, after our discussion, and working with a real dataset (yeast from Barton), here are the two missing data plots:
No ordering: [figure: yeast_unordered.png]
Rank ordering: [figure: yeast_rank_order.png]
The second graph is just the rank order of the features, correct?
On Thu, Apr 11, 2024 at 7:32 PM Robert M Flight wrote:
No ordering: yeast_unordered.png: https://github.com/MoseleyBioinformaticsLab/ICIKendallTau/assets/1509626/fdc10fc4-6e68-422e-a5a7-1aa599f60b5b
Rank ordering: yeast_rank_order.png: https://github.com/MoseleyBioinformaticsLab/ICIKendallTau/assets/1509626/93a73fda-c063-4163-af70-d0ea53c86b3d
Rank order of the features, and then the samples are ordered by percentage missing.
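The ordering just described can be shown as a text stand-in for the plot. Features are sorted by median rank and samples by fraction missing, with `#` for a missing cell and `.` for an observed one. The sort directions are assumptions: samples with the least missing come first, and the lowest-ranked features sit at the bottom, as on a y-axis. The helper names are also assumptions, and the ranks here skip tie averaging to keep the sketch short.

```python
from statistics import median
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows = features, columns = samples, None = missing


def median_ranks(matrix: Matrix) -> List[float]:
    """Median across samples of each feature's within-sample rank scaled to [0, 1].

    Simplified: ties are not averaged, and every feature is assumed to be
    observed in at least one sample (median([]) would raise otherwise).
    """
    n_feat, n_samp = len(matrix), len(matrix[0])
    per_feature: List[List[float]] = [[] for _ in range(n_feat)]
    for j in range(n_samp):
        observed = sorted((matrix[i][j], i) for i in range(n_feat) if matrix[i][j] is not None)
        denom = max(len(observed) - 1, 1)
        for pos, (_, i) in enumerate(observed):
            per_feature[i].append(pos / denom)
    return [median(r) for r in per_feature]


def missingness_grid(matrix: Matrix) -> List[str]:
    """One text row per feature, one character per sample ('#' = missing)."""
    n_feat, n_samp = len(matrix), len(matrix[0])
    frac_missing = [sum(row[j] is None for row in matrix) / n_feat for j in range(n_samp)]
    sample_order = sorted(range(n_samp), key=lambda j: frac_missing[j])
    ranks = median_ranks(matrix)
    # Highest median rank printed first, so the lowest-ranked features end up at the bottom.
    feature_order = sorted(range(n_feat), key=lambda i: ranks[i], reverse=True)
    return ["".join("#" if matrix[i][j] is None else "." for j in sample_order)
            for i in feature_order]


data = [
    [9.0, 8.0, 9.5],    # high feature, complete
    [5.0, None, 4.0],   # missing in sample 1
    [1.0, None, None],  # low feature, missing in samples 1 and 2
    [6.0, 6.5, 5.0],
]
for line in missingness_grid(data):
    print(line)
```

If low-ranked features drive the missingness, the `#` cells pile up in the bottom rows, which is the visual pattern being discussed.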
Final addition to this function: it now also returns the median rank and the number of missing entries for each feature (row), so we can easily create a plot like this one, where we can see that the median rank is directly a function of the number of missing entries! This is for the yeast dataset.
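Once each feature has a median rank and a missing-entry count, the relationship in that plot can also be summarized numerically. The sketch below uses toy numbers, not the yeast data. It averages the median rank within each missing-count group and computes a Spearman correlation by hand. The helper names and the choice of Spearman are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def rank_by_missing(median_rank: List[float], n_missing: List[int]) -> Dict[int, float]:
    """Mean median rank of the features sharing each missing-entry count."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for r, m in zip(median_rank, n_missing):
        groups[m].append(r)
    return {m: mean(rs) for m, rs in sorted(groups.items())}


# Toy per-feature summary (median rank, number of missing entries).
median_rank = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
n_missing = [0, 0, 1, 1, 2, 3]
print(rank_by_missing(median_rank, n_missing))
print(spearman(median_rank, n_missing))
```

A strongly negative correlation, as in this toy input, is what you would expect if features with more missing entries consistently sit lower in the ranking.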
Very nice graph and functionality!
On Fri, Apr 12, 2024 at 10:37 AM Robert M Flight wrote:
fig-yeast-nna-1.png: https://github.com/MoseleyBioinformaticsLab/ICIKendallTau/assets/1509626/daf0bb9f-771e-4931-917a-771a5885bb2b