HimesGroup / qmtools

Tools for quantitative metabolomics data processing

Performing one-way ANOVA with compareSamples() #1

Closed silasmellor closed 1 month ago

silasmellor commented 1 month ago

Hi, I’ve been using qmtools for a while and think you’ve done a great job in making a package that provides a flexible workflow for analysing quantitative metabolomics data. I’m currently working on a dataset (a time series experiment looking at metabolites changing during plant flower development) where I would like to perform a one-way ANOVA to identify metabolites that vary significantly over the time series. The idea is to filter the dataset by ANOVA significance before clustering metabolites to look for different patterns. I wonder if you could help me figure out how to approach this using compareSamples(). Can I specify a design (e.g. ~time) directly within that function and get F statistics and p-values, or would I have to extract the relevant matrix from the SE and use limma on that?

Thanks again for a very useful package! Best, Silas

jaehyunjoo commented 1 month ago

Hi Silas,

While I don't know the details of your experiment, it sounds like you have measured the same samples at multiple time points. If that's the case, it's important to consider individual variability that could influence the results since observations from the same sample aren't independent.

I'd suggest the latter: Retrieve the relevant expression matrix from SE and then use a package that aligns with your needs. For example, if you're using limma, it's worth checking out the duplicateCorrelation function and case studies in the limma user's guide. Currently, compareSamples lacks the flexibility for addressing this.

All the best, Jaehyun
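
The route suggested above (pull the matrix out of the SE, then fit with limma while accounting for non-independent samples) could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are simulated, the assay name `log_intensity` and the `subject` and `time` columns are made up, and the right blocking variable depends on the actual experiment.

```r
# Sketch only: simulated data stand in for a real SummarizedExperiment.
library(SummarizedExperiment)
library(limma)  # duplicateCorrelation() also needs the 'statmod' package installed

set.seed(1)

# Hypothetical layout: 4 subjects (blocks), each measured at 5 time points.
meta <- expand.grid(time = paste0("T", 1:5), subject = paste0("S", 1:4))
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# 50 simulated metabolites on a log scale, with a per-subject offset so that
# samples from the same subject are correlated.
n_met <- 50
subject_effect <- matrix(rnorm(n_met * 4), nrow = n_met)[, as.integer(meta$subject)]
x <- subject_effect + matrix(rnorm(n_met * nrow(meta)), nrow = n_met)
dimnames(x) <- list(paste0("met", seq_len(n_met)), rownames(meta))

# "log_intensity" is a made-up assay name for this example.
se <- SummarizedExperiment(assays = list(log_intensity = x),
                           colData = DataFrame(meta))

# Step 1: retrieve the matrix and build a one-way design on time.
y <- assay(se, "log_intensity")
design <- model.matrix(~ time, data = as.data.frame(colData(se)))

# Step 2: estimate the within-subject correlation and pass it to lmFit().
corfit <- duplicateCorrelation(y, design, block = se$subject)
fit <- lmFit(y, design, block = se$subject,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Step 3: test all non-intercept time coefficients together, which gives a
# moderated F-statistic and p-value per metabolite.
res <- topTable(fit, coef = 2:ncol(design), number = Inf)
head(res[, c("F", "P.Value", "adj.P.Val")])
```

The resulting table can then be filtered on `adj.P.Val` before clustering. Whether a correlation-based block, a fixed blocking term, or no blocking at all is appropriate is a modelling decision that this sketch does not settle.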

silasmellor commented 1 month ago

Hi Jaehyun, Thank you for the feedback; you raise a very good point I had not considered. In fact, the way we sampled for this experiment makes me a little unsure as to whether these should indeed be considered repeat samples of the same individuals, but I’d be curious to hear what you think.

As I mentioned, the time series follows flower development, but the samples were all collected at the same time. We had 4 pools of 2 plants each as biological replicates. From these we collected all flowers that we could reliably determine as belonging to each of 5 defined developmental stages from bud to open flower. As one plant (petunia in this case) produces flowers continuously, a plant in full bloom will typically have flowers of all developmental stages at any time. We therefore collected all samples simultaneously, rather than performing repeat sampling as such. Though given that the flowers for each replicate derive from the same 2 individual plants, I suppose this would still be considered dependent? Best, Silas
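
To make the layout described above concrete, here is a small base R sketch of the sample sheet it implies. It assumes one sample per pool and stage (the post does not say this explicitly), and the names `pool`, `stage`, and `samples` are invented for the example.

```r
# Hypothetical sample sheet: 4 pools (biological replicates) x 5 developmental
# stages, assuming one sample per pool-stage combination.
samples <- expand.grid(stage = paste0("stage", 1:5),
                       pool  = paste0("pool", 1:4))

# Each pool contributes one sample to every stage, so samples sharing a pool
# come from the same two plants.
table(samples$pool, samples$stage)

# One possible encoding of that dependence: pool as a fixed blocking term
# alongside stage. This is an illustration, not a recommendation.
design <- model.matrix(~ pool + stage, data = samples)
dim(design)  # 20 samples by 8 coefficients (intercept + 3 pool + 4 stage)
```

Seen this way, the question is whether `pool` should enter the model at all, and if so whether as a fixed blocking term or as a correlated block.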

jaehyunjoo commented 1 month ago

I am not sure I understood correctly, but you have 4 biological replicates of petunia? If you believe your flower samples are affected by inherent differences between the biological replicates, you need to consider that. However, the situation may vary depending on your specific experiment. Consulting with a statistician in your organization would be the best way to determine the most appropriate approach for your data. I apologize that I couldn't be more helpful.

Sincerely, Jaehyun