Closed marcora closed 1 year ago
@marcora You are absolutely correct. lipidr uses limma for statistical testing, which assumes that the log-transformed data are approximately normally distributed (that is, the raw values are roughly log-normal). In general, this assumption fits peak areas measured by MS.
You can always examine the distribution of your measurements using ggplot2::geom_density before passing them to lipidr, or use plot_samples(type="boxplot"). If your original data is already normally distributed, you can tell lipidr to skip log transformation using as_lipidomics_experiment(logged=TRUE).
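As a rough illustration of the distribution check described above, here is a minimal base R sketch. It uses simulated values as a stand-in for real peak areas (the log-normal simulation, the variable names, and the `skewness` helper are assumptions for illustration, not part of lipidr), and compares the raw and log2-transformed values numerically.

```r
# Simulated stand-in for peak areas (assumption: log-normal, 200 values).
# Replace `area` with your own measurements.
set.seed(1)
area <- rlnorm(200, meanlog = 10, sdlog = 1)

# Shapiro-Wilk normality test on the raw and the log2-transformed values.
# A small p-value suggests a departure from normality.
p_raw <- shapiro.test(area)$p.value
p_log <- shapiro.test(log2(area))$p.value

# Sample skewness (hypothetical helper): close to 0 for a symmetric distribution.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew_raw <- skewness(area)
skew_log <- skewness(log2(area))

cat("Shapiro p (raw): ", p_raw, "\n")
cat("Shapiro p (log2):", p_log, "\n")
cat("Skewness (raw):  ", skew_raw, "\n")
cat("Skewness (log2): ", skew_log, "\n")
```

With only a handful of samples these checks have little power, so they are best read alongside a density plot or boxplot rather than as a verdict.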
So what if the lipid values are given by the lipidomics facility as Mol or molar fraction (%Mol)? I understand I can inspect the distribution, but without a large sample size, and for newbies like me, it would be important to provide guidance on how to use lipidr with different kinds of input and which transformation/normalization to use with each of them.
Just asking to put "lipid" values in the input matrix is the perfect recipe for people inputting whatever values they are given and applying default parameters, at the risk of getting wrong inferences and inflated p-values out of it.
That's how statistical tools are used by the majority of experimental biologists with poor training in statistical modeling, and it is one of the major causes of irreproducible research.
A bit of documentation in the package on the types of lipid measurements that are most often used, alongside guidance on how to analyze them in lipidr, would be very helpful in preventing wrong usage of the tool.
Thanks @marcora. Without seeing the data or knowing the generative model / how it was preprocessed, there's no way I can make assumptions about the data distribution. Unfortunately, I've only dealt with peak areas so far, which usually follow a normal(-like) distribution when log-transformed. If you think the documentation needs improvement regarding types of lipid measurements, I'm happy to review a PR if you can contribute, ideally a vignette with public data (similar to the examples on the lipidr website https://www.lipidr.org/).
That would be great! It's my first time using lipidr and lipidomics data, so it may take me some time, but I will get back to you here.
With only a small number of samples it's hard to estimate the distribution, though, and I was hoping to tap into prior experience.
The lipid numeric values I have are estimates (based on spiked-in standards) of lipid molar concentration (moles of lipid/L of final extract that was analyzed). They also provide me with molar fraction values (Mol%) normalized within each sample.
Which of the two numeric values (Mol or Mol%) should I use? If Mol is the answer, should I log and normalize the values within lipidr?
Thanks
I'd go with Mol, since it's just a scaled peak area, and most likely will follow the same distribution. In this case, I'd log transform it and skip normalization (since it's already normalized?).
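One way to follow this suggestion could be sketched as below: log2-transform the Mol values yourself and hand lipidr data that is already logged and treated as normalized. The matrix values, lipid names, and sample names are made up for illustration, and the final lipidr call (in particular the `normalized` argument) is an assumption to check against the package documentation, so it is left as a comment.

```r
# Hypothetical Mol values: rows are lipids, columns are samples.
mol <- matrix(
  c(12.0, 15.5,  9.8, 11.2,
     0.8,  1.1,  0.6,  0.9,
   140.0, 155.0, 120.0, 133.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PC 32:0", "PE 34:1", "PC 36:2"),
                  c("S1", "S2", "S3", "S4"))
)

# log2 is undefined for zeros and negatives, so check before transforming.
stopifnot(all(mol > 0))
log_mol <- log2(mol)

# Data frame with lipid names in the first column, one column per sample.
df <- data.frame(Lipid = rownames(log_mol), log_mol, check.names = FALSE)
print(df)

# Assumed lipidr usage (not run here; verify the arguments in the lipidr docs):
# d <- lipidr::as_lipidomics_experiment(df, logged = TRUE, normalized = TRUE)
```

Zero or missing concentrations would need to be handled before the log transform. How to do that depends on the facility's reporting and is not covered here.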
As discussed above, if you want to use Mol%, you'd need to somehow determine whether they are normally distributed with/without log-transformation.
Understood! In the end I was able to obtain peak areas from the lipidomics facility I used for my study.
It is not clear from the documentation what kind of numeric measurements as_lipidomics_experiment expects/assumes. Is it peak area, molar concentration [Mol], or molar fraction [%Mol]?
Surely the distributional properties of the statistical tests may change depending on the type of measurement (e.g., raw vs %).