ahmohamed / lipidr

Data Mining and Analysis of Lipidomics datasets in R
https://www.lipidr.org/

Lipidr query #25

Closed shubham7193 closed 2 years ago

shubham7193 commented 2 years ago

Hi Ahmed, First of all, thanks for creating such a powerful analysis tool. I am a postdoc in Optometry at UC Berkeley, and our lab routinely performs lipidomics to detect PUFAs and their metabolites (lipoxins, resolvins, maresins, etc.) using MRM. Currently, we use Analyst software for peak integration and calculate concentrations from peak areas, based on standards, in Microsoft Excel. We want to incorporate PCA for the larger datasets and analyze the data without any bias. That's where your tool looks very promising. I have a few questions:

  1. Do we need to convert all the lipid names to the format mentioned (CLS xx:y)? Some of our names, e.g. LTB4 FA 20:4;O2, already follow a shorthand format.
  2. Since our analysis is targeted and focused on one classification, is it possible to analyze data that has different groups and concentration treatments for different tissues?
  3. We use Analyst software for peak integration and are not experienced with Skyline. Is it possible to create input files in the correct format, given that Analyst exports one parameter (Area, RT, etc.) at a time? Do we need to give lipidr multiple files for Area, Height, and RT?

Please let me know so that I can proceed with using lipidr for our analysis. Thanks in advance for your help. Regards, Shubham

ahmohamed commented 2 years ago

Hi @shubham7193, Thanks for your interest in lipidr. Please see the answers below.

  1. Yes, you need to change the names to ones that lipidr can read. Since you're using a targeted method, this shouldn't be too annoying, because you only need to do it once: either change the Molecule name in the original LC/MS method, or write a reusable preprocessing script that changes the names after you export them from Analyst. In general, CLS should be alphanumeric, followed by a space and then xx:yy describing the chain info (up to 4 chains, xx:yy/xx:yy and so on). If you need to keep additional annotations, you can add them in parentheses at the end and lipidr will keep them. So a compatible version of LTB4 FA 20:4;O2 would be LTB4_FA 20:4 (O2). A regex script would be:
mol = "LTB4 FA 20:4;O2"
mol = sub(" FA", "_FA", mol)        # "LTB4_FA 20:4;O2"
mol = sub(";(.*)$", " (\\1)", mol)  # "LTB4_FA 20:4 (O2)"
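
For a reusable preprocessing step, the same renaming could be wrapped in a small vectorized function. This is only a sketch in base R: `to_lipidr_name` is a hypothetical helper (not part of lipidr), and it assumes every name to be converted looks like "PREFIX CLASS xx:yy;annotation". Whether lipidr accepts each resulting name is not verified here.

```r
# Hypothetical helper (not part of lipidr): rewrite names such as
# "LTB4 FA 20:4;O2" into the "CLS xx:yy (annotation)" pattern.
to_lipidr_name <- function(mol) {
  # Join a two-word class with an underscore, leaving the chain info alone:
  # "LTB4 FA 20:4;O2" -> "LTB4_FA 20:4;O2"
  mol <- sub("^([^ ]+) ([^ ]+) ([0-9]+:[0-9]+)", "\\1_\\2 \\3", mol)
  # Move a trailing ";annotation" into parentheses after a space:
  # "LTB4_FA 20:4;O2" -> "LTB4_FA 20:4 (O2)"
  mol <- sub(";(.*)$", " (\\1)", mol)
  mol
}

# Works on a whole column of names at once; names that already have a
# one-word class and no ";annotation" are returned unchanged.
converted <- to_lipidr_name(c("LTB4 FA 20:4;O2", "FA 20:4"))
print(converted)
```

Applying the function to the name column right after reading the Analyst export keeps the original method files untouched.
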
  2. Yes, you can, but that will require familiarity with design matrices. lipidr uses limma under the hood, and you can provide your own design matrix directly to lipidr::de_design for such complex comparisons. Please refer to chapter 9 of the limma user guide for a comprehensive description of how to construct design matrices. Otherwise, I'm happy to collaborate if you need a hand with that.
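
As a rough illustration of what such a design matrix looks like, the sketch below builds one with base R only. The sample table, tissue names, and treatment labels are made up for the example. It combines tissue and treatment into a single grouping factor, which is one of the approaches for factorial experiments described in the limma user guide. The resulting matrix is the kind of object that would then be handed to lipidr::de_design; that call is not shown because it depends on the actual data.

```r
# Hypothetical sample annotation: 2 tissues x 2 treatments, 2 replicates each
samples <- data.frame(
  sample    = paste0("S", 1:8),
  tissue    = rep(c("cornea", "retina"), each = 4),
  treatment = rep(c("control", "treated"), times = 4)
)

# Combine both factors into one group label, e.g. "cornea_control"
samples$group <- factor(paste(samples$tissue, samples$treatment, sep = "_"))

# One column per group, no intercept; each sample belongs to exactly one group
design <- model.matrix(~ 0 + group, data = samples)
colnames(design) <- levels(samples$group)
rownames(design) <- samples$sample

print(design)
```

With one column per tissue/treatment combination, comparisons such as treated versus control within a tissue can be expressed as contrasts between columns.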

  3. You can have a look at the CSV exports from Skyline and try to mimic their layout. Alternatively, you can read multiple files and then combine them at the end. See below:

library(lipidr)
library(SummarizedExperiment) # provides assay() and `assay<-`

area_df = read.csv("area_file.csv")
rt_df = read.csv("rt_file.csv")

data = as_lipidomics_experiment(area_df)
rt_data = as_lipidomics_experiment(rt_df) # by default, input matrix is considered Area

assay(data, "RT") <- assay(rt_data, "Area") # Extract the "Area" from `rt_data` and put it as "RT" in `data`

Cheers, Ahmed.

shubham7193 commented 2 years ago

Hi Ahmed, Thanks for the reply. I will work on it and let you know soon. I may need more help with it, as I am not very fluent in R. Regards, Shubham

shubham7193 commented 2 years ago

Hi Ahmed, I tried changing the names of the lipids and concatenating them as you described above, but I seem to get an error about the names not being parsed correctly. My input file is lipids.csv.

Data <- read.csv("lipids.csv")
Data$Classnew <- paste(Data$Lipids, Data$Class, sep = "")
Data <- Data[,c(9,3,4,5,6)]
d <- as_lipidomics_experiment(Data)

If I do not put the name up front, as in "LTB4_FA 20:4 (O2)", lipidr reports duplicated lipids, since many of them share the same shorthand name. Can lipidr parse "LTB4_FA 20:4 (O2)" as a lipid name? Please help with this.