Closed czhu closed 1 year ago
I'm having a similar problem. My data uses a "Shorthand Notation" for "SwissLipids Name". Some of the lipid names can be processed by lipidr, but others cannot. Examples:
Cer 32:0;O2 (Ceramide (d32:0))
HexCer 34:0;O2 (Hexosyl ceramide (d34:0))
ST 27:1;O (cholesterol)
LPC O-14:0 (Phosphatidylcholine (O-14:0/0:0))
LPE O-14:0 (Phosphatidylethanolamine (O-14:0/0:0))
PC O-16:0/18:1 (Phosphatidylcholine (O-16:0/18:1))
PE O-16:0/16:1 (Phosphatidylethanolamine (O-16:0/16:1))
SM 30:0;O2 (Sphingomyelin (d30:0))
Is there perhaps a source where I can look up the acceptable "CLS xx:x/yy:y" code for a "SwissLipids Name"? Or could you point out how to correct the codes used here?
Any help greatly appreciated, John Hendrickx
Sincere apologies to both of you for the late reply. See below how to reformat the examples you gave above. You can do it manually or using regex as below.
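# Example names in SwissLipids-style shorthand
l <- c("PE P-18:1/18:2", "TG 42:0-FA14:0", "SM d18:1/20:1", "Cer 32:0;O2", "HexCer 34:0;O2", "ST 27:1;O", "LPC O-14:0", "LPE O-14:0", "PC O-16:0/18:1", "PE O-16:0/16:1", "SM 30:0;O2")
# Oxygen suffix: ";O2" -> "(O2)", ";O" -> "(O)"
l2 <- sub(";(O\\d*)", "(\\1)", l)
# Move the "O-" and "P-" prefixes onto the class name: "PC O-16:0" -> "PCO 16:0"
l2 <- sub(" O-", "O ", l2)
l2 <- sub(" P-", "P ", l2)
# Turn the "-FA14:0" suffix into a second chain: "/14:0"
l2 <- sub("-FA", "/", l2)
l2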
#> [1] "PEP 18:1/18:2" "TG 42:0/14:0" "SM d18:1/20:1" "Cer 32:0(O2)"
#> [5] "HexCer 34:0(O2)" "ST 27:1(O)" "LPCO 14:0" "LPEO 14:0"
#> [9] "PCO 16:0/18:1" "PEO 16:0/16:1" "SM 30:0(O2)"
lipidr::annotate_lipids(l2)
#> # A tibble: 11 × 21
#>    Molecule  clean…¹ ambig not_m…² istd  class…³ chain1   l_1   s_1 chain2   l_2
#>    <chr>     <chr>   <lgl> <lgl>   <lgl> <chr>   <chr>  <int> <int> <chr>  <int>
#>  1 PEP 18:1… PEP 18… FALSE FALSE   FALSE PEP     18:1      18     1 "18:2"    18
#>  2 TG 42:0/… TG 42:… FALSE FALSE   FALSE TG      42:0      42     0 "14:0"    14
#>  3 SM d18:1… SM 18:… FALSE FALSE   FALSE SM      18:1      18     1 "20:1"    20
#>  4 Cer 32:0… Cer 32… FALSE FALSE   FALSE Cer     32:0      32     0 ""        NA
#>  5 HexCer 3… HexCer… FALSE FALSE   FALSE HexCer  34:0      34     0 ""        NA
#>  6 ST 27:1(… ST 27:… FALSE FALSE   FALSE ST      27:1      27     1 ""        NA
#>  7 LPCO 14:0 LPCO 1… FALSE FALSE   FALSE LPCO    14:0      14     0 ""        NA
#>  8 LPEO 14:0 LPEO 1… FALSE FALSE   FALSE LPEO    14:0      14     0 ""        NA
#>  9 PCO 16:0… PCO 16… FALSE FALSE   FALSE PCO     16:0      16     0 "18:1"    18
#> 10 PEO 16:0… PEO 16… FALSE FALSE   FALSE PEO     16:0      16     0 "16:1"    16
#> 11 SM 30:0(… SM 30:… FALSE FALSE   FALSE SM      30:0      30     0 ""        NA
#> # … with 10 more variables: s_2 <int>, chain3 <chr>, l_3 <lgl>, s_3 <lgl>,
#> # chain4 <chr>, l_4 <lgl>, s_4 <lgl>, total_cl <int>, total_cs <int>,
#> # Class <chr>, and abbreviated variable names ¹clean_name, ²not_matched,
#> # ³class_stub
Created on 2023-04-20 with reprex v2.0.2
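If you need to apply the same conversion to a whole column of names, the four substitutions above can be wrapped in a small function. This is only a sketch: `swisslipids_to_lipidr()` is a hypothetical name, not a lipidr function, and it covers only the patterns that appear in this thread.

```r
# Hypothetical helper (not part of lipidr): rewrites SwissLipids-style
# shorthand into the spelling used in the reprex above.
swisslipids_to_lipidr <- function(x) {
  x <- sub(";(O[0-9]*)", "(\\1)", x)      # "Cer 32:0;O2"    -> "Cer 32:0(O2)"
  x <- sub(" O-", "O ", x, fixed = TRUE)  # "LPC O-14:0"     -> "LPCO 14:0"
  x <- sub(" P-", "P ", x, fixed = TRUE)  # "PE P-18:1/18:2" -> "PEP 18:1/18:2"
  x <- sub("-FA", "/", x, fixed = TRUE)   # "TG 42:0-FA14:0" -> "TG 42:0/14:0"
  x
}

converted <- swisslipids_to_lipidr(
  c("Cer 32:0;O2", "LPC O-14:0", "PE P-18:1/18:2", "TG 42:0-FA14:0")
)
print(converted)
```

Names that match none of the patterns, such as "SM d18:1/20:1", pass through unchanged, so the function can be run over a mixed column.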
Hi Ahmed,
Thanks for your reply! I can confirm that the changes you specified produced valid lipid names that can be processed by lipidr. I've forwarded the information to the scientist I'm working with so he can verify that the values are correct.
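For anyone who wants a quick screen before handing names to lipidr, a rough pattern check along the following lines may help. The regular expression is my own assumption about the "CLS xx:x/yy:y" shape seen in this thread (class, space, chains separated by "/", optional leading "d", optional "(O)" or "(O2)" suffix). It is not lipidr's parser, and `looks_like_cls()` is a hypothetical name.

```r
# Rough shape check, assuming names look like "CLS xx:x/yy:y".
# This is a heuristic only: lipidr may accept or reject names differently.
looks_like_cls <- function(x) {
  pattern <- "^[A-Za-z]+ d?[0-9]+:[0-9]+(\\(O[0-9]*\\))?(/[0-9]+:[0-9]+)*$"
  grepl(pattern, x)
}

nms <- c("PCO 16:0/18:1", "Cer 32:0(O2)", "SM d18:1/20:1", "LPC O-14:0", "Cer 32:0;O2")
flags <- looks_like_cls(nms)
print(nms[!flags])  # names that probably still need converting
```

The `not_matched` column returned by `lipidr::annotate_lipids()`, visible in the reprex earlier in this thread, remains the actual test of whether a name was parsed.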
Marking as closed. Feel free to reopen if this is still an issue.
I have names like PE P-18:1/18:2, TG 42:0-FA14:0, SM d18:1/20:1. How should I convert these? Thanks!