gdmcdonald closed this issue 3 years ago.
It looks fine to me. First, make sure that `lip_name` is your first column: `exp_df[[1]]` should give you the character vector of lipid names.
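If `lip_name` is not already the first column, one base-R way to move it there is sketched below. The toy data frame, its sample columns `S1`/`S2`, and their values are hypothetical; only `exp_df`, `lip_name`, and the lipid names come from this thread. The `as.character()` step is there because `read.csv()` returned factors by default before R 4.0.

```r
# Hypothetical toy data: lipid names are NOT in the first column yet
exp_df <- data.frame(
  S1 = c(1.2, 3.4, 5.6),
  lip_name = c("SM 38:0", "MePC 38:6", "TG 16:0/18:1/20:4"),
  S2 = c(2.1, 4.3, 6.5),
  stringsAsFactors = FALSE
)

# Coerce to character in case the column was read in as a factor
exp_df$lip_name <- as.character(exp_df$lip_name)

# Move lip_name to the front, keeping the remaining columns in their order
exp_df <- exp_df[, c("lip_name", setdiff(names(exp_df), "lip_name"))]

# exp_df[[1]] should now be the character vector of lipid names
stopifnot(is.character(exp_df[[1]]))
print(exp_df[[1]])
```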
Lipid name compliance is checked with the internal function `lipidr:::.have_lipids_molecules`, which requires at least 50% of lipid names to be parsed correctly. You can check whether `lipidr:::.have_lipids_molecules(exp_df[[1]])` returns `FALSE`.
The last resort is to find which lipid names were not parsed correctly. This can be done with `annot <- lipidr::annotate_lipids(exp_df[[1]])`, which will give you a warning listing the names that were not parsed. It also returns a data.frame with the lipid names and their parsed components; `annot %>% filter(not_matched)` will give you the list of non-parsed lipids.
If this gives you weird results, let me know and I can look into why `lipidr` can't parse your dataset.
Cheers.
Initially, it looks like the problem is the same as #10: the offending lipid names that do not parse (4 in the output below, out of 3916 names, i.e. about 0.1%) are all coenzyme Q.
```
> lipidr:::.have_lipids_molecules(exp_df[[1]])
[1] FALSE
> annot <- lipidr:::annotate_lipids(exp_df[[1]])
Warning message:
In lipidr:::annotate_lipids(exp_df[[1]]) :
Some lipid names couldn't be parsed because they don't follow the pattern 'CLS xx:x/yy:y'
Co Q10, Co Q7, Co Q8, Co Q9
> annot %>% filter(not_matched)
# A tibble: 4 x 21
Molecule clean_name ambig not_matched istd class_stub chain1 l_1 s_1 chain2 l_2 s_2 chain3 l_3 s_3 chain4
<chr> <fct> <lgl> <lgl> <lgl> <chr> <chr> <int> <int> <chr> <int> <int> <chr> <int> <int> <chr>
1 Co Q10 Co Q10 FALSE TRUE FALSE NA NA NA NA NA NA NA NA NA NA NA
2 Co Q7 Co Q7 FALSE TRUE FALSE NA NA NA NA NA NA NA NA NA NA NA
3 Co Q8 Co Q8 FALSE TRUE FALSE NA NA NA NA NA NA NA NA NA NA NA
4 Co Q9 Co Q9 FALSE TRUE FALSE NA NA NA NA NA NA NA NA NA NA NA
# … with 5 more variables: l_4 <int>, s_4 <int>, total_cl <int>, total_cs <int>, Class <chr>
```
OK, so I removed those rows to see whether it would work, but it still doesn't:
```
> some_df <- exp_df %>% filter(!grepl("Co Q",lip_name))
> some_exp <- as_lipidomics_experiment(some_df, logged = FALSE, normalized = TRUE)
Error in as_lipidomics_experiment(some_df, logged = FALSE, normalized = TRUE) :
Data frame does not contain valid lipid names. Lipids features should be in rownames or the first column.
> lipidr:::.have_lipids_molecules(some_df[[1]])
[1] FALSE
> annot2 <- lipidr:::annotate_lipids(some_df[[1]])
> sample_n(annot2,10)
# A tibble: 10 x 21
Molecule clean_name ambig not_matched istd class_stub chain1 l_1 s_1 chain2 l_2 s_2 chain3 l_3 s_3 chain4
<chr> <fct> <lgl> <lgl> <lgl> <chr> <chr> <int> <int> <chr> <int> <int> <chr> <int> <int> <chr>
1 MePC 38… MePC 38:6 FALSE FALSE FALSE MePC 38:6 38 6 "" NA NA "" NA NA ""
2 phSM 38… phSM 38:2 FALSE FALSE FALSE phSM 38:2 38 2 "" NA NA "" NA NA ""
3 TG 11:0… TG 11:0/2… FALSE FALSE FALSE TG 11:0 11 0 "24:2" 24 2 "24:2" 24 2 ""
4 TG 20:0… TG 20:0/1… FALSE FALSE FALSE TG 20:0 20 0 "10:3" 10 3 "10:3" 10 3 ""
5 MePC 29… MePC 29:0 FALSE FALSE FALSE MePC 29:0 29 0 "" NA NA "" NA NA ""
6 TG 20:5… TG 20:5/1… FALSE FALSE FALSE TG 20:5 20 5 "14:3" 14 3 "18:2" 18 2 ""
7 dMePE 1… dMePE 16:… FALSE FALSE FALSE dMePE 16:0 16 0 "18:2" 18 2 "" NA NA ""
8 SM 38:0 SM 38:0 FALSE FALSE FALSE SM 38:0 38 0 "" NA NA "" NA NA ""
9 TG 18:1… TG 18:1/1… FALSE FALSE FALSE TG 18:1 18 1 "18:1" 18 1 "22:0" 22 0 ""
10 TG 16:0… TG 16:0/1… FALSE FALSE FALSE TG 16:0 16 0 "18:1" 18 1 "20:4" 20 4 ""
# … with 5 more variables: l_4 <int>, s_4 <int>, total_cl <int>, total_cs <int>, Class <chr>
```
So now all the lipid names parse just fine and the names are in the first column of the data frame, yet it still doesn't recognize them. Any idea what's wrong here?
Thanks for the info, and sorry you're still having issues. This definitely looks like a bug in `lipidr`; however, I can't reproduce it on my end. It's also different from #10: `lipidr` tolerates up to 50% non-parsed molecules, so you definitely don't need to remove these lipids for it to work.
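To see why the four coenzyme Q names cannot be the culprit, here is the 50% rule applied to the numbers reported above (3916 names, 4 unmatched). This is plain base-R arithmetic that mimics the threshold as described in this thread; it is an illustration under that assumption, not `lipidr`'s actual implementation.

```r
n_total <- 3916     # lipid names in the first column (from the report above)
n_unmatched <- 4    # Co Q10, Co Q7, Co Q8, Co Q9

# Logical vector of the same form as a `matched` vector built from
# annotate_lipids(): TRUE for names that parsed, FALSE for names that did not
matched <- c(rep(TRUE, n_total - n_unmatched), rep(FALSE, n_unmatched))

frac_matched <- sum(matched) / length(matched)
passes_threshold <- sum(matched) >= length(matched) / 2

cat(sprintf("%.2f%% of names parsed; passes 50%% rule: %s\n",
            100 * frac_matched, passes_threshold))
```

With well over 99% of names parsing, dropping the coenzyme Q rows should not change the outcome of the check, which points to a cause other than name parsing.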
A few options here:

1. Convert your tibble to a `data.frame` with `as.data.frame`. Tibbles work fine on my end, but it's worth ruling them out as the cause of the problem.
2. Otherwise, the problem lies in `lipidr:::.have_lipids_molecules`, which is surprising given that it's a very simple function (https://github.com/ahmohamed/lipidr/blob/master/R/check_files.R#L83). You can try:

   ```r
   library(lipidr)

   # Lipid names from the first column of your data frame
   mols <- unlist(exp_df[[1]])
   # TRUE for each name that annotate_lipids() could parse
   matched <- !annotate_lipids(mols, no_match = "ignore")$not_matched
   print(sum(matched))
   print(length(matched))
   ```

   Simply put, `sum(matched)` should be at least half of `length(matched)`.
3. Alternatively, you can email me the molecule list and I'll look into it for you.
Thanks.
Thanks for your help. Even though I installed `lipidr` only a few days ago, it turns out Bioconductor won't install its latest version (and therefore the latest `lipidr`) unless you're running R 4.0 or newer. I have now upgraded everything and it finally works. Thanks again.
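For anyone hitting the same error, a quick way to check whether an outdated setup is the cause is sketched below. It uses base R, plus `BiocManager::version()` only if BiocManager is installed. The 4.0.0 cutoff is taken from the comment above; the exact R version a given Bioconductor release requires may differ.

```r
# Report the running R version and whether it meets the 4.0 cutoff mentioned above
r_ok <- getRversion() >= "4.0.0"
cat("R version:", as.character(getRversion()), "- meets 4.0 cutoff:", r_ok, "\n")

# Bioconductor release in use (only if BiocManager is installed)
if (requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::version())
}

# Installed lipidr version, if any
if (requireNamespace("lipidr", quietly = TRUE)) {
  print(packageVersion("lipidr"))
} else {
  message("lipidr is not installed")
}
```

After upgrading R, reinstalling with `BiocManager::install("lipidr")` should pull the `lipidr` release that matches the new Bioconductor version.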
While trying to create a lipidomics experiment from a CSV file that I loaded into a tibble, I keep getting this error:
My first column is a character vector with names that look like this: