Closed chloemhall closed 2 years ago
Hi Chloe,
Very sorry for the delayed response. Non-parsed molecules are fixed by renaming them, which you can do from R using regex. In your case, the problem is that your class names contain non-alphanumeric characters. You can do the following:
old_names = non_parsed_molecules(data)
new_names = sub("[NS]","NS", old_names, fixed = TRUE)
data = update_molecule_names(data, old_names, new_names)
Of course, if you have other non-parsed patterns, you'll need to address them as well. Refer to the R regex manual (`?regex`) and let me know if you need further help.
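For reference, here is a minimal base-R sketch of what that `sub()` call does to the names reported in this thread. It does not call lipidr at all, and `strip_brackets` is a hypothetical helper (not a lipidr function) showing one possible way to handle other bracketed subclasses in the same pass.

```r
# Names reported as non-parsed in this thread
old_names <- c("Cer[NS] d36:2", "Cer[NS] d38:0", "Cer[NS] d38:2")

# fixed = TRUE makes sub() treat "[NS]" as a literal string
# rather than as a regex character class matching "N" or "S"
new_names <- sub("[NS]", "NS", old_names, fixed = TRUE)
print(new_names)  # "CerNS d36:2" "CerNS d38:0" "CerNS d38:2"

# Hypothetical helper (assumption, not part of lipidr):
# remove every square bracket, whatever the subclass label is
strip_brackets <- function(x) gsub("\\[|\\]", "", x)
print(strip_brackets(old_names))
```

Without `fixed = TRUE`, the pattern `"[NS]"` would be read as a character class and would replace the first `N` or `S` found instead of the bracketed label.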
Cheers, Ahmed.
Hi Ahmed,
Many thanks for your response!! After digging into this more I don't think the issue can be related to non-alphanumeric characters as plenty of lipid names with : or - are parsed fine. In addition, if I just change some of the non-parsed examples above to "a" "b" "c" etc, they still remain non-parsed… do you have any idea of what lipidr is looking for in the names please? Do you for example have a standard list of lipids it looks for?
Thanks and sorry to bother you more, best wishes, Chloe
Hi Chloe, This is probably because you're using a single letter as the class name. Since no class names are single-lettered in LipidMaps, lipidr doesn't support them. These are the main patterns that lipidr uses to parse the names:
lipidnames_pattern$class <- "([[:alnum:]]{2,15})"
lipidnames_pattern$chain <- "(\\d{1,2}:\\d{1,2})"
As you can see, class names should be 2-15 alphanumeric characters, and chains should be numeric, formatted as xx:yy (1-2 digits on each side of the colon).
If it still doesn't work, please copy the list of non-parsed molecules here so I can help.
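As a rough way to test candidate names against these two patterns before renaming, something like the base-R sketch below may help. It only uses the two patterns quoted above; `matches_fully` is a hypothetical helper, and anchoring with `^` and `$` is an assumption made for illustration, so this is not necessarily how lipidr combines the patterns internally.

```r
# The two patterns quoted above, copied verbatim
class_pattern <- "([[:alnum:]]{2,15})"
chain_pattern <- "(\\d{1,2}:\\d{1,2})"

# Hypothetical helper (assumption, not part of lipidr):
# TRUE only when the whole string matches the pattern
matches_fully <- function(x, pattern) {
  grepl(paste0("^", pattern, "$"), x)
}

# Class names: a single letter is too short, brackets are not alphanumeric
class_ok <- matches_fully(c("Cer", "CerNS", "a", "Cer[NS]"), class_pattern)
print(class_ok)  # TRUE TRUE FALSE FALSE

# Chains: 1-2 digits, a colon, then 1-2 digits
chain_ok <- matches_fully(c("36:2", "136:2", "36-2"), chain_pattern)
print(chain_ok)  # TRUE FALSE FALSE
```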
Cheers, Ahmed.
Dear Ahmed,
Cannot thank you enough for your kind help with this problem. I believe we have now solved it using the labelling formats you suggested, so thank you again.
Best, Chloe
Glad it worked out in the end.
Hi, I have many non-parsed molecules. Is there a way to sort through them, or to change the names into a format lipidr can read? For example, these are non-parsed: "Cer[NS] d36:2" "Cer[NS] d38:0" "Cer[NS] d38:2"
but "Cer[NS] d32:1" is read fine…
Thanks, Chloe