meowcat closed this issue 3 years ago.
Thank you for your report. A fix will be included in the next release. The problem is that the semicolon is a legal character in chemical names, and we also use the semicolon as the separator between title fields. I have now made the separation of the title fields a little more precise by splitting on "; " (a semicolon followed by a space). To make it bulletproof, I would have to reject all chemical names that contain a semicolon followed by a space. I think that would be fine in principle; I just don't know how to code it at the moment. I'll leave this issue open for a while until I have fixed this properly.
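A minimal sketch of the two ideas in this reply, assuming the title has the form `name; instrument type; MS type` and using hypothetical helper names (`splitTitle`, `isNameAllowed`) that are not part of the MassBank code: split the title on `"; "` instead of a bare `";"`, and reject any name that itself contains `"; "`, because such a name would still be cut apart by the split.

```java
import java.util.Arrays;
import java.util.List;

public class TitleSplit {
    // Hypothetical helper: split a record title on "; " (semicolon + space)
    // rather than a bare ";", so semicolons inside a name stay intact.
    static List<String> splitTitle(String title) {
        return Arrays.asList(title.split("; "));
    }

    // Hypothetical validation rule: a name containing "; " would be split
    // apart by splitTitle(), so such a name is rejected.
    static boolean isNameAllowed(String name) {
        return !name.contains("; ");
    }

    public static void main(String[] args) {
        String title = "ST 27:1;O;Hex;FA 14:0; LC-ESI-QTOF; MS2";
        // prints [ST 27:1;O;Hex;FA 14:0, LC-ESI-QTOF, MS2]
        System.out.println(splitTitle(title));
        System.out.println(isNameAllowed("ST 27:1;O;Hex;FA 14:0")); // true
        System.out.println(isNameAllowed("foo; bar"));              // false
    }
}
```

The split alone only helps as long as no name contains `"; "`; the validation check is what would make the convention hold.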
Thanks! Looking at the code now, I see where the problem comes from. Would it add much overhead to use a regex instead? This should work because the first capture group is greedy, so it takes everything up to the last two semicolons (I don't know the Java syntax off the top of my head):
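```r
library(stringr)  # str_match() comes from stringr (also attached by library(tidyverse))
# The first (.*) is greedy, so it keeps every semicolon except the last two
regex <- "(.*);(.*);(.*)"
name <- "ST 27:1;O;Hex;FA 14:0; LC-ESI-QTOF; MS2"
str_match(name, regex)
```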
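A possible Java equivalent of the R snippet, offered as a sketch: it uses the same pattern with `java.util.regex`, trims the leading spaces that the second and third groups would otherwise keep, and assumes the title has exactly three fields as in the example. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyTitleRegex {
    // Same pattern as the R example: the first (.*) is greedy, so it
    // absorbs every semicolon except the last two.
    // Assumes exactly three title fields, as in the example.
    private static final Pattern TITLE = Pattern.compile("(.*);(.*);(.*)");

    // Returns {name, instrument, msType} with surrounding spaces trimmed,
    // or null if the title does not contain at least two semicolons.
    static String[] parseTitle(String title) {
        Matcher m = TITLE.matcher(title);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1).trim(), m.group(2).trim(), m.group(3).trim()};
    }

    public static void main(String[] args) {
        String[] parts = parseTitle("ST 27:1;O;Hex;FA 14:0; LC-ESI-QTOF; MS2");
        // prints "ST 27:1;O;Hex;FA 14:0", "LC-ESI-QTOF", "MS2" on separate lines
        for (String p : parts) {
            System.out.println(p);
        }
    }
}
```

The trade-off is the reverse of splitting on `"; "`: semicolons in the name are handled, but a title with more than three fields, or a semicolon in a later field, would shift the boundaries.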
This issue is solved. Records that could cause problems in the web app will no longer pass the Validator, and the visualization is fixed with my commit.
Hi,
The MassBank record specification does not forbid semicolons in CH$NAME. When a name contains one, it ends up in the record title, passes the corresponding validation, and causes compounds to be grouped incorrectly in the record index (and potentially elsewhere?). I noticed this when building a personal MassBank from LipidBlast (Tsugawa version), which uses compound names such as
ST 27:1;O;Hex;FA 14:0
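To illustrate how such a name could cause mis-grouping, here is a sketch that assumes (not confirmed from the MassBank code) that the index takes everything before the first `;` of the title as the compound name. The second title uses a hypothetical name in the same notation; `naiveName` and `groupByName` are illustrative helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NaiveGrouping {
    // Assumed (hypothetical) index logic: the name is everything before the first ";".
    static String naiveName(String title) {
        return title.split(";")[0];
    }

    // Group titles by the naively extracted name, keeping insertion order.
    static Map<String, List<String>> groupByName(List<String> titles) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String t : titles) {
            groups.computeIfAbsent(naiveName(t), k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
            "ST 27:1;O;Hex;FA 14:0; LC-ESI-QTOF; MS2",
            "ST 27:1;O;Hex;FA 16:0; LC-ESI-QTOF; MS2"); // hypothetical second lipid
        // Both distinct lipids collapse into a single "ST 27:1" group.
        System.out.println(groupByName(titles).keySet()); // prints [ST 27:1]
    }
}
```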