We've now seen a case where, if the model's gene product IDs contain "special" characters such as - or (, cobratoolbox mangles them by replacing each such character with its ASCII code (for example, - becomes __45__). As a result, a curation report showed the following difference between the gene product IDs in the report and those in the model:
julia> setdiff(genes(model), genes_report) # gene IDs that are in fbc_curation_matlab report but not in the model
13-element Vector{Any}:
"G_YBR058C__45__A"
"G_YCL005W__45__A"
"G_YCR024C__45__A"
"G_YDR322C__45__A"
"G_YEL017C__45__A"
"G_YER060W__45__A"
"G_YHR001W__45__A"
"G_YHR039C__45__A"
"G_YLL018C__45__A"
"G_YML081C__45__A"
"G_YOL077W__45__A"
"G_YPL096C__45__A"
"G_YPR170W__45__B"
julia> setdiff(genes_report, genes(model)) # gene IDs in the model that are not in the report
13-element Vector{Any}:
"G_YBR058C-A"
"G_YCR024C-A"
"G_YDR322C-A"
"G_YEL017C-A"
"G_YER060W-A"
"G_YHR001W-A"
"G_YHR039C-A"
"G_YML081C-A"
"G_YCL005W-A"
"G_YOL077W-A"
"G_YLL018C-A"
"G_YPL096C-A"
"G_YPR170W-B"
Technically this is an easy fix (the curators "just" walk through the output CSVs manually and restore the original IDs), but it would be great to have an automated tool for this, or at least a printed warning, so that users know that either
their model should have the - characters removed to work perfectly, or
they need to fix the reports manually.
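To make the request concrete, here is a minimal sketch of what such a tool could do, written in Python only because it is easy to run as a post-processing step on the CSVs. The `__NN__` scheme (decimal ASCII code between double underscores) is inferred purely from the IDs listed above, and the names `demangle` and `ids_at_risk` are hypothetical, not part of cobratoolbox or fbc_curation_matlab. It also assumes that only letters, digits and underscores pass through unmangled, which I have not checked against cobratoolbox.

```python
import re

# Assumed mangling scheme, inferred from the IDs above: "__<decimal ASCII code>__".
_MANGLED = re.compile(r"__(\d+)__")
# Assumption: only letters, digits and underscores survive unmangled.
_SAFE_ID = re.compile(r"[A-Za-z0-9_]+")


def demangle(gene_id: str) -> str:
    """Replace every "__NN__" with the character whose code is NN."""
    return _MANGLED.sub(lambda m: chr(int(m.group(1))), gene_id)


def ids_at_risk(gene_ids):
    """Return the IDs containing characters that would presumably be mangled."""
    return [g for g in gene_ids if not _SAFE_ID.fullmatch(g)]


print(demangle("G_YBR058C__45__A"))                # G_YBR058C-A
print(ids_at_risk(["G_YBR058C-A", "G_YAL001C"]))   # ['G_YBR058C-A']
```

The first function would cover the automated fix of the reports, the second could back the warning. One caveat: an ID that legitimately contains something like `__45__` would be decoded by mistake, so comparing the decoded IDs against the model's own gene list before rewriting anything is probably safer.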
Thanks!
PS: I think it would be better to fix this directly in cobratoolbox, but since they depend on this mangling because of their use of eval, I don't have much hope that a good solution exists there.
PPS: The model is yeast-gem, in this particular instance: https://www.ebi.ac.uk/biomodels/MODEL2204280003#Files
cc @feiranl @rsmsheriff @ntung