Closed BenjaSanchez closed 4 years ago
@Midnighter do you foresee any potential issues with this decision? were you aware of this difference between cobrapy and cobratoolbox on how the gene names are stored in the xml file?
I was not aware of this difference. Taking a look at the FBC specification, the label
field is a required attribute so practically it seems like the better choice. Since it is required, cobrapy must also set it when writing to SBML, though, doesn't it?
Can you please raise an issue on cobrapy about it?
@Midnighter it does indeed set it, just equal to id
from what I see in other models. I believe what cobrapy does currently (storing the gene name in the name
field) is the better suited choice, and therefore this issue should maybe be raised on cobratoolbox instead. But let me know if you have a different view.
IMHO, the proposed solution here should just be a workaround, as geneNames
and proteins
are not necessarily the same thing (alternative splicing can give different proteins). To solve this, either COBRA Toolbox or cobrapy should modify their code. FYI, RAVEN also stores it in label
, to follow compatibility with COBRA toolbox.
@edkerk I agree, this is a temporal workaround to guarantee that cobrapy users can get all model fields they need. The permanent solution is to have cobratoolbox and cobrapy agree with one format, and as soon as that happens we can prescind from this change and entirely remove the proteins
field (which holds no info atm). I can open an issue in our repo to remember to do that. Let me know of you think another approach would be better.
From my perspective it's a good idea to coordinate on this. I don't really care if it's an issue on the toolbox or on cobrapy.
Following @Midnighter's earlier comment, label
is required, which means that it might be best if cobrapy adjusts to adhere to this?
@edkerk I have now opened https://github.com/opencobra/cobrapy/issues/950 for discussion and #217 as a to-do in this repo once a solution is found. If anything else is needed let me know in a review :)
Main improvements in this PR:
This partly addresses #102: After the merge of https://github.com/opencobra/cobrapy/pull/685, every component in yeast-GEM is properly parsed when loaded into cobrapy, with the exception of gene names (notebook with the 1st analysis here): this is because gene names are stored by cobratoolbox in the
fbc:label
field in the xml file, whereas cobrapy looks for these in thefbc:name
field. So instead, I've adapted the Matlab saving wrapper so that it copiesmodel.geneNames
ontomodel.proteins
, as that is translated to the compliantfbc:name
field. By doing this we won't loose any information, as inmodel.proteins
we had placeholders until now. The only downside is that the xml will now have duplicated info:But as we now over-write the protein info with the wrapper, and cobrapy only reads one of them, we should not have any issues. Also, from now on no need to redefine each time the
model.proteins
field @feiranl.I hereby confirm that I have:
devel
as a target branch (top left drop-down menu)