SynBioDex / libSBOLj

Java Library for Synthetic Biology Open Language (SBOL)
Apache License 2.0

Problem with GenBank annotation labels with symbols #494

Open cjmyers opened 7 years ago

cjmyers commented 7 years ago

If a GenBank annotation label includes a symbol, such as %, conversion produces an invalid XML tag. We need a way to convert the GenBank annotation label into a valid XML tag, preferably one that is reversible.
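
As a rough illustration of what a reversible conversion could look like, here is a minimal sketch. It is not libSBOLj code: the class `LabelCodec` and its methods are hypothetical, and the `_xHHHH_` escape form is an assumption borrowed from the convention used by .NET's `XmlConvert.EncodeName`. It only treats ASCII letters, digits, `-` and `.` as safe, which is stricter than the XML name rules, and it escapes every underscore so that decoding stays unambiguous.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of libSBOLj.
public class LabelCodec {
    // One escaped UTF-16 code unit, e.g. _x0025_ for '%'.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-F]{4})_");

    private static boolean isSafe(char c, boolean first) {
        boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (letter) return true;
        if (first) return false; // a name may not start with a digit, '-' or '.'
        return (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Escapes every unsafe character, including '_' itself.
    // An empty label yields an empty string, which is still not a valid name.
    public static String encode(String label) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (isSafe(c, i == 0)) {
                sb.append(c);
            } else {
                sb.append(String.format("_x%04X_", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses encode() by expanding each _xHHHH_ sequence.
    public static String decode(String name) {
        Matcher m = ESCAPE.matcher(name);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(name, last, m.start());
            sb.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        sb.append(name, last, name.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("GC%");
        System.out.println(encoded);         // GC_x0025_
        System.out.println(decode(encoded)); // GC%
    }
}
```

The cost of this approach is readability: a label such as `my_label` becomes `my_x005F_label`, so whether the round trip is worth it depends on how often labels contain symbols.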

cjmyers commented 7 years ago

From a little googling, it looks like the recommendation is to escape the character just as you would in HTML, e.g., &#x25; for a percent sign. That should be handled automatically for this (and all other special characters) by the appropriate call to an encoding library, maybe even your XML library.

Thanks, -Jake

cjmyers commented 6 years ago

Seems it may not be that simple:

https://stackoverflow.com/questions/4301236/escaping-special-character-when-generating-an-xml-in-java

The characters that are legal in a node name are quite limited, and I cannot find any function that will reliably do this conversion. For now, I will just replace symbols with underscores. In the future, if a label contains symbols, we may need a nested annotation that stores both the tag and the value as strings.
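
For reference, the underscore replacement described above might be sketched as follows. This is an assumption about the general shape of such a fix, not the code that went into libSBOLj; `LabelSanitizer` and `toXmlName` are hypothetical names, and the allowed character set is deliberately limited to ASCII.

```java
// Hypothetical helper, not part of libSBOLj.
public class LabelSanitizer {
    // Replaces every character outside [A-Za-z0-9_.-] with '_' and
    // prefixes '_' when the result would not start with a letter or '_'.
    public static String toXmlName(String label) {
        String name = label.replaceAll("[^A-Za-z0-9_.-]", "_");
        if (name.isEmpty() || !name.substring(0, 1).matches("[A-Za-z_]")) {
            name = "_" + name;
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(toXmlName("GC%"));   // GC_
        System.out.println(toXmlName("5'UTR")); // _5_UTR
    }
}
```

Note that this mapping is lossy: `GC%` and `GC#` both become `GC_`, so the original label cannot be recovered from the tag alone. That is the case the nested annotation would address.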