Synthoid / ExportSheetData

Add-on for Google Sheets that allows sheets to be exported as JSON or XML.
MIT License
234 stars 46 forks source link

Exporting XML names that contain a colon #126

Closed elkym closed 3 years ago

elkym commented 3 years ago

On line 327 of the gs: function formatXmlName(value, replacement) is the only function that I can find which would seem to be responsible for all colons in tags with underscores (or hyphens, or periods).

There are a number of xml metadata formats that this is troublesome for.

Is this a feature that should be developed? Or am I creating this problem with a user error?

I'm not terribly familiar with Javascript and I'm new to coding, but I think I've puzzled this out reasonably well.

If this is a settings problem that I can solve that way, please do explain.

Thanks.

Synthoid commented 3 years ago

Good catch! When I originally developed that format method I hadn't dealt with namespaced XML so I was going off of the XML specification for valid tag characters. I can add support for colons in XML tags (I'm currently messing with XML so this is good timing!)

elkym commented 3 years ago

A very quick reply! Thanks!

I guess in the meantime, I'll have to do a workaround.

Synthoid commented 3 years ago

I'm planning on publishing v61 tonight, so this should be included as part of that build.

elkym commented 3 years ago

You know, a friend of mine was looking at the code with me (he's a Javascript professional, and I don't read it very well yet), and he suggested some refactors that would condense the script a bit. Would you be interested in seeing what he suggested? (He said I could forward it to you).

elkym commented 3 years ago

I mean, I don't want to pile a whole bunch on your plate or tell you how to write code-- but if what he drafted is useful to you, I'd hate for you to not get a chance to see it (or even just collaborate directly).

Synthoid commented 3 years ago

I'm always up for cleaner code! Feel free to send me suggestions or post it here and I'll take a look!

elkym commented 3 years ago

An attempt at posting the code here is formatting wonky. I'll ask him to point a link or something.

jamesona commented 3 years ago

Oh, I was just adjusting whitespace and switching from var to const/let in one sample section. Every professor I've had preferred Allman braces, and every team I've worked on preferred Egyptian or One-True-Brace braces. I normally just let Prettier be the authority on modern code style.

Synthoid commented 3 years ago

Haha, I tend to lean Allman style as it makes pairing braces easier and adds spacing for readability. And yeah, code can be posted as is by wrapping it with three grave accents (the other char on the ~ key, which I guess mobile keyboards don't have so I can't post a true example currently). It should look something like:

''' YOUR CODE HERE '''

(But replace the apostrophes with accents)

As for the let/const declarations, apps script recently updated its runtime engine to V8 which finally allowed newer JavaScript keywords. Before V8 everything had to be a var which was super annoying. I've moved a few vars to be consts but I still need to take a pass at fully updating code.

Synthoid commented 3 years ago

After messing with this for a bit I think it's going to be far more complicated than I originally expected. Since I'm using XmlService to create XML I can't just ignore : as it is reserved for namespace purposes so attempting to use something like xmlns:xsi as an element or attribute's name will throw an error. Instead, I'll need to parse the cell value into separate values of the namespace string and the actual element name. In the case of the xmlns:xsi example I'll need to separate that into xmlns (namespace) and xsi (name).

This may end up spiraling if I don't track the root element namespace which could cause namespace issues on child elements. I'll need to experiment with how XmlService handles creating and using namespaces since my initial tests don't seem to be yielding desired results. For the time being, I'm going to finish the XML work I had planned for v61 and push this to v62 so I can more thoroughly test it. This means you'll have to do a workaround for namespaces for now. Sorry for the inconvenience!

elkym commented 3 years ago

Thanks for the update. I wondered if it would be more complex-- I wasn't sure what all of the pieces you were relying on were. Good luck. Maybe I'll be decent enough at coding to lend you a hand sometime. (And maybe Grad School will crush me into small pieces first).

elkym commented 3 years ago

Question:

On line 344:

xmlName = xmlName.replace(/[^a-zA-Z0-9\-\_]/gm, replacement); //Replace non-alphanumeric, dash, underscore, or period chars with an underscore

Does this limitation accurately represent the only characters that can appear in an XML tag? Alternatively, does it accurately represent the only characters that can be processed via XMLService?

Synthoid commented 3 years ago

I largely pulled the valid chars from the specifications on w3schools. There's a section on that page talking about what chars are valid for XML element names. I should double check that the regex there also grabs spaces and dots. Note that these limitations do not apply to the value of a tag, just the tag name itself. A tag (or attribute) with the name Na|me would be invalid, but the tag's value can be Na|me without issue.

I was originally creating XML by manual string concatenation before I moved over to using XmlService. Even though adding namespace support would have been easier with manual XML generation, XmlService ensures that ESD produces valid XML every time so I'm going to have to figure out how best to handle namespaces through that. I think I had namespace generation working last night (in the sense that I was creating Namespace values), but XmlService didn't seem to like how I was doing it and threw errors on attributes created with a namespace.

Synthoid commented 3 years ago

Just pushed v61 and I'm thinking about how to handle this best. I think a good approach would be allowing users to set up a list of namespaces in the sidebar then allow key cells to use namespaces via prefixes separated by colons. This would probably look something like a reorderable list with input fields for the namespace prefix and URI:

Label Field
Prefix xhtml
URI http://www.w3.org/1999/xhtml

That would add xmlns:xhtml="http://www.w3.org/1999/xhtml" to the root element's tag. Once the user has declared the xhtml namespace, they could use that prefix in key cells, for example: xhtml:test

Does that seem like a good approach or is there something else you had in mind?

Synthoid commented 3 years ago

It's a little clunky right now, but I do have the UI for namespaces roughly working. I've added a new XML section to contain namespace settings since it would have seriously bloated the Advanced XML section. I still need to get the namespace parsing on the sheets side set up, but it's a start!

image

Synthoid commented 3 years ago

XML namespace UI is looking much better now. Just have some minor UX to deal with there and I can move onto getting namespaces working in exported XML.

image

Synthoid commented 3 years ago

Things are looking good on namespace support. I'm hoping to get it finished up and ready to go in v62 this weekend. Had to jump through a few hoops to get multiple namespaces declared in a single element since XmlService apparently doesn't support that (even though people have been asking for it since 2015 according to one support thread). I'm debating adding an option to set namespace declaration behavior, either in the root element like below or per element. Not sure if there's a preference there. In any case, I have this exporting from a test sheet:

<data xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <x:test1>
    <XLSX>1</XLSX>
    <ss:Type>Test</ss:Type>
  </x:test1>
  <test2>
    <XLSX>0</XLSX>
    <ss:Type>1.02</ss:Type>
  </test2>
  <test3>
    <XLSX>true</XLSX>
  </test3>
  <test4>
    <XLSX>false</XLSX>
  </test4>
</data>
Synthoid commented 3 years ago

This is now implemented and will roll out with the v62 release.