GoyaPtyLtd / BaseElements-Plugin

FileMaker Pro plugin used for BaseElements to provide file, dialog and XSLT functions.
http://www.goya.com.au/baseelements/plugin

Support for utf-16 in Xpath functions #190

Closed petrowsky closed 2 years ago

petrowsky commented 4 years ago

Hey @minstral and @nickorr I've looked at the .cpp implementation file for the xpath functions but can't tell if you're supporting the utf-16 that libxml2 has support for.

I do see references to UTF-8 and currently, the plugin doesn't seem to handle utf-16 for xml/xpath.

I'm asking because FileMaker started using utf-16 on their clipboard snippets for custom menus and custom menu items. Here's the header of a menu item.

<?xml version="1.0" encoding="utf-16"?>
<FMObjectTransfer version="1" Source="18.0.3" membercount="">
    <MenuItemList membercount="1">

as opposed to the utf-8 for the older objects...

<?xml version="1.0" encoding="UTF-8"?>
<fmxmlsnippet type="FMObjectList">
    <Step enable="True" id="93" name="Beep"></Step>
</fmxmlsnippet>

I can quickly solve the issue by simply changing the encoding on the fly when I can determine that the xml does not include multi-byte characters. But would it be that much work to support reading in utf-16? According to the libxml2 page, it does support it.
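A minimal sketch of the on-the-fly workaround described above, written in Python rather than as a FileMaker calculation. The helper name `relabel_as_utf8` is hypothetical and is not part of the plugin. It assumes the XML is already held as decoded text, and it only rewrites the declaration when every character is ASCII, mirroring the "no multi-byte characters" condition.

```python
import re

# Matches the encoding pseudo-attribute inside the XML declaration only.
_DECL = re.compile(r'(<\?xml[^>]*?encoding=)(["\'])utf-16\2', re.IGNORECASE)


def relabel_as_utf8(xml_text: str) -> str:
    """Rewrite encoding="utf-16" to "utf-8" in the declaration (hypothetical helper)."""
    if not xml_text.isascii():
        # Assumption: anything outside ASCII is left alone rather than relabelled.
        raise ValueError("non-ASCII characters present; not relabelling")
    return _DECL.sub(r'\g<1>\g<2>utf-8\g<2>', xml_text, count=1)


header = '<?xml version="1.0" encoding="utf-16"?><FMObjectTransfer version="1"/>'
relabelled = relabel_as_utf8(header)
```

Only the declaration changes; the rest of the document is passed through untouched.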

nickorr commented 4 years ago

Matt,

Is it causing issues in a particular use case?

I know the set encoding function affects text when getting or retrieving, but I don't know if this is hooked into the XML/xpath like you mention.

But I've done a lot of clipboard stuff, and never had issues so far...

Cheers, Nick

petrowsky commented 4 years ago

Ah, then this is lack of understanding on my part. Didn't know I needed to instruct the plugin to switch encodings. Haven't worked with utf-16 stuff much. Was under the impression the encoding would be inferred when the call was made.

Will play with the encoding switch.

petrowsky commented 4 years ago

Ok, reopening the issue. I don't think the BE_XPath function is respecting the encoding set by BE_SetTextEncoding. Unless I'm missing something.

Here's an example.

[Image: Bad Encoding Support XPath]

nickorr commented 4 years ago

Remember that the encoding listed inside the XML is what it started with, and you can change the storage of the text without changing the value inside that attribute.
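To illustrate that point outside FileMaker, here is a small Python sketch. The function name `transcode_utf16_to_utf8` is made up for this example. It changes how the bytes are stored and shows that the declaration inside the XML keeps its original value.

```python
def transcode_utf16_to_utf8(raw_utf16: bytes) -> bytes:
    """Change how the text is stored without touching its content."""
    return raw_utf16.decode("utf-16").encode("utf-8")


snippet = '<?xml version="1.0" encoding="utf-16"?><FMObjectTransfer version="1"/>'
stored = transcode_utf16_to_utf8(snippet.encode("utf-16"))

# The bytes are now UTF-8, but the declaration still claims utf-16.
stale_declaration = b'encoding="utf-16"' in stored
```

After the conversion the attribute is stale: it describes what the text started as, not how it is stored now.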

You may need to give me an example file, because FileMaker stores natively in utf8, so it's entirely possible that you have utf16 from the clipboard, put into FM somewhere, converted to utf8 and then read by the xpath function.

This would explain why it's not working, as when you tell it to try using utf16 it fails as the content has already been converted to utf8...

Better would be a function to tell you what the encoding of some text is, but that's not always possible, maybe for utf8/16 but definitely not others.
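As a rough idea of what such a function could look like for the utf8/16 case only, here is a Python sketch that checks for a byte order mark, then for the byte pattern of `<?` at the start of the data, in the spirit of the autodetection notes in the XML 1.0 specification. The name `sniff_utf_family` is hypothetical, UTF-32 and legacy encodings are deliberately not handled, and a `None` result means "can't tell".

```python
import codecs
from typing import Optional


def sniff_utf_family(raw: bytes) -> Optional[str]:
    """Guess a Python codec name for XML bytes, or None if undetermined."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Note: a UTF-32 LE BOM also starts with FF FE; UTF-32 is ignored here.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    if raw[:4] == b"<\x00?\x00":
        return "utf-16-le"
    if raw[:4] == b"\x00<\x00?":
        return "utf-16-be"
    if raw[:5] == b"<?xml":
        # Could be any ASCII-compatible encoding; UTF-8 is only the likely guess.
        return "utf-8"
    return None
```

Anything that does not start with a BOM or an XML declaration falls through to `None`, which is the "not always possible" case.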

Cheers, Nick

petrowsky commented 4 years ago

I'll send you the sample file. But it looks like the Xpath functions are not processing the xml in a utf-16 format. At least I can't get it to work. I've tried both putting it into a variable and also reading directly from the clipboard with the encoding set to utf-16.

For reference, here are the scripts I'm using. I'll also send you the file.

#  Original clipboard formats were in UTF-8. The newer Custom Menu and Custom Menu Item are UTF-16
# 
Set Variable [ $format ; Value: GetValue ( BE_ClipboardFormats ; 1 ) ] 
Set Variable [ $encoding ; Value: If ( $format = "public.utf16-plain-text" ;  Choose ( BE_SetTextEncoding ( "UTF-16" ) ; "UTF-16" );  Choose ( BE_SetTextEncoding ( "UTF-8" ) ; "UTF-8" ) ) ] 
Set Variable [ $xml ; Value: BE_ClipboardGetText ( $format ) // Does putting it into a variable convert it to UTF-8? ] 
# 
#  We captured UTF-16 supposedly. Yet, Xpath will not extract the values.
Set Variable [ $version ; Value: BE_XPath ( $xml ; "/FMObjectTransfer/@version" ) ] 
Set Variable [ $source ; Value: BE_XPath ( $xml ; "/FMObjectTransfer/@Source" ) ] 
// #  Trying directly from clipboard.
// Set Variable [ $version ; Value: BE_XPath ( BE_ClipboardGetText ( $format ) ; "/FMObjectTransfer/@version" ) ] 
// Set Variable [ $source ; Value: BE_XPath ( BE_ClipboardGetText ( $format ); "/FMObjectTransfer/@Source" ) ] 
Show Custom Dialog [ "Result: Using UTF-16" ; If ( IsEmpty ( $version ) ; "We got nothing" ; "We got version " & $version & " Source of " & $sour… ] 
# 
#  Now let's try UTF-8 encoding type within the XML directly. And we change the encoding type within the XML.
Set Variable [ $encoding ; Value: If ( BE_SetTextEncoding ( "" ) = 0 ; "UTF-8" ) // Reset to UTF-8 ] 
Set Variable [ $xml ; Value: Substitute ( $xml ; "encoding=\"utf-16\"" ; "encoding=\"utf-8\"" ) ] 
Set Variable [ $version ; Value: BE_XPath ( $xml ; "/FMObjectTransfer/@version" ) ] 
Set Variable [ $source ; Value: BE_XPath ( $xml ; "/FMObjectTransfer/@Source" ) ] 
Show Custom Dialog [ "Result: Using UTF-8" ; If ( IsEmpty ( $version ) ; "We got nothing" ; "We got version " & $version & " Source of " & $sour… ] 
# 
Exit Script [ Text Result:    ] 

nickorr commented 4 years ago

Matt,

Some changes done, can you have a play with this one :

https://goya.com.au/files/beplugin/Test/BaseElements.fmplugin.zip
https://goya.com.au/files/beplugin/Test/BaseElements.fmx.zip
https://goya.com.au/files/beplugin/Test/BaseElements.fmx64.zip

Are the comments up here public : https://goyapl.atlassian.net/browse/BEPLUGIN-36 ??

Cheers, Nick

petrowsky commented 4 years ago

Yep, just ran through the plugin on the Mac side. Will test on Windows.

That did address the issue and no, the comments, as far as I can tell are not public.

One thing, however, is I don't think the encoding, with regards to the use of Xpath, is strictly enforced. While the BE_Xpath function will now return the proper values when encoding is set to UTF-16 (on the plugin side), it will also return the values when you've captured UTF-16 from the clipboard, yet you leave encoding set to UTF-8.

Actually, in my opinion, when it comes to parsing the XML, I would just infer the encoding from the specified attribute of the xml tag. That would seem the simplest approach and requires the least amount of explanation and/or documentation.

I don't know if being able to use xpath with both 8 and 16 is a side effect of the Xpath implementation from a library standpoint or from your code. But I am getting the specified values now.

petrowsky commented 4 years ago

Also tried logging into JIRA and got the following: "name@email.com doesn't have access to Jira on goyapl.atlassian.net."

nickorr commented 4 years ago

Matt,

Glad to hear that's working. The issue with inferring the content type from the XML tag is that you can have utf8 content with a utf16 tag, especially when working in FileMaker where it will often store in utf8 internally.

So reading in some text to find the tag requires knowing what the content encoding is (even though utf makes it fairly easy and consistent, in theory it could be any type), so there's a chicken-and-egg problem: you can't read it without first knowing what it is.
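The chicken-and-egg problem can be shown with a short Python sketch. The helper `declared_encoding` is hypothetical: it looks for the `encoding="..."` value in the first few bytes, but it has to be told which codec to decode those bytes with before it can find anything.

```python
import re
from typing import Optional

_ENCODING_ATTR = re.compile(r'encoding="([^"]+)"')


def declared_encoding(raw: bytes, assumed_codec: str) -> Optional[str]:
    """Return the declared encoding, reading the prolog with an assumed codec."""
    # errors="replace" keeps a wrong guess from raising; it just yields junk text.
    head = raw[:200].decode(assumed_codec, errors="replace")
    match = _ENCODING_ATTR.search(head)
    return match.group(1) if match else None


xml_text = '<?xml version="1.0" encoding="utf-16"?><FMObjectTransfer version="1"/>'
raw16 = xml_text.encode("utf-16")

right_guess = declared_encoding(raw16, "utf-16")  # declaration is readable
wrong_guess = declared_encoding(raw16, "utf-8")   # declaration is not found
```

With the wrong assumed codec the declaration is simply invisible, so the attribute cannot be the first thing consulted.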

I'll see if I can make the jira thing public read access, that would be handy I think.

Cheers, Nick

nickorr commented 4 years ago

And as to the plugin working even when set to utf8, the plugin will be trying to read in as utf8 and will work fine because there's not actually any content that is encoded in a way that requires utf16 and would display wrongly in utf8.

I'm yet to see a DDR that requires utf16, and I've asked about it before as it doubles storage requirements.

But in theory it might be possible to get the XPath to break by using one of the other obscure encoding formats that doesn't give you accurate text.

Cheers, Nick