RGLab / flowWorkspace

Broken node name when plotting gating tree on Mac #114

Closed mikejiang closed 10 years ago

mikejiang commented 10 years ago

A node name that contains a minus sign is corrupted in the plotted graph when the Graphviz tools convert the dot file to a GXL file.

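  # Write the gating tree to a temporary dot file
  DotFile <- tempfile(fileext = ".dot")
  .Call("R_plotGh", x@pointer, getSample(x), DotFile, FALSE)
  # Convert the dot file to GXL with the Graphviz command-line tool
  GXLFile <- tempfile(fileext = ".gxl")
  system(paste("dot2gxl", DotFile, ">>", GXLFile))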

It is a Mac-specific issue.

mikejiang commented 10 years ago

When the dot2gxl command converts the same graph file from dot to GXL format, it keeps the node name as is on Linux (e.g. v-) but encodes it as v&#45; on Mac. graph::fromGXL then fails to recognize the encoded node name, so it is the GXL parser in graph that corrupts the name. The platform-specific behaviour of the Graphviz converter also confounds the issue.

gfinak commented 10 years ago

Anything we can do to get this working properly?

mikejiang commented 10 years ago

I am looking into the fromGXL call to see if there is an easy fix.

mikejiang commented 10 years ago

graph::fromGXL uses XML::xmlEventParse to parse each tag of the GXL file (shown below; it is essentially an XML file):

<gxl>
        <graph id="G" edgeids="true" edgemode="directed">
                <node id="N_0">
                        <attr name="label">
                                <string>root</string>
                        </attr>
                </node>
                <node id="N_1">
                        <attr name="label">
                                <string>&#45;v</string>
                        </attr>
                </node>
        </graph>
</gxl>

Here is what happens, based on the diagnostic output of graph:::graph_handler. Whenever xmlEventParse encounters a node name (stored in the /gxl/graph/node/attr/string tag) that contains a numeric character reference (i.e. Unicode &#nnnn;), it

  • decodes it into ASCII
  • and splits the content into a string array instead of concatenating the pieces into a single string (source of evil)

When graph:::graph_handler tries to collect the node name from the parsed tag content,

  • it assumes there is only one string per tag
  • thus mistakenly reads the split character vector multiple times
  • with the last character overwriting all the previous ones.
  • We then end up seeing a partial name (-) instead of the complete node name (v-).

The quick fix would be to hack graph:::graph_handler so that it concatenates these characters when they come from the same attribute of the same node.

kevinushey commented 10 years ago

Platform specific issues are always the best...

Thanks for investigating this!

mikejiang commented 10 years ago

It is really a problem with the XML parser. I think dot2gxl's behaviour on Mac (encoding special characters) is more appropriate (not sure why it doesn't do the same on Linux).
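
To make the overwrite described earlier in this thread concrete, here is a minimal base-R sketch. It does not call XML::xmlEventParse or graph:::graph_handler. The `chunks` vector and both helper functions are hypothetical stand-ins, assuming the text callback fires once per piece for <string>v&#45;</string>.

```r
# Hypothetical pieces, assuming the parser hands the text callback
# "v" and the decoded "-" in two separate calls.
chunks <- c("v", "-")

# Naive collector: assumes one string per tag, so every call
# overwrites whatever was stored before.
naive_collect <- function(chunks) {
  name <- NA_character_
  for (piece in chunks) {
    name <- piece
  }
  name
}

# Accumulating collector: appends each piece to the name built so far,
# which is the "concatenate within the same tag" idea.
joined_collect <- function(chunks) {
  name <- ""
  for (piece in chunks) {
    name <- paste0(name, piece)
  }
  name
}

naive_collect(chunks)   # "-"
joined_collect(chunks)  # "v-"
```

The second collector only illustrates the idea of joining pieces that belong to the same tag; it is not the actual patch to graph.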