RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

Retain gate names as IDs when converting FlowJo workspace to GatingML #69

Closed whitews closed 5 years ago

whitews commented 5 years ago

I'm using CytoML to convert FlowJo workspaces to GatingML 2.0. However, somewhere in the process, the gate names present in the FlowJo workspace get converted to what appear to be automatically incremented gate IDs. This makes it difficult to track the original meaning of the gates in the hierarchy.

For example, here is a gate in the original FJ workspace:

<Polygon name="FSC-A, SSC-A subset" bounds="139,51,256,236" editable="0" selected="1" gateBitID="1" lineColor="Transparent" lineArrow="None" xAxisName="FSC-A" yAxisName="SSC-A" annotationOffset="42,-37">
    <Polygon>
        <Vertex x="2.416640043258667e5" y="76800" />
        ...
    </Polygon>
</Polygon>
</PolygonGate>

And the resulting GatingML gate:

<gating:PolygonGate gating:id="gate_1_1">
    <gating:dimension gating:compensation-ref="Spill_defaultCompensation">
      <data-type:fcs-dimension data-type:name="FSC-A"/>
    </gating:dimension>
    <gating:dimension gating:compensation-ref="Spill_defaultCompensation">
      <data-type:fcs-dimension data-type:name="SSC-A"/>
    </gating:dimension>
    <gating:vertex>
    ...
</gating:PolygonGate>

You can see that the gate name "FSC-A, SSC-A subset" gets converted to a gate ID of "gate_1_1".

Here is the code I am using for the conversion:

library(flowWorkspace)
library(CytoML)
library(flowUtils)  # provides write.gatingML()

fcs_dir <- "/path/to/flow_data"
setwd(fcs_dir)  # gml_output.xml is written to this directory

# locate the FlowJo workspace and parse the 'CHX' sample group into a GatingSet
wsfile <- list.files(fcs_dir, pattern = "flowjo_workspace\\.xml$", full.names = TRUE)
ws <- open_flowjo_xml(wsfile)
gs_chx <- parseWorkspace(ws, name = "CHX")

# create an empty environment to hold the GatingML objects
flowEnv <- new.env(parent = emptyenv())

# export compensation and transformations (private CytoML API, Cytobank flavor)
res <- CytoML:::export_comp_trans(
    gs_chx,
    flowEnv,
    cytobank.default.scale = FALSE,
    type = "cytobank"
)

# export gates (private CytoML API)
CytoML:::export_gates_cytobank(
    gs_chx,
    flowEnv,
    res[["trans.Gm2objs"]],
    res[["trans"]],
    res[["compId"]],
    showHidden = FALSE
)

write.gatingML(flowEnv, "gml_output.xml")
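One stopgap that leaves CytoML untouched is to post-process the file written by `write.gatingML()`. The idea is to swap each auto-generated ID for a readable one wherever it appears as a quoted attribute value, which also updates references to it, such as a child gate's parent reference. The sketch below uses base R only. `rename_gate_ids()` and the `id_map` lookup (auto ID to new ID) are hypothetical names. How to build that lookup from the GatingSet is left open, since this thread does not show the private export functions exposing it. The sketch also assumes double-quoted attributes and that no unrelated attribute shares a gate ID's value. Whole quoted values are matched, so `gate_1_1` never rewrites `gate_1_10`.

```r
# Hypothetical helper: rename gate IDs in GatingML text, line by line.
# id_map is a named character vector: names = old IDs, values = new IDs.
rename_gate_ids <- function(xml_lines, id_map) {
  # locate every double-quoted attribute value on every line
  m <- gregexpr('"[^"]*"', xml_lines)
  regmatches(xml_lines, m) <- lapply(regmatches(xml_lines, m), function(vals) {
    inner <- substr(vals, 2, nchar(vals) - 1)   # strip the surrounding quotes
    hit <- inner %in% names(id_map)             # exact, whole-value matches only
    if (any(hit)) vals[hit] <- paste0('"', id_map[inner[hit]], '"')
    vals
  })
  xml_lines
}

# usage (file names as in the script above):
# xml <- readLines("gml_output.xml")
# writeLines(rename_gate_ids(xml, c(gate_1_1 = "FSC-A_SSC-A_subset")),
#            "gml_output_named.xml")
```

Because every match on a line is replaced in a single pass, a new ID that happens to equal another old ID is not renamed a second time.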

Is there any way to retain the original names in the GatingML file? I know gate names are repeated in the FlowJo workspace because of multiple samples and sample groups. However, the automatically incremented gate IDs could be concatenated with the original names, which would keep them unique while preserving their context in the gating hierarchy. Even better, the concatenation could be performed only when non-unique names are found in the resulting GatingML output.
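A minimal base-R sketch of the naming scheme proposed above: keep the FlowJo gate name as the ID when it is unique, and prefix it with the auto-generated ID only when it is repeated. `make_gate_ids()` and `sanitize_id()` are hypothetical helpers, not part of CytoML. The sketch also assumes that `gating:id` must be a valid XML ID (an NCName with no spaces, commas or colons, not starting with a digit). That assumption is why a name such as "FSC-A, SSC-A subset" is sanitized first.

```r
# Hypothetical helper: turn an arbitrary gate name into an NCName-safe string.
sanitize_id <- function(x) {
  x <- gsub("[^A-Za-z0-9_.-]+", "_", x)               # replace runs of illegal chars
  ifelse(grepl("^[A-Za-z_]", x), x, paste0("_", x))  # ensure a legal first character
}

# Hypothetical helper: use the sanitized name as the ID when it is unique,
# otherwise prefix it with the (already unique) auto-generated ID.
make_gate_ids <- function(gate_names, auto_ids) {
  stopifnot(length(gate_names) == length(auto_ids))
  clean <- sanitize_id(gate_names)
  # TRUE for every name that occurs more than once
  dup <- duplicated(clean) | duplicated(clean, fromLast = TRUE)
  ifelse(dup, paste(auto_ids, clean, sep = "_"), clean)
}

ids <- make_gate_ids(c("FSC-A, SSC-A subset", "CD3+", "CD3+"),
                     c("gate_1_1", "gate_1_2", "gate_2_2"))
print(ids)
```

Duplicates are detected on the sanitized names rather than the raw ones, so two names that differ only in illegal characters still get distinct IDs.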

mikejiang commented 5 years ago

CytoML officially supports only two output XML formats, FlowJo and Cytobank, and neither is (strictly speaking) standard GatingML. The APIs you are using are private and were designed specifically for one of these two formats.

Before we devote resources to generic, standard GatingML support, could you explain why you need it and what the context of your use case is?

whitews commented 5 years ago

Hi Mike,

Yes, I realize I am using private functions. That is not ideal, but it is the only way I can get valid GatingML from a FlowJo workspace. I'm also aware that neither a FlowJo workspace nor the Cytobank XML is a standard GatingML document.

I develop and maintain Python libraries for working with flow cytometry data, and our group plans to create analysis pipelines starting from a base set of manually created gates. Our main library supports GatingML 2.0, since FlowJo's XML is not open and there is no XSD available (that I'm aware of) to validate those documents. Plus, as you are probably well aware, reverse engineering their format would require a significant amount of effort. The RGLab suite of libraries seems to be the only tool available that reads their workspace files well.

I am considering support for the Cytobank XML format since it is very close to GatingML, but there seems to be no XSD for that format either. Also, the Cytobank XML output from openCyto (or is it CytoML?) is not valid XML. I am in the process of writing a converter that makes these files valid XML, and I have also created my own internal XSD for that format. It would be nice if this XSD could be hosted somewhere; I'd be happy to provide it. I would also be glad to have a go at creating a pull request for the appropriate RGLab library to make the Cytobank export valid XML, as the necessary changes are fairly straightforward.

mikejiang commented 5 years ago

Since we won't support standard GatingML 2.0 at the moment, your plan of working with the existing CytoML output sounds reasonable. Feel free to submit a PR if it consists of minor, simple additions, but we won't be able to accommodate the changes if they become too intrusive.

whitews commented 5 years ago

Okay, sounds like a plan. Do you have guidelines for creating PRs (inclusion of tests, code style, etc.)?

mikejiang commented 5 years ago

We basically follow the rOpenSci guidelines. At a minimum, a PR should pass all existing test cases and add new tests as needed (especially for new features).

whitews commented 5 years ago

Closing this since GatingML is not officially supported.