RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

Error in xmlElementsByTagName(gateNode, "region")[[1]] : subscript out of bounds #101

Closed: Biomiha closed this issue 4 years ago

Biomiha commented 4 years ago

Hi all,

I was wondering if you could please help with a problem parsing a Diva workspace. My code is:

library(CytoML)

ws_diva <- open_diva_xml("~/Projects/Diva_xml.xml")
gs <- diva_to_gatingset(obj = ws_diva, worksheet = "normal", name = "mouse_7",
                        path = "~/Projects/", subset = "mouse_7_Tube_001.fcs",
                        swap_cols = FALSE, verbose = TRUE)

From the verbose output I get:

Parsing 1 samples
loading data: /home/user_one/Projects/mouse_7_Tube_001.fcs
Compensating
computing data range
transforming ...
parsing gates ...
All Events
P1
P2
P3
B cells
IgG+
Ag-
Ag+
Error in xmlElementsByTagName(gateNode, "region")[[1]] : 
  subscript out of bounds

I have no problems parsing the workspace from a previous experiment in which all the gates were labelled P1 to P8, but I don't think that should be the issue. Or is it?

sessionInfo()

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] CytoExploreR_1.0.7  openCyto_2.1.0      flowCore_2.1.0      CytoML_2.1.4       
[5] flowWorkspace_4.1.3 dplyr_1.0.0  

mikejiang commented 4 years ago

Based on the error message, it appears that the Ag+ gate doesn't have a region defined in the XML, but we will have to confirm this by looking at your example XML.
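To see which gates would trip this error before calling diva_to_gatingset, one can list every gate that has no region child. The Python sketch below is a minimal illustration, not CytoML's code. It assumes gates are <gate> elements carrying fullname and type attributes, with an optional direct <region> child, as suggested by the excerpts and the error message in this thread; the sample XML (including the empty <region/> placeholder) and the gates_without_region helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def gates_without_region(xml_text):
    """Return (fullname, type) for every <gate> lacking a direct <region> child."""
    root = ET.fromstring(xml_text)
    missing = []
    for gate in root.iter("gate"):
        # Like the R call xmlElementsByTagName(gateNode, "region"), which by
        # default only searches direct children, find() looks one level down.
        if gate.find("region") is None:
            missing.append((gate.get("fullname"), gate.get("type")))
    return missing


# Hypothetical sample: P1 has a (placeholder) region, Ag+ has none.
sample = r"""<gates>
  <gate fullname="All Events\P1" type="Region_Classifier">
    <name>P1</name>
    <region/>
  </gate>
  <gate fullname="All Events\P1\Ag+" type="Region_Classifier">
    <name>Ag+</name>
    <input>All Events\P1\Ag-</input>
  </gate>
</gates>"""

for fullname, gate_type in gates_without_region(sample):
    print(fullname, gate_type)
```

Any gate reported here would make the `[[1]]` subscript in the R parser fail, which matches the "subscript out of bounds" error above.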

mikejiang commented 4 years ago

Ag+ appears to be a boolean gate. I've added support for boolean gates in the latest commit; here is an example I created:

> ws <- open_diva_xml(file.path(path, "tcell/tcell.xml"))
> gs <- diva_to_gatingset(ws, name = 1)
> parsedStats <- gh_pop_compare_stats(gs[[1]])
> parsedStats[4:10, ]
   openCyto.freq   xml.freq openCyto.count xml.count      node
1:    0.55385425 0.54840651          66267     66525       cd3
2:    0.17758462 0.17688087          11768     11767       cd8
3:    0.71851751 0.71864713          47614     47808       cd4
4:    0.06745439 0.06857572           4470      4562        P1
5:    0.05868683 0.05970688           3889      3972  cd4 & P1
6:    0.89610213 0.89552800          59382     59575 cd8 | cd4
7:    0.28148249 0.28135287          18653     18717   not cd4

As shown, I've created AND, OR and NOT (i.e. inverted) gates, and they all parse correctly now. Here is how these gates should look in the Diva XML:

<gate fullname="All Events\L\P2\cd3\cd4 &amp; P1" type="AND_Classifier">
<name>cd4 &amp; P1</name>
...
<input>All Events\L\P2\cd3\cd4</input>
<input>All Events\L\P2\cd3\P1</input>
</gate>
<gate fullname="All Events\L\P2\cd3\cd8 | cd4" type="OR_Classifier">
<name>cd8 | cd4</name>
...
<input>All Events\L\P2\cd3\cd8</input>
<input>All Events\L\P2\cd3\cd4</input>
</gate>
<gate fullname="All Events\L\P2\cd3\not cd4" type="NOT_Classifier">
<name>not cd4</name>
...
<input>All Events\L\P2\cd3\cd4</input>
</gate>

These boolean operations are expected to be indicated by the type="XXX_Classifier" attribute in the XML (AND_Classifier, OR_Classifier or NOT_Classifier).
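The classifier type together with the <input> paths is enough to reconstruct each boolean gate. As a rough sketch (not how CytoML actually implements it), the Python function below maps AND_Classifier, OR_Classifier and NOT_Classifier to the &, | and ! operators used in flowWorkspace boolean gate expressions, and reduces each input path to its last component. The operator mapping, the helper names and the leaf-name simplification are assumptions for illustration; a real parser would need full paths to tell apart populations that share a name.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from Diva classifier types to boolean operators.
BOOL_OPS = {"AND_Classifier": "&", "OR_Classifier": "|"}


def leaf(path):
    """Last component of a backslash-separated Diva gate path."""
    return path.split("\\")[-1]


def boolean_expression(gate):
    """Build an expression string from a <gate> element, or return None
    if the gate is not an AND/OR/NOT classifier."""
    gate_type = gate.get("type")
    inputs = [leaf(node.text.strip()) for node in gate.findall("input")]
    if gate_type == "NOT_Classifier":
        if len(inputs) != 1:
            raise ValueError("NOT gate should have exactly one input")
        return "!" + inputs[0]
    if gate_type in BOOL_OPS:
        if len(inputs) < 2:
            raise ValueError(f"{gate_type} gate should have at least two inputs")
        return BOOL_OPS[gate_type].join(inputs)
    return None


# Sample modelled on the excerpts above (the "..." parts omitted).
sample = r"""<gates>
  <gate fullname="All Events\L\P2\cd3\cd4 &amp; P1" type="AND_Classifier">
    <name>cd4 &amp; P1</name>
    <input>All Events\L\P2\cd3\cd4</input>
    <input>All Events\L\P2\cd3\P1</input>
  </gate>
  <gate fullname="All Events\L\P2\cd3\cd8 | cd4" type="OR_Classifier">
    <name>cd8 | cd4</name>
    <input>All Events\L\P2\cd3\cd8</input>
    <input>All Events\L\P2\cd3\cd4</input>
  </gate>
  <gate fullname="All Events\L\P2\cd3\not cd4" type="NOT_Classifier">
    <name>not cd4</name>
    <input>All Events\L\P2\cd3\cd4</input>
  </gate>
</gates>"""

for g in ET.fromstring(sample).iter("gate"):
    print(g.findtext("name"), "->", boolean_expression(g))
```

The operator comes only from the type attribute, never from the gate name, which is why a boolean gate mislabelled as Region_Classifier cannot be recovered this way.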

However, in the example you provided, this boolean type information is missing; instead, the gate is marked as a regular geometric gate (i.e. type="Region_Classifier"):

<gate fullname="All Events\P1\P2\P3\B cells\IgG+\Ag+" type="Region_Classifier">
<name>Ag+</name>
...
<input>All Events\P1\P2\P3\B cells\IgG+\Ag-</input>
</gate>
<gate fullname="All Events\P1\P2\P3\Ag+ or plasma cells" type="Region_Classifier">
<name>Ag+ or plasma cells</name>
...
<input>All Events\P1\P2\P3\B cells\IgG+\Ag+</input>
<input>All Events\P1\P2\P3\plasma cells</input>
</gate>

and no region is actually defined in those gates. Although they do reference other gates in their <input> fields, without the proper boolean type information I am not sure how we could derive the correct boolean gates from them (I don't think the gate name is a reliable source for that).
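A quick consistency check can catch files like this before parsing: flag any Region_Classifier gate that has no region, and note whether it references other gates through <input>. This Python sketch assumes the same element layout as the excerpts above; the check_gate_types helper, its messages and the sample XML are hypothetical. It deliberately reports the problem instead of guessing an operator, since neither the inputs nor the gate name reliably determine which boolean operation was intended.

```python
import xml.etree.ElementTree as ET

# Classifier types that denote boolean gates in the excerpts above.
BOOLEAN_TYPES = {"AND_Classifier", "OR_Classifier", "NOT_Classifier"}


def check_gate_types(xml_text):
    """Return (fullname, problem) pairs for gates whose declared type does
    not match their structure. No boolean operator is inferred."""
    problems = []
    for gate in ET.fromstring(xml_text).iter("gate"):
        fullname = gate.get("fullname")
        gate_type = gate.get("type")
        has_region = gate.find("region") is not None
        n_inputs = len(gate.findall("input"))
        if gate_type == "Region_Classifier" and not has_region:
            if n_inputs:
                problems.append((fullname, "no region but has inputs; "
                                 "possibly a mislabelled boolean gate"))
            else:
                problems.append((fullname, "no region and no inputs"))
        elif gate_type in BOOLEAN_TYPES and n_inputs == 0:
            problems.append((fullname, "boolean classifier without inputs"))
    return problems


# Hypothetical sample modelled on the Ag+ excerpt; <region/> is a placeholder.
sample = r"""<gates>
  <gate fullname="All Events\P1\P2\P3\B cells" type="Region_Classifier">
    <name>B cells</name>
    <region/>
  </gate>
  <gate fullname="All Events\P1\P2\P3\B cells\IgG+\Ag+" type="Region_Classifier">
    <name>Ag+</name>
    <input>All Events\P1\P2\P3\B cells\IgG+\Ag-</input>
  </gate>
</gates>"""

for fullname, problem in check_gate_types(sample):
    print(fullname, ":", problem)
```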

I wonder whether your example Diva XML can even be loaded back into your Diva software properly. Or perhaps your original, unedited version does have this information in place, in which case the patch should work for it.

Anyway, let me know.

Biomiha commented 4 years ago

Hi Mike,

Yes, after pulling the latest commit the issue is fixed. I changed the gate classifiers in the xml in the hope of fixing the problem and then forgot to change them back when I sent you the xml file. Apologies and very many thanks for adding this feature.

Best wishes.