Closed JimboMahoney closed 5 years ago
Hi @JimboMahoney , I'll address the issues in order:
1) Yeah, I understand it's a little frustrating. The choice to use the recursive=TRUE
option in list.files
within diva_to_gatingset
is intentional as it makes it easier for users to have the xml in a top directory and have the FCS files in a nested data directory, without having to explicitly point it there with the path
arg. If that's a real problem, we could maybe allow the user to pass in a value to recursive
that would be honored by list.files
so recursive=FALSE
would only look in the top level. Either way, it's dangerous to silently guess how users would want the issue of duplicate filenames resolved, so it's safer to just error and prompt them resolve it.
2) If you could attach the Diva experiments and FCS files or a location where I could access each, that would help. Until I have a chance to look at it, a few quick things to try/examine:
First, are the gate definitions for the new gates getting read in correctly? Is this successful (and new_gate
looks as expected?:
new_gate <- gs_pop_get_gate(gs, "<new node name>")`
Second, what is the result after recompute(gs)
?
Also, I hope you don't mind if I remove the octothorps from your post to remove the (I assume) unintentional issue cross-linking.
Hi Jacob,
Hah, I didn't even know they were sometimes called octothorps! Cool!
Re: 1 - That's cool, but maybe the error message could be more helpful? My error "cannot find file" (or similar) made me think it wasn't actually finding the file, rather than actually finding multiple copies. I also didn't realise it was recursive.
Re: 2 - I'll give that a go and get back to you.
Thanks!
Experiment.zip Experiment_151035.zip
Hi Jacob,
Thanks for your help. Attached are the two Diva experiments I'm working with.
Luckily they are both really small.
"Experiment" works fine and was the "original" captured on an Aria.
"Experiment 151035" was created by importing a single FCS file from the original "Experiment". That FCS file was created by exporting a subset of cells from the original, using a gate. (I don't know if that is what makes the difference, but it feels like it might?). UPDATE - Actually, that's not it. I opened another experiment acquired on a Canto. The same issue is present there, even if I export all the data. The gates that existed when the experiment was first acquired are populated, but not any gates that I subsequently add in further analysis. The new gates show, but contain no cells.
It looks as I expect in Diva, with all the gates / statistics etc. However, once imported in R, only the first two gates give the correct data ("root" and "T Cells"). The next gates down ("CD8" and "CD4") are weird - one shows nothing and the other has 4 events. They should have 527 / 1224 respectively.
In both cases, I'm using the "Mixed Cells" FCS file and a global worksheet.
gs_pop_get_gate
worked fine and tells me the X/Y parameters and the size/shape of the gate.
recompute
just returns "done!" and doesn't seem to make any difference.
Thanks @JimboMahoney, I'm looking in to it now.
Hi @JimboMahoney . So, I think I found the proximate cause, but I'm still working on the root cause. One way or another, the gate bounds are not getting transformed correctly. This is the CD4 gate that is being parsed in
> g
Rectangular gate 'CD4' with dimensions:
APC-H7-A: (0.0198189485818148,1.535968542099)
BB515-A: (2.0697820186615,3.47838377952576)
which should overlay the CD4+ population here:
But as you can see by the bounds, and the next zoomed out plot, the y dimension of the gate is off so it's landing well above the population.
I have some ideas about what is causing this, but it would help me if you could paste in the output of sessionInfo()
so I can see the versions of all relevant packages.
Here's that particular plot & gate from Diva:
According to Diva, it contains 1224 cells.
Here's my sessionInfo()
:
R version 3.6.1 (2019-07-05) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base
other attached packages: [1] tidyr_1.0.0 flowCore_1.51.7 BiocManager_1.30.4 openCyto_1.23.5 flowWorkspace_3.33.9 [6] devtools_2.2.1 usethis_1.5.1 CytoML_1.11.9 svDialogs_1.0.0
loaded via a namespace (and not attached):
[1] ncdfFlow_2.31.3 matrixStats_0.55.0 fs_1.3.1 RColorBrewer_1.1-2 rprojroot_1.3-2 Rgraphviz_2.29.0
[7] backports_1.1.4 tools_3.6.1 R6_2.4.0 KernSmooth_2.23-15 lazyeval_0.2.2 BiocGenerics_0.31.6
[13] colorspace_1.4-1 withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3 prettyunits_1.0.2 mnormt_1.5-5
[19] processx_3.4.1 compiler_3.6.1 graph_1.63.0 cli_1.1.0 Biobase_2.45.1 flowClust_3.23.0
[25] ggcyto_1.13.2 desc_1.2.0 flowStats_3.43.8 svGUI_1.0.0 scales_1.0.0 DEoptimR_1.0-8
[31] hexbin_1.27.3 mvtnorm_1.0-11 robustbase_0.93-5 callr_3.3.2 RBGL_1.61.0 stringr_1.4.0
[37] digest_0.6.21 R.utils_2.9.0 rrcov_1.4-7 base64enc_0.1-3 pkgconfig_2.0.3 sessioninfo_1.1.1
[43] rlang_0.4.0 rstudioapi_0.10 jsonlite_1.6 mclust_5.4.5 gtools_3.8.1 dplyr_0.8.3
[49] R.oo_1.22.0 magrittr_1.5 Matrix_1.2-17 Rcpp_1.0.2 munsell_0.5.0 lifecycle_0.1.0
[55] R.methodsS3_1.7.1 stringi_1.4.3 yaml_2.2.0 MASS_7.3-51.4 zlibbioc_1.31.0 pkgbuild_1.0.5
[61] plyr_1.8.4 grid_3.6.1 parallel_3.6.1 crayon_1.3.4 lattice_0.20-38 splines_3.6.1
[67] zeallot_0.1.0 ps_1.3.0 pillar_1.4.2 fda_2.4.8 corpcor_1.6.9 stats4_3.6.1
[73] pkgload_1.0.2 XML_3.98-1.20 glue_1.3.1 latticeExtra_0.6-28 remotes_2.1.0 data.table_1.12.2
[79] RcppParallel_4.4.4 vctrs_0.2.0 testthat_2.2.1 gtable_0.3.0 purrr_0.3.2 clue_0.3-57
[85] assertthat_0.2.1 ks_1.11.5 ggplot2_3.2.1 IDPmisc_1.1.19 pcaPP_1.9-73 tibble_2.1.3
[91] memoise_1.1.0 flowViz_1.49.2 ellipse_0.4.1 cluster_2.1.0 ellipsis_0.3.0
Apologies if that doesn't paste well - I'm fairly new to both GitHub and R.
PS - two additional comments / questions:
1) When I load CytoML, I get this warning (not sure if this is relevant):
Loading required package: CytoML
Registered S3 method overwritten by 'R.oo':
method from
throw.default R.methodsS3
2) If some packages are screwing things up, how come the gate import works perfectly for the "original" dataset (even when I add more gates to it), but not this file?
Thanks for your help!
Alright. It looks like the transformations for each of the channels for that specimen differ between Cell_Sort.xml
and Experiment_151035.xml
. Using BB414-A
as an example:
Cell_Sort.xml
<tube name="Mixed Cells">
...
<instrument_settings name="Cytometer Settings" template="false">
...
<parameter name="BB515-A" type="30">
<is_log>true</is_log>
<is_quantitated>false</is_quantitated>
<min>0.0</min>
<max>5.4185380935668945</max>
<raw_index>8</raw_index>
<linear_index>12</linear_index>
<log_index>20</log_index>
<final_index>20</final_index>
<fl>BB515</fl>
<voltage>365</voltage>
<target>3</target>
<brightness>1.0</brightness>
<threshold>5000</threshold>
<trigger>false</trigger>
<labels_only>false</labels_only>
<biexp_scale>0</biexp_scale>
<comp_biexp_scale>0</comp_biexp_scale>
<manual_biexp_scale>0</manual_biexp_scale>
<can_be_compensated>true</can_be_compensated>
<compensation>
<compensation_coefficient>0.0</compensation_coefficient>
<compensation_coefficient>1.00003068</compensation_coefficient>
<compensation_coefficient>0.0</compensation_coefficient>
<compensation_coefficient>-0.00166979</compensation_coefficient>
</compensation>
</parameter>
Experiment_151035.xml
<tube name="Mixed Cells">
...
<instrument_settings name="Cytometer Settings" template="false">
...
<parameter name="BB515-A" type="30">
<is_log>true</is_log>
<is_quantitated>false</is_quantitated>
<min>0.0</min>
<max>5.4185380935668945</max>
<raw_index>8</raw_index>
<linear_index>12</linear_index>
<log_index>20</log_index>
<final_index>20</final_index>
<fl>BB515</fl>
<voltage>365</voltage>
<target>3</target>
<brightness>1.0</brightness>
<threshold>5000</threshold>
<trigger>false</trigger>
<labels_only>false</labels_only>
<biexp_scale>43</biexp_scale>
<comp_biexp_scale>44</comp_biexp_scale>
<manual_biexp_scale>0</manual_biexp_scale>
<can_be_compensated>true</can_be_compensated>
<compensation>
<compensation_coefficient>0.0</compensation_coefficient>
<compensation_coefficient>1.0000306827316177</compensation_coefficient>
<compensation_coefficient>0.0</compensation_coefficient>
<compensation_coefficient>-0.0016697859380151786</compensation_coefficient>
</compensation>
</parameter>
I'm still looking in to it to see if we should be changing any part of the logic of diva_to_gatingset
to account for this case or if this is an issue on the export side, but making the transformations on that specimen in Experiment_151035.xml
agree with Cell_Sort.xml
seems to fix the problem. This is the same archive with the patched xml (Experiment_151035_fixed.xml
) included:
Experiment_151035_patched.zip
The stats after diva_to_gatingset
using Experiment_151035_fixed.xml
(minor differences but within reasonable tolerance of the numbers you quoted):
pop count
1: root 2723
2: /T Cells 2043
3: /T Cells/CD4 1218
4: /T Cells/CD4/Naive CD4 589
5: /T Cells/CD4/Memory CD4 587
6: /T Cells/CD8 534
7: /T Cells/CD8/Memory CD8 123
8: /T Cells/CD8/Naive CD8 395
Plots of the gates.
Of course, the scales are still off even if they are now in agreement, so I'll keep investigating and update this based on what I find out.
Thanks for looking into this Jacob.
Almost feels like a Diva bug... It's certainly strange that both files are biexponetially displayed in Diva, but for some reason it's outputting different flags for each.
@JimboMahoney , see/pull 4308271. It looks like it may have been an issue with how the transformations were handled and may be related to https://github.com/RGLab/flowWorkspace/commit/d9a23c36c808358316a57a7f5b72878f07d1feb1 and https://github.com/RGLab/flowWorkspace/commit/63c9eb2168d8119f4f23d6e29bb1ddf6dc679e3b. But anyway, after that simple commit, both of your original experiments parse successfully and the stats seem to even more closely mirror the Diva stats.
pop count
1: root 2723
2: /T Cells 2043
3: /T Cells/CD4 1224
4: /T Cells/CD4/Naive CD4 595
5: /T Cells/CD4/Memory CD4 587
6: /T Cells/CD8 527
7: /T Cells/CD8/Memory CD8 116
8: /T Cells/CD8/Naive CD8 395
Actually, I am going to reopen this while I work on a more robust solution.
Alright @JimboMahoney , I ran down the root of the issue and implemented a better solution in 310e95ed2ac8538a27e5b61870d7bacb8a05d558. It took some adjustments to flowWorkspace
as well, so make sure to pull those changes and rebuild flowWorkspace
before pulling the changes to CytoML
and rebuilding it.
And after RGLab/ggcyto@127877024a2b66be8db3c22f297eb081383dc7ba, the scales of the axes should be good as well, so the plots will properly reflect the scales:
Thanks for all your help on this - as per the other thread, the gates now work as expected!
Actually, the gates are still wrong.
Trying a more complex experiment with more gates and the stats are really different between Diva and CytoML.
Whilst it's not bad in this original dataset, it's very bad for more complex gates.
I've no idea if it's possible to get an exact correlation between Diva gates and CytoML, but I suggest keeping this issue open?
I've added more gates to this small dataset if you'd like me to upload the Diva experiment and the stats shown in Diva to compare?
@JimboMahoney , if you could upload the new experiment, that would help a lot. I'll try to get this fully figured out. It's a bit of a reverse-engineering game.
Hi Jacob,
OK, I've uploaded a single FCS file experiment from the more complex gating strategy, which shows the much larger discrepancies (e.g. the rMono population).
e.g.
openCyto.freq xml.freq openCyto.count xml.count node 1: 1.000000000 0 109925 0 root 2: 0.603702524 0 66362 0 All Cells 3: 0.695367831 0 46146 0 Singlets 4: 0.593615915 0 27393 0 CD45+ 5: 0.105793451 0 2898 0 Ly6G+ 6: 0.413733609 0 1199 0 Neutrophils 7: 0.854889935 0 23418 0 Ly6G- 8: 0.416517209 0 9754 0 CD11cCD11b+ 9: 0.189665778 0 1850 0 DC Populations 10: 0.137071970 0 1337 0 iMono 11: 0.003485749 0 34 0 rMono 12: 0.456742677 0 10696 0 CD11c-CD11b- 13: 0.495044877 0 5295 0 T Cells 14: 0.446896036 0 4780 0 B Cells 15: 0.100136647 0 2345 0 CD11c+CD11b- 16: 0.973987207 0 2284 0 Alveolar Macrophages
Diva shows:
Population | #Events |
---|---|
All Events | 109925 |
All Cells | 66441 |
Singlets | 46681 |
CD45+ | 29426 |
Ly6G+ | 2212 |
Neutrophils | 1863 |
Ly6G- | 27126 |
CD11cCD11b+ | 9954 |
DC Populations | 1686 |
iMono | 1344 |
rMono | 3703 |
CD11c-CD11b- | 12146 |
T Cells | 5992 |
B Cells | 5858 |
CD11c+CD11b- | 4460 |
Alveolar Macrophages | 4332 |
Perhaps it's also interesting to see what happens if I disable compensation of this file in Diva (it becomes somewhat more similar to CytoML's):
Population | #Events |
---|---|
All Events | 109925 |
All Cells | 66441 |
Singlets | 46681 |
CD45+ | 29545 |
Ly6G+ | 3138 |
Neutrophils | 1382 |
Ly6G- | 25134 |
CD11cCD11b+ | 9833 |
DC Populations | 1892 |
iMono | 1411 |
rMono | 38 |
CD11c-CD11b- | 10848 |
T Cells | 5351 |
B Cells | 4903 |
CD11c+CD11b- | 3832 |
Alveolar Macrophages | 3765 |
Thanks @JimboMahoney , I'm looking in to it.
As you highlighted, this is most likely due to the compensation matrix from the XML for that sample not being applied correctly. Working on a fix.
Alright @JimboMahoney , it should be good after 901a222c98588c2e5e592780dafff894657f3947. Try pulling that and making sure it works. This is what I'm getting for stats:
pop count
1: root 109925
2: /All Cells 66362
3: /All Cells/Singlets 46146
4: /All Cells/Singlets/CD45+ 29091
5: /All Cells/Singlets/CD45+/Ly6G+ 2183
6: /All Cells/Singlets/CD45+/Ly6G+/Neutrophils 1850
7: /All Cells/Singlets/CD45+/Ly6G- 26801
8: /All Cells/Singlets/CD45+/Ly6G-/CD11cCD11b+ 9906
9: /All Cells/Singlets/CD45+/Ly6G-/CD11cCD11b+/DC Populations 1667
10: /All Cells/Singlets/CD45+/Ly6G-/CD11cCD11b+/iMono 1331
11: /All Cells/Singlets/CD45+/Ly6G-/CD11cCD11b+/rMono 3661
12: /All Cells/Singlets/CD45+/Ly6G-/CD11c-CD11b- 12110
13: /All Cells/Singlets/CD45+/Ly6G-/CD11c-CD11b-/T Cells 5959
14: /All Cells/Singlets/CD45+/Ly6G-/CD11c-CD11b-/B Cells 5849
15: /All Cells/Singlets/CD45+/Ly6G-/CD11c+CD11b- 4198
16: /All Cells/Singlets/CD45+/Ly6G-/CD11c+CD11b-/Alveolar Macrophages 4069
which is much closer to the Diva values. I'm going to close this, but feel free to comment/open it/start another issue if you come across anything else.
Thanks Jacob - but I'm afraid I don't know how to "pull" that!
I'm a novice Github and R user and all I know is that the following works on my system:
BiocManager::install("CytoML", version = "devel")
But the following doesn't:
install_github("RGLab/CytoML", ref = "trunk")
From what I can tell, getting the second one to work on a Windows system is non-trivial (I had hoped that installing the xml2 library would work, but it doesn't). i.e. see #65
I guess I'll have to wait until this gets pushed over to bioconductor.
Sorry @JimboMahoney , my fault. It's a bad habit from talking to people who are installing from the github repo. I moved it over to Bioconductor with a version bump, but it will take a day or so for that to be available via BiocManager::install("CytoML", version = "devel")
. Additionally, the Bioconductor 3.10 release is in two days, so this will be in 3.10 as well.
Awesome - just updated from bioconductor and it's much better now, thanks!
Hi there!
Firstly, MANY thanks for this code! I've used it successfully to compare manual gates with automated clustering.
However, I've found a couple of issues, one of which I can work around and one I cannot.
Here is my code:
ws <- open_diva_xml(file)
diva_get_sample_groups(ws)
gs <- diva_to_gatingset(ws, name = 1, subset = 1, worksheet="global")
There is also this, but I'm not entirely sure it's necessary:
gh <- gs[[1]]
Issue 1:
The third line "breaks" if the directory containing the XML / FCS files contains subdirectories with files of the same name. For some reason, it also looks in the subdirectories, finds files with the same name and then reports that it cannot open the specified FCS file.
At first I was stumped, but after removing the subdirectories, it opens the file OK.
Issue 2:
I'm not sure if this is an issue with Diva (I'm using 8.0.2) or with CytoML. I'll try and explain.
I have an experiment with the original FCS files captured in Diva. This opens fine, as well as all gates and cell populations.
However, I get very strange results if I do the following:
1) Export a subset of that experiment (e.g. after doing basic gating, such as lymphocytes and singlets) to an FCS file (so that I have only the cells I'm interested in). 2) Open that FCS file as a new experiment in Diva. 3) Set my gates. Looks good in Diva, with all stats being as I expect. 4) Export the experiment. 5) Import using CytoML.
The gates look good using this code:
plot(gs)
and also this:
gs_get_pop_paths(gh,path="auto")
However, the stats are only correct for the first two gates:
gh_pop_compare_stats(gh)
Any further gates are unpopulated, except for one that only contains 4 events (it should contain 598).
In addition, the plotting of the graphs doesn't work:
plotGate(gh)
It gives the following error:
"Error in valid.viewport(x, y, width, height, just, gp, clip, xscale, yscale, : invalid 'yscale' in viewport"
And a bunch of the following warnings:
"3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf"
Any ideas on what's going on would be appreciated! Let me know if you want / need the Diva experiments (they're only small).
Thanks!