DillonHammill / CytoExploreR

Interactive Cytometry Data Analysis

Best practices Query - Accessing .fcs data over a VPN #157

Open rwbaer opened 1 year ago

rwbaer commented 1 year ago

Briefly describe what you hope to achieve: I am struggling to optimize processing across a VPN connection.

I have data saved on a work (university) network resource. I access this resource via a VPN (Cisco) as a mapped network drive on Windows 11 (i9 with 64 GB). Although the analysis can be done, it proceeds at a snail's pace compared to running inside the work firewall. Processing is slow enough that RStudio can think it has lost the connection. I suspect that part of this is inefficient authentication when accessing files, but it probably also includes data transfer overhead.

My question is: how are gating sets and the corresponding framesets/cytosets accessed across connections? Are the data held in memory or in temporary files on the processing computer, or do they stay in the remote data directory? It appears to me that "temp files" may be created in the remote directory of origin rather than on the processing computer. If this is true, is there a way to speed up processing by using local memory or at least local temp files?

As an alternative, is there a better way (best practices/recommendation) to approach the analysis by breaking it up: do the initial processing to create a gating set, then save the "processed gating set" locally to use for graphing and data analysis? Because the transformation is "lost" or "lost track of", this process would seem to require some thought and care. Should raw data files always be moved to the processing computer prior to processing, with any results moved back to the network storage location at the end? Any recommendations? It would help to understand what information is saved where in CytoExploreR processing. Are cytosets in memory or local? How about cyto_setup() information; is that memory resident?
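
To make the "move the raw files first, move the results back at the end" option concrete, here is a rough sketch of what I have in mind. It uses only base R file functions. The helper names (`stage_fcs_locally`, `push_results_back`) and the example paths are hypothetical, and it deliberately makes no CytoExploreR calls, since where the package keeps its data is exactly what I am unsure about.

```r
# Hypothetical helper: copy every .fcs file from a (slow) network folder
# into a local folder and return the paths of the local copies.
stage_fcs_locally <- function(remote_dir,
                              local_dir = file.path(tempdir(), "fcs_stage")) {
  dir.create(local_dir, recursive = TRUE, showWarnings = FALSE)
  remote_files <- list.files(remote_dir, pattern = "\\.fcs$",
                             full.names = TRUE, ignore.case = TRUE)
  copied <- file.copy(remote_files, local_dir, overwrite = TRUE)
  if (!all(copied)) {
    stop("Failed to copy: ",
         paste(basename(remote_files[!copied]), collapse = ", "))
  }
  file.path(local_dir, basename(remote_files))
}

# Hypothetical helper: copy everything in a local results folder back to
# the network location once the analysis is finished.
push_results_back <- function(local_results_dir, remote_results_dir) {
  dir.create(remote_results_dir, recursive = TRUE, showWarnings = FALSE)
  result_files <- list.files(local_results_dir, full.names = TRUE)
  file.copy(result_files, remote_results_dir,
            overwrite = TRUE, recursive = TRUE)
}

# Example usage (paths are placeholders, not my real ones):
# local_files <- stage_fcs_locally("Z:/project/raw_fcs", "C:/cyto_work/raw_fcs")
# ... run the analysis against the local copies ...
# push_results_back("C:/cyto_work/results", "Z:/project/results")
```

With this approach the VPN is only touched twice, once per bulk copy, rather than on every file access during gating and plotting. Whether that is enough depends on the answers to the questions above.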