Exporting MapDataToCodes function

SamGG commented 3 years ago

Hi Sofie,

I am going to use MapDataToCodes for an "upscaling" step: assigning a cluster id to the cells that were left out during downsampling when analyzing large datasets. If I understand the aim of this function correctly, it maps cells to codes using the first nearest neighbor. Do you plan to export this function? It could be useful in your pipeline when dealing with large datasets. Of course, there are many alternatives for achieving this goal, but MapDataToCodes is a sound one. It could probably be parallelized as well.
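To make the idea concrete, here is a minimal sketch of first-nearest-neighbor mapping in plain R. It is not FlowSOM's implementation (which calls C code); the helper name map_to_nearest_code, the marker names, and the toy values are all hypothetical.

```r
# Hypothetical helper, NOT part of FlowSOM: for each row of newdata, return
# the row number of the closest code (Euclidean distance).
map_to_nearest_code <- function(codes, newdata) {
  codes <- as.matrix(codes)
  newdata <- as.matrix(newdata)
  # Both matrices need column names so the columns can be matched up
  stopifnot(!is.null(colnames(codes)),
            all(colnames(codes) %in% colnames(newdata)))
  newdata <- newdata[, colnames(codes), drop = FALSE]
  # Squared distances via |x|^2 + |c|^2 - 2 x.c, giving an n_cells x n_codes matrix
  d2 <- outer(rowSums(newdata^2), rowSums(codes^2), "+") -
    2 * tcrossprod(newdata, codes)
  # Index of the smallest distance in each row
  unname(apply(d2, 1, which.min))
}

# Toy example: two codes and three cells on two made-up markers
codes <- rbind(c(0, 0), c(10, 10))
cells <- rbind(c(1, 1), c(9, 8), c(0, 2))
colnames(codes) <- colnames(cells) <- c("CD3", "CD4")
nearest <- map_to_nearest_code(codes, cells)
```

Each entry of nearest is the row number of the closest code, so it can be used directly to look up a cluster id for the cells left out of the downsampling.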

If you consider exporting it, how about adding some checks before calling the C code? I forgot to add colnames and crashed my R session :-(

https://github.com/SofieVG/FlowSOM/blob/7c78eb1cfe662eaf8c78264de025dc1d50a45cf4/R/2_buildSOM.R#L202

Here are some checks I have in mind, though they are completely untested.

if (is.null(colnames(codes))) stop("Columns of codes must have names.")
if (is.null(colnames(newdata))) stop("Columns of newdata must have names.")
if (length(setdiff(colnames(codes), colnames(newdata))) > 0) stop("Colnames of codes must be present in newdata.")

Let me know, Samuel

SofieVG commented 3 years ago

Hi Samuel,

The NewData function is currently meant for this. It calls the MapDataToCodes function internally, but also preprocesses your data in the same way as the original data to make sure things match (unless you specify otherwise in your parameters). Does this help, or does it not allow you to do exactly what you want? Adding some extra checks is always a good idea; I'll have a look at the ones you are proposing.

All the best, Sofie

SamGG commented 3 years ago

Thanks for your quick reply. NewData sounds great. I was thinking of expression matrices with all transformations already applied, so I need to work out how to fit them into NewData, but it definitely sounds great. Sorry for wasting your time; I should have read the NEWS before posting. I should update my knowledge of the FlowSOM workflow. All the best, Samuel

SamGG commented 3 years ago

My brain is working slowly this morning. In fact, I want to transfer all the information (cluster id, but also UMAP coordinates...) from a set of analyzed cells to a larger set of cells. Since MapDataToCodes returns an index, it seems better suited to this aim. Let me know what you think, Samuel
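As a rough illustration of why an index is convenient here, the sketch below uses a nearest-neighbor index to copy several annotation columns at once. The data frame, the column names, and the index values are made up for the example; in practice the index would come from the mapping step.

```r
# Toy annotations for three reference rows (analyzed cells or codes).
# All values are invented for illustration.
reference <- data.frame(
  cluster = c("A", "B", "C"),
  umap1   = c(-1.0, 0.5, 2.0),
  umap2   = c(0.3, 1.2, -0.7)
)

# Hypothetical index: for each of six cells in the larger set,
# the row of its nearest reference
nn_index <- c(1L, 3L, 3L, 2L, 1L, 2L)

# One row lookup transfers every annotation column together
transferred <- reference[nn_index, , drop = FALSE]
rownames(transferred) <- NULL
```

Every cell in the larger set inherits the cluster id and the UMAP coordinates of its nearest reference, so cells that share a reference end up at exactly the same UMAP position.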

SamGG commented 3 years ago

Feel free to reopen the issue to keep track of this idea. All the best.

SofieVG / FlowSOM

Exporting MapDataToCodes function #38