Closed SamGG closed 3 years ago
Hi Samuel,
The NewData function is currently meant for this. It calls the MapDataToCodes function internally, but it also preprocesses your data in the same way as the original data to make sure things match (unless you specify otherwise in your parameters). Does this help, or does it not let you do exactly what you want? Adding some extra checks is always a good idea; I'll have a look into the ones you are proposing.
All the best, Sofie
On Tue, 1 Dec 2020 at 09:13, Samuel Granjeaud notifications@github.com wrote:
Hi Sofie,
I am going to use MapDataToCodes to do an "upscaling", that is, to assign cluster ids to cells that were left out during downsampling when analyzing large sets. If I understand the aim of this function correctly, it maps cells to codes using the first nearest neighbor. Do you plan to export this function? It could be useful in your pipeline when dealing with large datasets. Of course, there are many alternatives to achieve this goal, but MapDataToCodes is a sound one. It could probably be parallelized as well.
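To make the nearest-neighbor idea above concrete, here is a minimal base-R sketch. It is not FlowSOM's implementation: `map_to_nearest_code` is a hypothetical helper, the marker names and values are made up, and returning both an index and a distance is a choice made for this example only.

```r
# Hypothetical helper (not FlowSOM's implementation): assign each row of
# `newdata` to the nearest row of `codes` by Euclidean distance.
map_to_nearest_code <- function(codes, newdata) {
  codes <- as.matrix(codes)
  # Align columns by name so both matrices use the same marker order.
  newdata <- as.matrix(newdata)[, colnames(codes), drop = FALSE]
  # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, computed for all pairs at once.
  d2 <- outer(rowSums(newdata^2), rowSums(codes^2), "+") -
    2 * tcrossprod(newdata, codes)
  d2 <- pmax(d2, 0)  # clamp tiny negative values caused by rounding
  # Column of the smallest squared distance in each row.
  idx <- max.col(-d2, ties.method = "first")
  data.frame(index = idx, distance = sqrt(d2[cbind(seq_along(idx), idx)]))
}

# Made-up codes (2 nodes) and cells (3 cells); note the swapped column order.
codes <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("CD3", "CD4")))
cells <- matrix(c(1, 1, 9, 8, 0, 2), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("CD4", "CD3")))
mapping <- map_to_nearest_code(codes, cells)
```

The full distance matrix has one row per cell and one column per code, so for very large sets the cells would need to be processed in chunks, which is also where parallelization could come in.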
If you consider exporting it, how about adding some checks before calling the C code? I forgot to add colnames and crashed my R session :-(
https://github.com/SofieVG/FlowSOM/blob/7c78eb1cfe662eaf8c78264de025dc1d50a45cf4/R/2_buildSOM.R#L202
Here are some checks I have in mind, though they are completely untested.
    if (is.null(colnames(codes))) stop("Columns of codes must have names.")
    if (is.null(colnames(newdata))) stop("Columns of newdata must have names.")
    if (length(setdiff(colnames(codes), colnames(newdata))) > 0) stop("Colnames of codes must be present in newdata.")
Let me know, Samuel
Thanks for your quick reply. NewData sounds great. I was thinking in terms of expression matrices with all transformations already applied, so I need to work out how to fit that into NewData, but it definitely sounds great. Sorry for wasting your time, I should have read the NEWS before posting. I should update my knowledge of the FlowSOM workflow. All the best, Samuel
My brain is working slowly this morning. In fact, I want to transfer all information (cluster id, but also UMAP coordinates...) from a set of analyzed cells to a larger set of cells. Since MapDataToCodes returns an index, it sounds better suited to this aim. Let me know what you think, Samuel
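As a rough illustration of why an index is convenient here: once every new cell has the row number of its nearest reference entry, all annotation columns can be carried over with a single row-indexing step. The table and index vector below are invented for the example. They are not FlowSOM output, and what one reference row stands for (a SOM node, an analyzed cell) is left as an assumption.

```r
# Hypothetical annotations, one row per reference unit (for example per SOM node).
reference <- data.frame(
  cluster = c("T cells", "B cells", "Monocytes"),
  umap1 = c(-2.0, 3.5, 0.4),
  umap2 = c(1.2, -0.7, 4.1),
  stringsAsFactors = FALSE
)

# Assumed result of a nearest-neighbor mapping: one reference row per new cell.
nearest <- c(1L, 3L, 3L, 2L, 1L)

# Guard against indices that fall outside the reference table.
stopifnot(!anyNA(nearest), all(nearest >= 1L), all(nearest <= nrow(reference)))

# Row-indexing by `nearest` copies every annotation column in one step.
transferred <- reference[nearest, , drop = FALSE]
rownames(transferred) <- NULL
```

Each new cell inherits the exact values of its reference row, so cells mapped to the same row end up with identical UMAP coordinates.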
Feel free to reopen the issue to keep track of this idea. All the best.