davismcc / archive-scater

An archived version of the scater repository, see https://github.com/davismcc/scater for the active version.
64 stars 18 forks source link

MAGIC adaptive kernel #106

Closed dvdijk closed 7 years ago

dvdijk commented 7 years ago

I saw that there is a MAGIC implementation in Scater, however it doesn't use the adaptive kernel. The adaptive kernel is important for MAGIC to work well. I haven't coded in R for a long time but I can advise on how to correctly implement MAGIC.

davismcc commented 7 years ago

Hi David

thanks for getting in touch! After reading the paper I was keen to try it out, so coded up that simple implementation. As you rightly point out, it's missing the adaptive kernel.

This R implementation is not "release ready", so I have not exported the MAGIC functions in the latest devel versions of the pkg (i.e. not visible to the user), which will be released in the next Bioconductor release. (I'd also been meaning to get in touch with you, but things got away from me.)

It would be great to have your advice on correctly implementing MAGIC. I think it would be excellent to have an R version available to people, and should enable more people to use your method.

Best Davis

dvdijk commented 7 years ago

Hi Davis,

Here's a Matlab implementation of MAGIC that doesn't require an external diffusion map package:

%% params k = 30; ka = 10; npca = 20; t = 6;

%% MAGIC [U,~,~] = randPCA(M', npca); % fast random svd (http://rbigdata.github.io/documentation/pbdML/rpca.html) %[U,~,~] = svds(M', npca); M_pc = M U; % PCA project D = squareform(pdist(M_pc)); % compute distances knnD = sort(D); % get kNN th = knnD(k+1,:); % find knn cutoff ind_zero = D > repmat(th',1,size(D,2)); % kNN cutoff sigma = knnD(ka+1,:); % locally adaptive sigma D = bsxfun(@rdivide,D,sigma); % adapt distances W = exp(-D.^2); % gaussian kernel W = W + W'; % symmetrize W(ind_zero) = 0; % kNN cutoff L = bsxfun(@rdivide, W, sum(W,2)); % Markov normalization L_t = L^t; % diffuse M_new = L_t M; % impute

%% rescale stuff here (you already have that)

Above code should be fairly easily ported to R. As you can see I'm using a fast random PCA implementation (really a lot faster and very accurate). I found an R implementation, see link in comments. The input data M has cells on the rows and genes on the columns. Let me know if I can further help.

David

rspreafico commented 7 years ago

Pretty cool, it'd be great to have the R equivalent of this implementation available in scater.

dvdijk commented 7 years ago

When I have time I'll try to implement this in R. I'll post it here when I do.

davismcc commented 7 years ago

This issue was moved to davismcc/scater#21