KrishnaswamyLab / Multiscale_PHATE

Creating multi-resolution embeddings and clusters from high dimensional data
GNU General Public License v3.0

Suggestion for parameters #3

Closed mdmanurung closed 3 years ago

mdmanurung commented 3 years ago

Dear authors,

I tried running this package on CyTOF data of ~3 million cells, but it took a very long time. Do you have any recommendations or rules of thumb for the parameters?

Thanks in advance.

Mikhael

mkuchroo commented 3 years ago

Hi Mikhael,

Good to hear you are using our algorithm!

I would recommend you do the following things:

  1. Make sure you are using the most up-to-date version of the MSP package (we recently pushed a newer and faster implementation to git).
  2. MSP gets faster as you add parallel threads. I typically run it with 10 threads (n_jobs=10); if your system can handle this, I would recommend doing the same.
  3. How many features are you working with? We tested our algorithm most extensively on flow data, which has 16 features. Having many more features would slow down the initial coarse-graining step somewhat (though not substantially). So you probably won't be able to embed 3 million cells in 7 minutes (which is what I achieve with 10 threads on flow data), but hopefully it won't take much longer than that, depending on the number of features.
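As a minimal sketch of point 2, the snippet below picks a thread count that does not exceed what the machine reports and passes it on as n_jobs. The helper names `pick_n_jobs` and `build_operator` are hypothetical, and passing n_jobs to the `Multiscale_PHATE` constructor is an assumption based on the package README rather than something stated in this thread.

```python
import os


def pick_n_jobs(requested=10):
    """Cap the requested thread count at the CPUs this machine reports."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))


def build_operator(n_jobs):
    """Hypothetical helper: build the operator with an explicit n_jobs.

    Assumes the multiscale_phate package is installed and that its
    Multiscale_PHATE constructor accepts n_jobs.
    """
    import multiscale_phate  # imported lazily so pick_n_jobs works without it

    return multiscale_phate.Multiscale_PHATE(n_jobs=n_jobs)


n_jobs = pick_n_jobs(10)
print(f"Using n_jobs={n_jobs}")
```

Calling `build_operator(n_jobs)` would then give an operator that can be fit to the cells-by-features matrix as usual. Setting the value explicitly avoids relying on whatever the default happens to be.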

Please let me know if this is helpful and if you have any other questions!

Manik

mdmanurung commented 3 years ago

Hi Manik,

Thank you for the suggestion! Setting n_jobs did the trick. I thought it was set to -1 by default.

Could you share an example of how to display and choose the cluster resolution, as well as an integrated workflow with MELD?

Thanks in advance. I am closing this issue.