AIH-SGML / mixmil

Code for the paper: Mixed Models with Multiple Instance Learning
https://arxiv.org/abs/2311.02455
Apache License 2.0

Very long waiting time for the init step #12

Closed: HelloWorldLTY closed this issue 3 weeks ago

HelloWorldLTY commented 1 month ago

Hi, I found that the initialization step of this method takes extremely long for me; I have been waiting for an hour. Any suggestions? Is this related to the scale of the data? I have 276400 instances for training.

jan-engelmann commented 1 month ago

Hi, can you provide more detail?

If you are standardizing the whole dataset, that can take a while.

Profiling your code would help you figure out where most of the compute time is spent.
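A minimal sketch of what such profiling could look like, using Python's standard-library `cProfile` and `pstats`. The functions `standardize` and `init_step` are hypothetical stand-ins for your own slow code, not part of mixmil; swap in whatever call is taking an hour.

```python
import cProfile
import io
import pstats


def standardize(rows):
    """Hypothetical stand-in for a preprocessing step: z-score each column."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / n) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def init_step(rows):
    """Hypothetical stand-in for the slow initialization being investigated."""
    return standardize(rows)


def profile_call(func, *args, top=10):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


# Small synthetic input so the sketch finishes quickly.
data = [[float(i * j % 7) for j in range(50)] for i in range(200)]
result, report = profile_call(init_step, data)
print(report)
```

Running it on a small subset of the data first (a few bags) usually shows which function dominates the cumulative time without having to wait for the full run.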

Cheers!

HelloWorldLTY commented 4 weeks ago

Hi, thanks. I have ~2000 bags but 28000+ features. Do you think the number of features is a big problem here?

Also, I cannot fit my current model on one A100 80GB GPU.
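For a rough sense of scale, here is a back-of-the-envelope estimate of the input matrix alone. It assumes the 276400 instances are stored as one dense matrix with 28000 features each (a lower bound, given "28000+"); the helper name `dense_matrix_bytes` is just for illustration.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value):
    """Memory needed for a dense n_rows x n_cols matrix, in bytes."""
    return n_rows * n_cols * bytes_per_value


n_instances = 276_400  # instances reported above
n_features = 28_000    # "28000+" features, so this is a lower bound

for name, size in (("float32", 4), ("float64", 8)):
    gb = dense_matrix_bytes(n_instances, n_features, size) / 1e9
    print(f"{name}: {gb:.1f} GB")
# float32: 31.0 GB
# float64: 61.9 GB
```

Under these assumptions the raw data already takes a large share of the 80 GB before any parameters, intermediate results, or optimizer state are counted.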

jan-engelmann commented 3 weeks ago

Hi! In the paper we argue for unsupervised representation learning to reduce the dimensionality. What type of data is this? If it's scRNA-seq, I'd recommend, for example, scVI to reduce the dimensionality first.

HelloWorldLTY commented 3 weeks ago

OK, thanks. I tried reducing the data to 256 dimensions but still had the same issue. Maybe I can look for other dimensionality reduction tools.