epaaso / sc-luca-explore

Exploring the luca dataset for building coabundance networks

Ideal hyperparams scArches #1

Open epaaso opened 1 year ago

epaaso commented 1 year ago
epaaso commented 11 months ago

From https://www.sc-best-practices.org/

The model will be trained for a given number of epochs, a training iteration where every cell is passed through the network. By default scVI uses the following heuristic to set the number of epochs. For datasets with fewer than 20,000 cells, 400 epochs will be used and as the number of cells grows above 20,000 the number of epochs is continuously reduced. The reasoning behind this is that as the network sees more cells during each epoch it can learn the same amount of information as it would from more epochs with fewer cells.

Implement this in your notebooks with:

    max_epochs_scvi = np.min([round((20000 / adata.n_obs) * 400), 400])
    max_epochs_scvi

With our dataset of 400,000 cells, this heuristic gives only 20 epochs, which is not enough to reach optimal accuracy.
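To make the arithmetic explicit, here is a minimal sketch of the heuristic quoted above as a standalone function. The function name and its `cap` and `ref_cells` parameters are my own, not part of scVI; only the formula comes from the quoted snippet.

```python
def scvi_default_max_epochs(n_cells: int, cap: int = 400, ref_cells: int = 20_000) -> int:
    """Epoch heuristic from the quoted snippet: min(round(20000 / n_cells * 400), 400).

    The name and keyword arguments are hypothetical; only the formula is from the source.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return min(round((ref_cells / n_cells) * cap), cap)


if __name__ == "__main__":
    # Show how the epoch budget shrinks as the dataset grows.
    for n in (10_000, 20_000, 100_000, 400_000):
        print(f"{n:>7} cells -> {scvi_default_max_epochs(n)} epochs")
```

For 400,000 cells this gives the 20 epochs mentioned above, so on a dataset this size `max_epochs` probably needs to be set by hand rather than left to the default.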

epaaso commented 3 months ago

From https://docs.scarches.org/en/latest/training_tips.html. This is where the latent dimensions are recommended:

Regarding architecture always try with the default one ([128,128], z_dimension=10) and check the results. If you have more complicated data sets with many datasets and conditions and etc then you can increase the depth ([128,128,128] or [128,128,128,128]). According to our experiments, small values of z_dimension between 10 (default) and 20 are good.
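A small sketch of how these recommendations could be turned into a list of architectures to try. `ArchConfig` and `make_arch` are hypothetical helpers, not scArches API; the values still have to be passed to whichever model constructor is in use, under that constructor's own argument names.

```python
import warnings
from dataclasses import dataclass
from typing import List


@dataclass
class ArchConfig:
    """Hypothetical container for the two settings named in the quoted tip."""
    hidden_layer_sizes: List[int]
    z_dimension: int


def make_arch(depth: int = 2, width: int = 128, z_dimension: int = 10) -> ArchConfig:
    """Build a config with `depth` hidden layers of `width` units each."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if not 10 <= z_dimension <= 20:
        # The quoted tip reports good results for z_dimension between 10 and 20.
        warnings.warn("z_dimension is outside the 10-20 range suggested by the tip")
    return ArchConfig(hidden_layer_sizes=[width] * depth, z_dimension=z_dimension)


# Default architecture first, then the deeper variants suggested for complex data.
candidates = [make_arch(depth=d) for d in (2, 3, 4)]

if __name__ == "__main__":
    for cfg in candidates:
        print(cfg)
```

Starting from the default and only then adding depth keeps the comparison in line with the tip: change one thing at a time and check the results.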

epaaso commented 2 months ago

Using 3 layers and training for 900 epochs instead of 300, I managed to get 90% accuracy instead of 71%. I think part of the gain came from stopping and restarting training every 300 epochs, since the learning rate may follow a scheduler with a decay factor (gamma).
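To illustrate the suspicion about the scheduler, here is a minimal sketch of a step-decay schedule, where the learning rate is multiplied by gamma every fixed number of epochs. I have not confirmed that the trainer uses this kind of scheduler, and `base_lr`, `gamma` and `step_size` below are made-up values. The sketch also assumes that restarting training resets the scheduler, which depends on whether the optimizer and scheduler state are restored.

```python
def step_lr(epoch: int, base_lr: float = 1e-3, gamma: float = 0.5, step_size: int = 100) -> float:
    """Step decay: multiply the learning rate by gamma every `step_size` epochs.

    All default values are hypothetical and chosen only for illustration.
    """
    return base_lr * gamma ** (epoch // step_size)


def lr_continuous(epoch: int) -> float:
    """Learning rate at `epoch` in one uninterrupted run."""
    return step_lr(epoch)


def lr_with_restarts(epoch: int, restart_every: int = 300) -> float:
    """Learning rate at `epoch` if the schedule is reset every `restart_every` epochs."""
    return step_lr(epoch % restart_every)


if __name__ == "__main__":
    for epoch in (0, 299, 300, 599, 600, 899):
        print(epoch, lr_continuous(epoch), lr_with_restarts(epoch))
```

Under these assumptions, a single 900-epoch run ends at a much smaller learning rate than three 300-epoch runs, each of which starts again from `base_lr`. If that is what is happening, the two setups are not equivalent even with the same total number of epochs.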

Nevertheless, this model still predicted the cells in the Zuani dataset very poorly. Next I will check whether it mispredicts them in the Deng dataset again.

It did mispredict in Deng, but that was because we were training only on tumor cells.

epaaso commented 1 month ago

Also consider that the HCLA atlas did not correct for sample, as they wanted to maintain variability. We are not correcting for sample either, but for dataset.