AJResearchGroup / nsphs_ml_qt

R package for nsphs_ml_qt
GNU General Public License v3.0
0 stars 1 forks source link

[RUNNING] What does GCAE converge to? Why to a NMSE of 1? #61

Open richelbilderbeek opened 2 years ago

richelbilderbeek commented 2 years ago

Currently, all runs converge to a normalized mean squared error of one. Ideally, this error should be zero.

These are both from run #50:

nmse_in_time

nmse_in_time

Note that sometimes NMSE goes down, see the top-left panel of run #42:

issue_42_nmses_in_time

It feels worth exploring if maybe GCAE is trained to have a NMSE of 1, i.e. its highest fitness is when NMSE is 1, instead of 0.

richelbilderbeek commented 2 years ago

Here I find some promising code:


    if phenomodel is not None:
        with tf.GradientTape() as g6:
            loss_value = tf.constant(0.)
            for output, encoded_data in (model(input, targets, is_training=True, regloss=False),) + ((model2(input, targets, is_training=True, regloss=False), ) if do_two else ()):
                phenopred, _ = phenomodel(encoded_data, is_training=True)
                #tf.print("PRED") # RJCB
                #tf.print(phenopred) # RJCB
                #tf.print(phenotargets) # RJCB
                loss_value += tf.math.reduce_sum(tf.square(phenopred - phenotargets)) * 1e-2

        gradientspheno = g6.gradient(loss_value, allvars)
        phenoloss = loss_value

    loss_value = orig_loss
richelbilderbeek commented 2 years ago

The phenoloss appears never to be used relevantly:

richel@N141CU:~/GitHubs/gcae$ egrep -R "phenoloss" --include=*.py
run_gcae.py:        phenoloss = loss_value
run_gcae.py:        phenoloss, phenoalpha = (0.,0.)
run_gcae.py:    tf.print(loss_value, other_loss4, other_loss3, other_loss2, other_loss, phenoloss, full_loss, alpha4, alpha3, alpha2, alpha, phenoalpha)

It seems the gradientspheno is more relevant:

    def combine(gradients, gradients2):
        [...]
    if phenomodel is not None:
        gradients3, phenoalpha = combine(gradients3, gradientspheno)
    else:
        phenoloss, phenoalpha = (0.,0.)
        [...]
    tf.print(loss_value, other_loss4, other_loss3, other_loss2, other_loss, phenoloss, full_loss, alpha4, alpha3, alpha2, alpha, phenoalpha)
    return loss_value
richelbilderbeek commented 2 years ago

As an error function, the squared errors are used.

However, most work is done by the gradients, e.g. gradients3, phenoalpha = combine(gradients3, gradientspheno). I cannot verify that combine works correctly.

I will add the amazement of NMSE converging to 1 to the article.

richelbilderbeek commented 2 years ago

I checked by copying the values from phenotype_predictions.csv and doing the same in Calc. Also here, the error =(C871-D871)^2, then taking the average of the sum =SUM(F2:F871)/870 resulted in the same value.

richelbilderbeek commented 2 years ago

I have checked that the phenotype file is correct ...

[richel@sens2021565-bianca data_issue_50_100]$ ls *.phe
data_issue_50_100.phe

I have checked the predicted values are correct

[richel@sens2021565-bianca data_issue_50_100]$ cat 10000.phe

So, the input is correct.

richelbilderbeek commented 2 years ago

I have made a debug .sif, running it for setting 5:

[richel@sens2021565-bianca nsphs_ml_qt]$ mv nsphs_ml_qt_debug.sif nsphs_ml_qt.sif
[richel@sens2021565-bianca nsphs_ml_qt]$ cd ..
[richel@sens2021565-bianca ~]$ ls
98_clean_bianca.sh  bin  n_jobs.txt  nsphs_ml_qt  nsphs_ml_qt_results  README.md  richel-sens2021565  script.R
[richel@sens2021565-bianca ~]$ ./nsphs_ml_qt/scripts_bianca/
20_start_issue_28.sh         21_create_issue_5_params.sh  29_zip_issue_42.sh
20_start_issue_29.sh         22_create_issue_28_data.sh   29_zip.sh
20_start_issue_42_again.sh   22_create_issue_29_data.sh   91_poll_jobs.sh
20_start_issue_42.sh         22_create_issue_42_data.sh   92_poll_n_jobs.sh
20_start_issue_50.sh         22_create_issue_50_data.sh   98_clean_bianca.sh
20_start_issue_5.sh          22_create_issue_5_data.sh    
[richel@sens2021565-bianca ~]$ ./nsphs_ml_qt/scripts_bianca/20_start_issue_5.sh 
Starting time: 2022-07-04T09:43:36+0200
Running on computer with HOSTNAME: sens2021565-bianca.uppmax.uu.se
Running at location /home/richel
gcae_experiment_params_filename: /proj/sens2021565/nobackup/nsphs_ml_qt_results/data_issue_5_1/experiment_params.csv
unique_id: issue_5
jobid_21: 28771
jobid_22: 28772
jobid_24: 28773
jobid_25: 28774
jobid_26: 28775
jobid_29: 28776
End time: 2022-07-04T09:43:37+0200
Duration: 1 seconds

Goals is to detect a negative predicted value in the 25_ logs.

richelbilderbeek commented 2 years ago

Need to increase the verbosity then in gcaer as well...

richelbilderbeek commented 2 years ago

Added verbose:

[richel@sens2021565-bianca ~]$ nano nsphs_ml_qt/scripts_bianca/21_create_issue_5_params.R
[richel@sens2021565-bianca ~]$ cat nsphs_ml_qt/scripts_bianca/21_create_issue_5_params.R
# message("Running on: ", uppmaxr::get_where_i_am())

args <- commandArgs(trailingOnly = TRUE)
# ...

gcae_experiment_params <- gcaer::create_gcae_experiment_params(
  gcae_setup = gcae_setup,
  gcae_options = gcae_options,
  analyse_epochs = seq(10, 50, by = 20),
  metrics = "",
  verbose = TRUE
)
gcaer::save_gcae_experiment_params(
  gcae_experiment_params = gcae_experiment_params,
  gcae_experiment_params_filename = gcae_experiment_params_filename
)
# ...

message("Really saved 'gcae_experiment_params' at ", gcae_experiment_params_filename)
richelbilderbeek commented 2 years ago

There goes:

[richel@sens2021565-bianca ~]$ ./nsphs_ml_qt/scripts_bianca/20_start_issue_5.sh 
Starting time: 2022-07-04T11:32:16+0200
Running on computer with HOSTNAME: sens2021565-bianca.uppmax.uu.se
Running at location /home/richel
gcae_experiment_params_filename: /proj/sens2021565/nobackup/nsphs_ml_qt_results/data_issue_5_1/experiment_params.csv
unique_id: issue_5
jobid_21: 28779
jobid_22: 28780
jobid_24: 28781
jobid_25: 28782
jobid_26: 28783
jobid_29: 28784
End time: 2022-07-04T11:32:17+0200
Duration: 1 seconds
richelbilderbeek commented 2 years ago

Folders are clean:

[richel@sens2021565-bianca ~]$ cd nsphs_ml_qt_results/
[richel@sens2021565-bianca nsphs_ml_qt_results]$ ls
[richel@sens2021565-bianca nsphs_ml_qt_results]$ ls
richelbilderbeek commented 2 years ago

Hmmm, failed to put it in the Singularity container :-/

Singularity> cat run_gcae.py | egrep RJCB
                #tf.print("PRED") # RJCB
                #tf.print(phenopred) # RJCB
                #tf.print(phenotargets) # RJCB
            phenodata = readpheno(data_prefix + ".phe", 0) # RJCB: Use first phenotype
Singularity> 
richelbilderbeek commented 2 years ago

Alrighty, I am going to use another approach, using simulated data.

richelbilderbeek commented 2 years ago

Running with bigger values now, as the result was yet inconclusive.

Screenshot from 2022-07-04 17-19-26

richelbilderbeek commented 2 years ago

It seems more GenoCAE struggles with larger values ...?

Screenshot from 2022-07-04 19-25-05

richelbilderbeek commented 2 years ago

This test made the last picture:

test_that("use", {
  # If running with a simple additive trait, it should
  # not matter much if the phenotype value is -100, 100 or 200.
  #
  base_phenotype_values <- c(-100, 100, 200)
  results_list <- list()
  for (i in seq_along(base_phenotype_values)) {
    message(i)
    base_phenotype_value <- base_phenotype_values[i]
    assoc_qt_data <- plinkr::create_demo_assoc_qt_data(
      n_individuals = 100,
      traits = plinkr::create_additive_trait(
        mafs = 0.49,
        n_snps = 1,
        base_phenotype_value = base_phenotype_value,
        phenotype_increase = 0.5
      )
    )
    assoc_qt_data$data <- plinkr::convert_plink_text_data_to_plink_bin_data(
      assoc_qt_data$data
    )
    datadir <- paste0(
      file.path(gcaer::get_gcaer_tempfilename(), "issue_61"),
      "/"
    )
    data <- "issue_61_data"
    base_input_filename <- paste0(datadir, data)
    plinkr::save_plink_bin_data(
      plink_bin_data = assoc_qt_data$data,
      base_input_filename = base_input_filename
    )
    plinkr::save_phe_table(
      phe_table = assoc_qt_data$phenotype_data$phe_table,
      phe_filename = paste0(base_input_filename, ".phe")
    )

    gcae_experiment_params <- gcaer::create_gcae_experiment_params(
      gcae_setup = gcaer::create_test_gcae_setup(
        datadir = datadir,
        data = data,
        superpops = "",
        model_id = "M1",
        pheno_model_id = "p1"
      ),
      analyse_epochs = c(100, 200, 400, 800, 1600, 3200),
      metrics = ""
    )
    results <- gcaer::do_gcae_experiment(gcae_experiment_params)

    results$nmse_in_time_table$base_phenotype_value <- base_phenotype_value
    results_list[[i]] <- results$nmse_in_time_table
  }
  t <- dplyr::bind_rows(results_list)
  t$base_phenotype_value <- as.factor(t$base_phenotype_value)
  ggplot2::ggplot(
    t,
    ggplot2::aes(x = epoch, y = nmse, color = base_phenotype_value)
  ) + ggplot2::geom_line()

  ggplot2::ggsave("~/issue_61.png", width = 7, height = 7)
})
richelbilderbeek commented 2 years ago

I am going to run this on Rackham, to be able to see a clear difference. AFAICS, GCAE can handle negative numbers fine.

richelbilderbeek commented 2 years ago

Running:

[richel@rackham2 ~]$ sbatch nsphs_ml_qt/scripts_local/issue_61/start_issue_61.sh 
Submitted batch job 28166624
[richel@rackham2 ~]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          28166624      core issue_61   richel  R       0:00      1 r1224
richelbilderbeek commented 2 years ago

Next to that, just describe this clearly in https://github.com/AJResearchGroup/richel/issues/186

richelbilderbeek commented 2 years ago

Error:

[richel@rackham2 ~]$ cat issue_61.log
base_input_filename: 
n_individuals: 
n_traits: 
n_snps_per_trait: 
singularity_filename: 
Starting time: 2022-07-05T10:12:30+0200
Running on computer with HOSTNAME: r1224
Running at location /home/richel
'nsphs_ml_qt.sif' running with arguments 'Rscript nsphs_ml_qt/scripts_local/issue_61/issue_61.R'
1 2022-07-05 10:12:42
Error in gcaer::check_input_files_are_present(gcae_experiment_params) : 
  Autoencoder architecture file absent. 
gcae_experiment_params$gcae_setup$model_id: M1
model_filename: /home/richel/.local/share/GenoCAE/models/M1.json 
gcae_experiment_params$gcae_options$gcae_folder: ~/.local/share/GenoCAE 
Tip 1: maybe use a different 'model_id', 'model_id's available are {} 
Tip 2: maybe use a different 'gcae_experiment_params$gcae_options$gcae_folder'
Calls: <Anonymous> -> <Anonymous>
Execution halted
End time: 2022-07-05T10:12:46+0200
Duration: 16 seconds

Fixed and re-run:

[richel@rackham2 ~]$ sbatch nsphs_ml_qt/scripts_local/issue_61/start_issue_61.sh 
Submitted batch job 28166649
[richel@rackham2 ~]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          28166649      core issue_61   richel  R       0:06      1 r1224
richelbilderbeek commented 2 years ago
[richel@rackham2 ~]$ cat issue_61.log
base_input_filename: 
n_individuals: 
n_traits: 
n_snps_per_trait: 
singularity_filename: 
Starting time: 2022-07-05T10:27:01+0200
Running on computer with HOSTNAME: r1224
Running at location /home/richel
'nsphs_ml_qt.sif' running with arguments 'Rscript nsphs_ml_qt/scripts_local/issue_61/issue_61.R'
1 2022-07-05 10:27:11
richelbilderbeek commented 2 years ago
[richel@rackham2 ~]$ cat issue_61.log
# ...
1 2022-07-05 10:27:11
2 2022-07-05 10:32:09
3 2022-07-05 10:36:37
4 2022-07-05 10:41:04
[richel@rackham2 ~]$ ./nsphs_ml_qt/scripts_bianca/91_poll_jobs.sh 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          28166649      core issue_61   richel  R      16:38      1 r1224
richelbilderbeek commented 2 years ago

Typo in script (only measure at 1k epochs, now every 1k until 10k), restart:

[richel@rackham3 richel]$ sbatch nsphs_ml_qt/scripts_local/issue_61/start_issue_61.sh 
Submitted batch job 28170206
[richel@rackham3 richel]$ ./nsphs_ml_qt/scripts_bianca/91_poll_jobs.sh 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          28170206      core issue_61   richel  R       0:05      1 r1041
richelbilderbeek commented 2 years ago

I would say: GCAE does not converge:

issue_61.csv issue_61

What I did here:

Instead that the error converges to zero, it does converges to something else.

richelbilderbeek commented 2 years ago

Video: download (.ogv) YouTube