bensutherland / simple_pop_stats

A short analysis of population statistics given specific inputs

Function not operating correctly until sourced again manually #34

Closed bensutherland closed 3 weeks ago

bensutherland commented 1 year ago

It is not clear what is causing this, but when I source the initiator script for simple_pop_stats (i.e., 01_scripts/simple_pop_stats_start.R) and then run my analysis, the function simple_pop_stats_pyes/01_scripts/utilities/pca_from_genind.r does not properly use the argument retain_pca_obj = TRUE. More specifically, the second and third if statements within the function that test this variable do not act on it. The only workaround I've found is to re-source the function file manually after running the initiator script. Then it handles the variable correctly. I have no idea why this occurs, but I have seen it multiple times. It makes me concerned that the standard way of sourcing functions is somehow not providing the full and correct function.

Initial run using the initiator script only:

> pca_from_genind(data = obj
+                 , PCs_ret = 4
+                 , plot_eigen = TRUE
+                 , plot_allele_loadings = TRUE
+                 , retain_pca_obj = TRUE
+                 , colour_file = "00_archive/formatted_cols.csv"
+                 )
[1] "Converting genind to genlight"
Starting gi2gl 
Starting gl.compliance.check 
  Processing genlight object with SNP data
  Checking coding of SNPs
    SNP data scored NA, 0, 1 or 2 confirmed
  Checking locus metrics and flags
  Recalculating locus metrics
  Checking for monomorphic loci
    No monomorphic loci detected
  Checking for loci with all missing data
    No loci with all missing data detected
  Checking whether individual names are unique.
  Checking for individual metrics
  Warning: Creating a slot for individual metrics
  Checking for population assignments
    Population assignments confirmed
  Spelling of coordinates checked and changed if necessary to 
            lat/lon
Completed: gl.compliance.check 
Completed: gi2gl 
[1] "Executing PCA, retaining 4 PCs"
[1] "Keeping pca.obj in global enviro"
[1] "Writing out per sample PC loading values"
[1] "Using custom colours file from 00_archive/formatted_cols.csv"                           
[1] "Plotting eigenvalues"
[1] "Plotting allele loadings"
RStudioGD 
        2 

Note that no "Saving a PCA plot ..." lines appear in this readout.

Then I manually sourced the pca_from_genind function and re-ran:

pca_from_genind(data = obj
                 , PCs_ret = 4
                 , plot_eigen = TRUE
                 , plot_allele_loadings = TRUE
                 , retain_pca_obj = TRUE
                 , colour_file = "00_archive/formatted_cols.csv"
                 )

This time, "Saving a PCA plot as 'pc1_v_pc2.plot' into the enviro" appears in the readout, and the plot object appears in the global environment:

[1] "Converting genind to genlight"
Starting gi2gl 
Starting gl.compliance.check 
  Processing genlight object with SNP data
  Checking coding of SNPs
    SNP data scored NA, 0, 1 or 2 confirmed
  Checking locus metrics and flags
  Recalculating locus metrics
  Checking for monomorphic loci
    No monomorphic loci detected
  Checking for loci with all missing data
    No loci with all missing data detected
  Checking whether individual names are unique.
  Checking for individual metrics
  Warning: Creating a slot for individual metrics
  Checking for population assignments
    Population assignments confirmed
  Spelling of coordinates checked and changed if necessary to 
            lat/lon
Completed: gl.compliance.check 
Completed: gi2gl 
[1] "Executing PCA, retaining 4 PCs"
[1] "Keeping pca.obj in global enviro"
[1] "Writing out per sample PC loading values"
[1] "Using custom colours file from 00_archive/formatted_cols.csv"                           
[1] "Saving a PCA plot as 'pc1_v_pc2.plot' into the enviro"
[1] "Saving a PCA plot as 'pc3_v_pc4.plot' into the enviro"
[1] "Plotting eigenvalues"
[1] "Plotting allele loadings"
RStudioGD 
        2

The only difference between these two runs is the manual source of the function.
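One way to test whether the definition in the session really differs from the file on disk is to source the file into a scratch environment and compare the two functions. The following is a minimal sketch using a toy function and a temporary file; `toy_fn`, `fn_file`, and `scratch` are hypothetical names, not part of simple_pop_stats.

```r
# Toy file standing in for a function script (hypothetical, not the real pca_from_genind.r)
fn_file <- tempfile(fileext = ".R")
writeLines(c(
  "toy_fn <- function(retain = FALSE) {",
  "  if (retain) 'retained' else 'not retained'",
  "}"
), fn_file)

# Simulate a stale definition that is already loaded in the session
toy_fn <- function(retain = FALSE) "retain is ignored"

# Source the on-disk version into a scratch environment, leaving the session untouched
scratch <- new.env()
sys.source(fn_file, envir = scratch)

# Compare arguments and parsed code (deparse ignores comments and whitespace)
same_code <- identical(formals(toy_fn), formals(scratch$toy_fn)) &&
  identical(deparse(body(toy_fn)), deparse(body(scratch$toy_fn)))
same_code
```

If the comparison returns FALSE for the real function right after running the initiator script, the session holds a different definition than the file that was re-sourced by hand.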

erondeau commented 1 year ago

Hey Ben, I can't duplicate the error:

#Source simple_pop_stats.r
#Select species (Coho in my case)
#Load genepop
load_genepop()

#update names
update_pop_names(sep_by = "collection", name_by = "stockname")

#Run PCA script as above, but without colour file
pca_from_genind(data = obj
                , PCs_ret = 4
                , plot_eigen = TRUE
                , plot_allele_loadings = TRUE
                , retain_pca_obj = TRUE
                )
[1] "Converting genind to genlight"
Starting gi2gl 
Starting gl.compliance.check 
  Processing genlight object with SNP data
  Checking coding of SNPs
    SNP data scored NA, 0, 1 or 2 confirmed
  Checking locus metrics and flags
  Recalculating locus metrics
  Checking for monomorphic loci
    Dataset contains monomorphic loci
  Checking for loci with all missing data
    No loci with all missing data detected
  Checking whether individual names are unique.
  Checking for individual metrics
  Warning: Creating a slot for individual metrics
  Checking for population assignments
    Population assignments confirmed
  Spelling of coordinates checked and changed if necessary to 
            lat/lon
Completed: gl.compliance.check 
Completed: gi2gl 
[1] "Executing PCA, retaining 4 PCs"
[1] "Keeping pca.obj in global enviro"
[1] "Writing out per sample PC loading values"
[1] "Saving a PCA plot as 'pc1_v_pc2.plot' into the enviro"
[1] "Saving a PCA plot as 'pc3_v_pc4.plot' into the enviro"
[1] "Plotting eigenvalues"
[1] "Plotting allele loadings"
null device 
          1 

#pc1_v_pc2.plot and pc3_v_pc4.plot are both created and saved

Two things I can think of: 1) You are somehow sourcing a previous version of the script that didn't include these new if statements (pre 2dd17b4). I've seen this before when multiple copies of a package are on a computer (e.g., you have simple_pop_stats_pyes, simple_pop_stats_okis, etc., and perhaps there's a mismatch between the location of the initiator script and that of the PCA script you are sourcing the second time). 2) The plots are being assigned to an unexpected environment and are not retained, which could potentially happen if things get nested.
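To check the first possibility, R can report which file a sourced function came from, provided source references were kept. The sketch below is a toy illustration only: the two directories stand in for two checkouts, and `toy_fn`, `old_dir`, and `new_dir` are hypothetical names.

```r
# Two toy directories standing in for two copies of the repository (hypothetical)
old_dir <- file.path(tempdir(), "repo_old")
new_dir <- file.path(tempdir(), "repo_new")
dir.create(old_dir, showWarnings = FALSE)
dir.create(new_dir, showWarnings = FALSE)

old_file <- file.path(old_dir, "toy_fn.R")
new_file <- file.path(new_dir, "toy_fn.R")

# The old copy never looks at its argument; the new copy tests it in an if statement
writeLines(c(
  "toy_fn <- function(retain = FALSE) {",
  "  'old version'",
  "}"
), old_file)
writeLines(c(
  "toy_fn <- function(retain = FALSE) {",
  "  if (retain) 'new version, retained' else 'new version, not retained'",
  "}"
), new_file)

# keep.source = TRUE records the originating file on the function
source(old_file, keep.source = TRUE)

# Which file did the loaded definition come from?
loaded_from <- utils::getSrcFilename(toy_fn, full.names = TRUE)

# Does the loaded body refer to the argument at all?
uses_retain <- "retain" %in% all.names(body(toy_fn))

loaded_from
uses_retain
```

Running the same two checks on pca_from_genind after the initiator script would show whether it was loaded from the expected copy and whether its body mentions retain_pca_obj.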

Based on the description of the error, I suspect the first is the most likely.
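The second possibility can be ruled in or out by asking whether the object exists in the global environment itself. This is a minimal sketch with toy helpers; `save_local`, `save_global`, and the object names are hypothetical, and I am assuming the plots are meant to end up in the global environment.

```r
# Hypothetical helper: the object lives only in the function's own frame
save_local <- function() {
  toy_plot_local <- "plot object"
  invisible(NULL)
}

# Hypothetical helper: the object is explicitly assigned into the global environment
save_global <- function() {
  assign("toy_plot_global", "plot object", envir = .GlobalEnv)
  invisible(NULL)
}

save_local()
save_global()

# inherits = FALSE restricts the lookup to the global environment only
local_found  <- exists("toy_plot_local",  envir = .GlobalEnv, inherits = FALSE)
global_found <- exists("toy_plot_global", envir = .GlobalEnv, inherits = FALSE)

local_found
global_found
```

The same exists() call with "pc1_v_pc2.plot" after a run would show whether the plot reached the global environment.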

bensutherland commented 3 weeks ago

Thank you @erondeau