Wedge-lab / dpclust

Dirichlet Process based methods for subclonal reconstruction of tumours
GNU Affero General Public License v3.0

.Rdata input file #14

Open davidwedge opened 2 years ago

davidwedge commented 2 years ago

RunDP currently looks for a file named dataset.RData in the outdir and reads data from it before running DirichletProcessClustering. This causes problems when DPClust is run with an output directory that has been used before, because the input data is overwritten with data from the previous run. A possible fix is to add the samplename / seed / date to the Rdata filename.
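To make the failure mode concrete, here is a minimal sketch of how a fixed file name in a reused directory can replace fresh input with stale data. It is not the RunDP code: the directory name, the `dataset` object and the two "runs" are hypothetical stand-ins, and the only R behaviour it relies on is that load() restores objects under the names they were saved with.

```r
# Hypothetical stand-in for an output directory that is reused across runs.
outdir <- file.path(tempdir(), "dpclust_demo")
dir.create(outdir, showWarnings = FALSE)
rdata_path <- file.path(outdir, "dataset.RData")

# "Previous run": saves its input under the fixed file name.
dataset <- "data from previous run"
save(dataset, file = rdata_path)

# "Current run": prepares new input, then finds the old file and loads it.
dataset <- "data for current run"
if (file.exists(rdata_path)) {
  load(rdata_path)  # restores `dataset` by name, replacing the new value
}

print(dataset)  # the stale value from the previous run
```

A file name that differs per run would make the file.exists() check fail for a new run, so the stale file would not be picked up.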

Avramis commented 1 month ago

I created a branch named "Adjust_Rdata_in_DirichletProcessClustering" to address the issue. Instead of the hardcoded file name "dataset.RData", I introduced a variable called rdata_filename, constructed as follows:

rdata_filename = paste(paste0("Seed-", seed), paste0("Date-", chartr(" ", "", Sys.time())), "dataset.RData", sep = "_")

This adds the seed and the system time to the file name. The rdata_filename variable then replaces the hardcoded "dataset.RData" string everywhere it was used.

P.S. If including the exact time in the file name is too specific, we can use the following command to include only the date:

rdata_filename = paste(paste0("Seed-", seed), paste0("Date-", Sys.Date()), "dataset.RData", sep = "_")
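As a rough illustration of the naming scheme discussed above, the helper below builds a per-run file name from the seed, a timestamp and an optional sample name. The function make_rdata_filename, its arguments and the timestamp pattern are assumptions made for this sketch and are not taken from the branch. It uses format() for the timestamp rather than chartr(), since chartr() expects `new` to be at least as long as `old`, and a compact pattern also keeps spaces and colons out of the file name.

```r
# Hypothetical helper; not part of DPClust or the branch.
make_rdata_filename <- function(seed, samplename = NULL, when = Sys.time()) {
  # Compact timestamp without spaces or colons, e.g. 20240102-030405.
  stamp <- format(when, "%Y%m%d-%H%M%S")
  # c() drops a NULL samplename, so the prefix is simply omitted.
  parts <- c(samplename,
             paste0("Seed-", seed),
             paste0("Date-", stamp),
             "dataset.RData")
  paste(parts, collapse = "_")
}

# Usage with a fixed time so the result is reproducible.
fixed_time <- as.POSIXct("2024-01-02 03:04:05", tz = "UTC")
rdata_filename <- make_rdata_filename(seed = 123, samplename = "sampleA",
                                      when = fixed_time)
print(rdata_filename)
# [1] "sampleA_Seed-123_Date-20240102-030405_dataset.RData"
```

Two runs with the same seed in the same second would still collide, so including the sample name, where available, makes the name more robust.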

Avramis commented 1 month ago

@MiaoGaoUK, could you please review the changes in the "Adjust_Rdata_in_DirichletProcessClustering" branch and merge them into the main "DPClust" branch?