Issue using split_by_chromosome

nickyph commented 1 year ago

Hi there!

I was wondering if you could help me solve the following issue. For now I am trying out the example data to use this script. While running the following code I get an error:

bedlist <- c()
for (i in 1:length(methyl_bed)) {
  beds <- split_by_chromosome(methyl_bed[i])
  bedlist[(length(bedlist) + 1)] <- list(beds)
}

Error is: Error in split_by_chromosome(methyl_bed[i]) : could not find function "split_by_chromosome"

When I list the functions available in library(sounDMR) using lsf.str("package:sounDMR"), I get the following:

add_changepoint_info : function (whole_df, changepoint, col_name)
add_zoom_coords : function (target, gene_cord_df, geneco_index, gcoord_exist = TRUE, Gene_col = "Gene_name")
boot_score : function (sound_score_obj = NA, target_gene = NA, target_start = -1000, target_end = 0, nboots = 1000, scoring_col_name = "dmr_score", direction_DMR = "positive")
changepoint_analysis : function (whole_df, CG_penalty = int, CHG_penalty = int, CHH_penalty = int, target_genes = c(), save_plots = FALSE, z_col = "column")
create_cols_for_individuals : function (Exp_ID_Treated, Output_Frame, GenePercentPlant, GeneDepthPlant, GenePercentGroup, control = "C")
create_dmr_obj : function (ZoomFrame = dataframe, experimental_design_df = dataframe)
create_fixed_effects : function (fixed = c("effect1", "effect2"))
create_formula : function (fixed, random)
create_gene_percent_x : function (LongPercent, x = "Chromosome", function_name = mean)
create_methyl_summary : function (dmr_obj, control = "C", treated = "T", colnames_of_interest = c("Chromosome", "Gene", "Position", "Strand", "CX", "Zeroth_pos", "Plant"), additional_summary_cols = list())
create_random_effects : function (random = c("Group", "ID"))
find_changepoint_col_options : function (DMR_output, Output_Frame = Output_Frame)
find_cpt_mean : function (data, z_col, penalty)
find_DMR : function (Output_Frame, dmr_obj, fixed = c("Group"), random = c("Plant"), reads_threshold = 3, model, control = "", analysis_type)
generate_megaframe : function (methyl_bed_list = All_methyl_beds, Sample_count = 0, Methyl_call_type = "Dorado", File_prefix = "", max_read_depth = 100)
generate_methylframe : function (methyl_bed_list = All_methyl_beds, Sample_count = 0, Methyl_call_type = "Dorado", filter_NAs = 0, max_read_depth = 100, gene_info = FALSE, gene_coordinate_file = NULL, Gene_column = "", target_info = FALSE, gene_list = gene_coordinate_file[[Gene_column]], File_prefix = "Sample")
generate_zoomframe : function (gene_cord_df, MFrame, Gene_col, target_info = TRUE, gene_list = gene_cord_df[[Gene_col]], File_prefix = "")
get_standard_methyl_bed : function (Methyl_bed = "Methyl.bed", Sample_ID = "S1", Methyl_call_type = "Dorado", max_read_depth = 100)
group_DMR : function (Output_Frame, ZoomFrame_filtered, experimental_design_df, fixed = c("Group"), random = c("Plant"), reads_threshold = 3, model = "binomial", colnames_of_interest = c("Chromosome", "Gene", "Position", "Strand", "CX", "Zeroth_pos", "Individual"))
individual_DMR : function (Output_Frame, ZoomFrame_filtered, experimental_design_df, fixed = c("Group"), random = c("Individual"), reads_threshold = 3, control = "C", model = "beta-binomial", colnames_of_interest = c("Chromosome", "Gene", "Position", "Strand", "CX", "Zeroth_pos", "Individual"))
pivot_and_subset : function (data, starts_with_cols = "start", values_to, colnames_of_interest = c("Chromosome", "Gene", "Position", "Strand", "CX", "Zeroth_pos", "Individual"))
plot_changepoints : function (data, changepoint_obj, gene_name, penalty_val, cyt_context, z_col)
run_binomial : function (LM, i = int, formula, optimizer_func = "optimizer")
run_model : function (data, i, Output_Frame, formula, model_type, individual_name_z = "")
save_model_summary : function (i, Output_Frame, model_summary, ind_name = "")
sound_score : function (changepoint_OF = dataframe, Statistic = "Z_GroupT_small", Per_Change = "Treat_V_Control", Control = "Control", other_columns = c("Estimate_GroupT_small"), CF = FALSE, UserFunction = NA)
subset_cols : function (df, colnames_of_interest = c("Chromosome", "Gene", "Position", "Strand", "CX", "Zeroth_pos", "Individual"))
subset_methyl_summary : function (methyl_summary, individuals_to_keep)

The function split_by_chromosome seems to be missing from this list. Could you help me out?

Edit: according to `sessionInfo()`, I am using version `sounDMR_2.1.1`.

kprabhu-soundag commented 1 year ago

Hi @nickyph, thanks for reporting the bug. If you are using the sample data set, you don't need the split_by_chromosome() function, since it is mainly used to make whole-genome BedMethyl files more manageable. The sample data set comes from adaptive sequencing, so you can skip this step for now. If you do need the function for other reasons, you can load it manually into your .GlobalEnv by copying it from the script sounDMR/R/SounDMR.R.

That said, we are currently working on fixing this and should be able to export the function from the package in our next release.
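To tell apart "not exported" from "not in the installed package at all", a small check like the one below may help. It is only a sketch: `function_status` is a hypothetical helper name, not part of sounDMR, and the demonstration runs against the base `stats` package so that it works without sounDMR installed.

```r
# Hypothetical helper (not part of sounDMR): report whether a function is
# exported from a package, present only internally, or absent altogether.
function_status <- function(fn_name, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("package not installed")
  }
  if (fn_name %in% getNamespaceExports(pkg)) {
    return("exported")
  }
  if (exists(fn_name, envir = asNamespace(pkg), inherits = FALSE)) {
    return("internal")
  }
  "missing"
}

# Demonstration against a package that ships with R.
status_exported <- function_status("median", "stats")
status_missing  <- function_status("no_such_function_xyz", "stats")
print(c(status_exported, status_missing))

# For this issue the call would be:
#   function_status("split_by_chromosome", "sounDMR")
# Only if that returns "internal" would sounDMR:::split_by_chromosome work;
# if it returns "missing", copying the function from the source script
# (as suggested above) is the remaining option.
```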

nickyph commented 1 year ago

Hi @kprabhu-soundag, thank you for your swift response, I appreciate it a lot! I'm looking forward to the next release as I will hopefully be able to use it for my own whole genome .bed files.

In my quest of trying to solve this issue on my own I unfortunately am not able to successfully run any code besides methyl_bed <- list.files(path=".",pattern="*.bed") and Geneco <- read.table(file.choose(), header=TRUE, sep=",").

After this, I run the command without gene info:

Methylframe <- generate_methylframe(methyl_bed_list=methyl_bed, Sample_count = 0, Methyl_call_type="Dorado", filter_NAs = 0, gene_info = FALSE, gene_coordinate_file = NA, Gene_column=NA, target_info=FALSE, File_prefix="Sample")

It gives me the following error:

Creating the Megaframe
Error in `[.data.frame`(Methyl_bed, , c(1, 2, 3, 4, 5, 6)) : undefined columns selected
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, : not all columns named in 'colClasses' exist

The command including the gene info shows me the same error message.
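The error text shows columns 1 to 6 being selected from each file, so a file with fewer columns would fail in this way. A rough pre-flight check is sketched below. It assumes tab-delimited input; `check_bed_columns` is a hypothetical helper, not a sounDMR function, and the two demo files contain made-up rows purely to exercise the check.

```r
# Hypothetical helper: report the smallest number of tab-separated fields
# found in each file and flag files with fewer than `min_cols` columns.
check_bed_columns <- function(files, min_cols = 6) {
  n_cols <- vapply(files, function(f) {
    counts <- utils::count.fields(f, sep = "\t")
    if (length(counts) == 0) 0L else as.integer(min(counts, na.rm = TRUE))
  }, integer(1))
  data.frame(file = basename(files), n_cols = n_cols,
             ok = n_cols >= min_cols, row.names = NULL)
}

# Toy files with invented content: a 3-column regions file and a
# 10-column file standing in for a methylation call output.
demo_dir <- tempfile("bedcheck")
dir.create(demo_dir)
regions_file <- file.path(demo_dir, "regions.bed")
methyl_file  <- file.path(demo_dir, "sample_methyl.bed")
writeLines("Chr1\t100\t200", regions_file)
writeLines(paste(c("Chr1", 100, 101, "m", 5, "+", 100, 101, "0,0,0", 5),
                 collapse = "\t"), methyl_file)

result <- check_bed_columns(c(regions_file, methyl_file))
print(result)
```

Running the same check on the real `methyl_bed` vector before calling generate_methylframe() would show which file, if any, has too few columns.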

The file I chose for Geneco is Sample_gene_coordinates.csv, which I think should be the correct one. Could it be that my methyl_bed is not set up properly? That command results in a 'value' in the R environment, stating methyl_bed chr [1:5] "Ath_Adaptive_sampling_regions.bed" "WaterControl_2_methyl.bed" "WaterControl_8_methyl.bed" "Zeb_100Treated_2_methyl.bed" "Zeb_100Treated_8_methyl.bed".

Again, thank you for your fast reply, and apologies if I am missing something very obvious. SounDMR is very promising for my data, as it is the only code I have come across that makes use of Dorado data, which is what we have generated as well.

kprabhu-soundag commented 1 year ago

Hi @nickyph , no worries. I'm glad you think that our tool is very promising.

Based on your .GlobalEnv variables, the problem lies in methyl_bed <- list.files(path=".",pattern="*.bed"). The pattern depends on how your methyl bed files are named. In this case it would just be methyl.bed. Matching on .bed alone also pulls "Ath_Adaptive_sampling_regions.bed" into your R environment. That file only lists the regions used for adaptive sequencing and is not a methylation calling output. Hope this helps!
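As an illustration of the difference, here is a minimal sketch that recreates the five file names from this thread as empty files in a temporary directory. The `pattern` argument of list.files() is a regular expression, so the example anchors and escapes it explicitly; the exact regular expressions are an assumption about how the files are named, not a pattern prescribed by sounDMR.

```r
# Recreate the file names from this thread as empty files in a temp dir.
demo_dir <- tempfile("beds")
dir.create(demo_dir)
file_names <- c("Ath_Adaptive_sampling_regions.bed",
                "WaterControl_2_methyl.bed",
                "WaterControl_8_methyl.bed",
                "Zeb_100Treated_2_methyl.bed",
                "Zeb_100Treated_8_methyl.bed")
invisible(file.create(file.path(demo_dir, file_names)))

# Matching every .bed file also picks up the adaptive sampling regions file.
all_bed <- list.files(path = demo_dir, pattern = "\\.bed$")

# Matching only names that end in "methyl.bed" keeps the methylation calls.
methyl_bed <- list.files(path = demo_dir, pattern = "methyl\\.bed$")

# utils::glob2rx() converts a shell-style wildcard into a regular expression,
# for those who prefer to write the pattern as a glob.
methyl_bed_glob <- list.files(path = demo_dir,
                              pattern = utils::glob2rx("*methyl.bed"))

print(all_bed)
print(methyl_bed)
```

Checking length(methyl_bed) against the expected number of samples before building the methylframe is a cheap way to catch a stray file.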

kprabhu-soundag commented 1 year ago

@nickyph , was the issue resolved?

kprabhu-soundag commented 1 year ago

Issue closed since the above comments address the way to resolve the error mentioned in the issue.

SoundAg / sounDMR