SoundAg / sounDMR

Differentially methylated region analysis from Oxford Nanopore Technologies data
Apache License 2.0

Error while creating Megaframe such as Methyl_bed #69

Closed sachingadakh closed 1 week ago

sachingadakh commented 2 weeks ago

Hello, I am working with Dorado-based bedMethyl output for two different samples, which look like the following:

Normal:
chr1 10468 10469 m 1 + 10468 10469 255 1 100 m
chr1 10470 10471 m 2 + 10470 10471 255 2 50 m
chr1 10483 10484 m 2 + 10483 10484 255 2 100 m
chr1 10488 10489 m 3 + 10488 10489 255 3 100 m
chr1 10492 10493 m 3 + 10492 10493 255 3 100 m
chr1 10496 10497 m 2 + 10496 10497 255 2 100 m
chr1 10524 10525 m 2 + 10524 10525 255 2 100 m
chr1 10541 10542 m 2 + 10541 10542 255 2 100 m
chr1 10562 10563 m 3 + 10562 10563 255 3 100 m
chr1 10570 10571 m 2 + 10570 10571 255 2 100 m

Treated:
chr1 10635 10636 h 1 + 10635 10636 255 1 0 h
chr1 10635 10636 m 1 + 10635 10636 255 1 100 m
chr1 10637 10638 h 1 + 10637 10638 255 1 0 h
chr1 10637 10638 m 1 + 10637 10638 255 1 100 m
chr1 10661 10662 h 1 + 10661 10662 255 1 100 h
chr1 10661 10662 m 1 + 10661 10662 255 1 0 m
chr1 10664 10665 h 1 + 10664 10665 255 1 0 h
chr1 10664 10665 m 1 + 10664 10665 255 1 100 m
chr1 10666 10667 h 1 + 10666 10667 255 1 0 h
chr1 10666 10667 m 1 + 10666 10667 255 1 100 m

When using split_by_chromosome, I run into errors like those in issue #68. When I instead manually provide data for only one chromosome, I get the following error:

Creating the Megaframe
|--------------------------------------------------|
|==================================================|
Error in Methyl_bed[[paste("PerMeth", SampleID, sep = "")]] * Methyl_bed$Tot_reads :
  non-numeric argument to binary operator

My input commands were:

bedlist <- list.files(path = "chr_chr1/", pattern = "*.bed")
bedlist
[1] "normal_chr1.bed" "treated_chr1.bed"
Methylframe <- generate_methylframe(methyl_bed_list=bedlist, Sample_count = 0, Methyl_call_type="Dorado", filter_NAs = 0, gene_info = FALSE, Gene_column=NA, target_info=FALSE, File_prefix="Sample")

I looked into the error "non-numeric argument to binary operator", and the class of every column is appropriate for the data shown above:

$V1 : "character"
$V2 : "integer"
$V3 : "integer"
$V4 : "character"
$V5 : "integer"
$V6 : "character"
$V7 : "integer"
$V8 : "integer"
$V9 : "integer"
$V10 : "integer"
$V11 : "numeric"
$V12 : "character"

Any help/suggestion will be appreciated. Thank you

jcolicchio-soundag commented 2 weeks ago

Hey, sorry about this!

Recent changes to Dorado have altered its output in a way that has caused some issues for our code. I believe the issue here is that your methylbed files include rows with 0 total reads. In particular, columns 10 and 11 likely contain some NAs, which causes problems when we attempt to manipulate them. If you upload a subset of your methylbeds, maybe the first 5000 rows, I can look into it.
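A quick way to test this hypothesis locally is to count NAs and zero-coverage rows in columns 10 and 11. The following is a minimal base-R sketch, assuming the 12-column whitespace-delimited layout shown earlier in the thread; `check_bedmethyl` is a hypothetical helper and not part of sounDMR, and the second example row is made up to illustrate a zero-coverage site.

```r
# Hypothetical helper, not part of sounDMR: summarise potential problem rows
# in a 12-column bedMethyl-style table (columns named V1..V12 by read.table).
check_bedmethyl <- function(bed) {
  c(
    n_rows        = nrow(bed),
    na_coverage   = sum(is.na(bed$V10)),             # missing read counts
    na_percent    = sum(is.na(bed$V11)),             # missing methylation percentages
    zero_coverage = sum(bed$V10 == 0, na.rm = TRUE)  # rows with 0 total reads
  )
}

# Inline example in the layout shown in the thread; the second row is
# invented to illustrate a site with 0 reads and a missing percentage.
example_txt <- "chr1 10468 10469 m 1 + 10468 10469 255 1 100 m
chr1 10470 10471 m 0 + 10470 10471 255 0 NA m
chr1 10483 10484 m 2 + 10483 10484 255 2 100 m"
bed <- read.table(text = example_txt, header = FALSE, stringsAsFactors = FALSE)
result <- check_bedmethyl(bed)
print(result)
```

For a real file, the same function could be applied to the output of `read.table("chr_chr1/normal_chr1.bed", header = FALSE, stringsAsFactors = FALSE)`. Non-zero counts in any of the last three fields would point to rows worth inspecting before building the Megaframe.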

sachingadakh commented 1 week ago

Hello, sure, and thank you for getting back to me. I have attached the files here: sachin_data.tar.gz. I didn't come across any NAs, but there are some 0s in the 10th and 11th columns. The 11th column is expected to contain some 0s, though, since it is the methylation frequency column.

sachingadakh commented 1 week ago

Hello, I removed the duplicated rows tagged with the "h" modification code in the treated sample and also replaced the "." strand with "+" in both samples. After that, everything worked perfectly, and I am now analyzing the results. Thank you.
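For anyone hitting the same error, the manual clean-up described above could be scripted roughly as follows. This is only a sketch under assumptions: it uses the 12-column layout from the thread, treats column 4 as the modification code and column 6 as the strand, and `clean_bedmethyl` is a hypothetical helper that is not part of sounDMR. The "." strands in the example rows are made up for illustration.

```r
# Hypothetical helper, not part of sounDMR: keep only rows with one
# modification code (default "m") and set unstranded "." entries to "+",
# mirroring the manual fix described in the thread.
clean_bedmethyl <- function(bed, keep_mod = "m") {
  bed <- bed[bed$V4 == keep_mod, , drop = FALSE]  # drop e.g. the duplicated "h" rows
  bed$V6[bed$V6 == "."] <- "+"                    # replace missing strand
  rownames(bed) <- NULL                           # renumber rows after filtering
  bed
}

# Rows modelled on the treated sample; the "." strands are invented here.
example_txt <- "chr1 10635 10636 h 1 . 10635 10636 255 1 0 h
chr1 10635 10636 m 1 . 10635 10636 255 1 100 m
chr1 10661 10662 h 1 + 10661 10662 255 1 100 h
chr1 10661 10662 m 1 + 10661 10662 255 1 0 m"
bed <- read.table(text = example_txt, header = FALSE, stringsAsFactors = FALSE)
cleaned <- clean_bedmethyl(bed)
print(cleaned)
```

The cleaned table can then be written back with `write.table(cleaned, "treated_chr1.clean.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)` (the output file name is just an example) and passed to generate_methylframe as before. Note that this discards the "h" calls entirely, which is appropriate only if they are not needed for the analysis.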