al2na / methylKit

R package for DNA methylation analysis
https://bioconductor.org/packages/release/bioc/html/methylKit.html
203 stars 96 forks source link

Error reading in file #302

Open Ge0rges opened 11 months ago

Ge0rges commented 11 months ago

Hello,

I have written an R script that uses methylKit to ultimately get DMRs. In this script, I read in multiple .bed files into a single methRead() call. The script completes successfully on at least 3 different sets of data I have input.

I am encountering an odd issue however with one dataset where the following error occurs:

Received list of locations.
Reading file.
Reading file.
Erreur in `.rowNamesDF<-`(x, value = value) :
  length of 'row.names' incorrect
Calls : methRead ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Execution stopped

As you can see, this happens when reading in file 2. However, these files are all generated in an identical fashion by a master script. Here is what their head looks like:

==> methylation/barcode01/1821_sub.bed <==                                                                                                        
contig_6061     3       4       m       1       +       3       4       255,0,0 1       0.00    0       1       0       0       0       0       2 
contig_6061     13      14      m       2       +       13      14      255,0,0 2       0.00    0       2       0       0       0       1       0 
contig_6061     44      45      m       3       +       44      45      255,0,0 3       0.00    0       3       0       0       0       0       0 
contig_6061     45      46      m       1       -       45      46      255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     52      53      m       1       -       52      53      255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     73      74      m       1       -       73      74      255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     139     140     m       1       +       139     140     255,0,0 1       0.00    0       1       0       0       0       0       3 
contig_6061     164     165     m       1       +       164     165     255,0,0 1       0.00    0       1       0       0       0       0       3 
contig_6061     189     190     m       1       +       189     190     255,0,0 1       0.00    0       1       0       0       0       0       4 
contig_6061     190     191     m       1       -       190     191     255,0,0 1       0.00    0       1       0       0       0       0       0 

==> methylation/barcode02/1821_sub.bed <==                                                                                                        
contig_6061     3066    3067    m       1       -       3066    3067    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3149    3150    m       1       -       3149    3150    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3297    3298    m       1       -       3297    3298    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3327    3328    m       1       -       3327    3328    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3350    3351    m       1       -       3350    3351    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3383    3384    m       1       -       3383    3384    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3416    3417    m       1       -       3416    3417    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3455    3456    m       1       -       3455    3456    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3495    3496    m       1       -       3495    3496    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3517    3518    m       1       -       3517    3518    255,0,0 1       0.00    0       1       0       0       0       0       0 

==> methylation/barcode03/1821_sub.bed <==                                                                                                        
contig_6061     3       4       m       1       +       3       4       255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     4       5       m       1       -       4       5       255,0,0 1       0.00    0       1       0       0       0       2       0 
contig_6061     14      15      m       1       -       14      15      255,0,0 1       0.00    0       1       0       0       0       0       2 
contig_6061     33      34      m       1       +       33      34      255,0,0 1       0.00    0       1       0       0       0       1       0 
contig_6061     44      45      m       2       +       44      45      255,0,0 2       0.00    0       2       0       0       0       0       0 
contig_6061     45      46      m       3       -       45      46      255,0,0 3       0.00    0       3       0       0       0       0       0 
contig_6061     79      80      m       1       -       79      80      255,0,0 1       0.00    0       1       0       0       0       2       0 
contig_6061     86      87      m       1       -       86      87      255,0,0 1       0.00    0       1       0       0       0       2       0 
contig_6061     119     120     m       1       -       119     120     255,0,0 1       0.00    0       1       0       1       0       1       0 
contig_6061     127     128     m       1       -       127     128     255,0,0 1       0.00    0       1       0       0       0       0       2 

Relevant script:

pipeline = list(fraction = FALSE, chr.col = 1, start.col = 2, end.col = 3, coverage.col = 10, strand.col = 6, freqC.col = 11)                           

# Get list of bedMethyl files to analyze                                                                                                                
files.list = list()                                                                                                                                     
samples = list()                                                                                                                                        
treatment = vector()                                                                                                                                    
t = 0                                                                                                                                                   

# For each barcode                                                                                                                                      
for (i in 1:7) {                                                                                                                                        
    # Skip the control barcode                                                                                                                          
    if (i == 4) next                                                                                                                                    

    # Add that barcode to the sample list                                                                                                               
    sample = paste0("barcode0", i)                                                                                                                      
    samples <- c(samples, sample)                                                                                                                       

    # Get file path to the bedmethyl file                                                                                                               
    files.list <- c(files.list, paste0("/researchdrive/gkanaan/seaice_methylation/methylation/", sample, "/", assembly, ".bed"))                        

    # Add the correct treatment (top, middle, bottom)                                                                                                   
    treatment = c(treatment, t%%3)                                                                                                                      
    t = t + 1                                                                                                                                           
}                                                                                                                                                       

meth_obj = methRead(files.list, sample.id = samples, assembly=assembly, treatment=treatment, context="CHH", pipeline=pipeline, header=FALSE, mincov=5)

I appreciate any tips.

Ge0rges commented 11 months ago

Ok it seems that this is due to the fact that 0 rows are left after coverage filtering. MethylKit unfortunately does not handle this very gracefully. @alexg9010 is there a way to skip files with 0 rows after coverage filtering instead of crashing?

Browse[5]> n                                                                          
debug: data[, coverage.col] = round(data[, coverage.col])                             
Browse[5]> n                                                                          
debug: data = data[data[, coverage.col] >= mincov, ]                                  
Browse[5]> n                                                                          
debug: strand = rep("+", nrow(data))                                                  
Browse[5]> print(data)                                                                
 [1] V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18          
<0 lignes> (ou 'row.names' de longueur nulle)                                         
alexg9010 commented 11 months ago

Hi @Ge0rges ,

You are right, this edge case is not handled very well. For now, you could use tryCatch to filter files with zero rows. using this code snippet:

meth_list <- lapply(seq_along(file.list),
                    function(i) {
                      tryCatch(
                        expr = {
                          methRead(
                            location = file.list[[i]],
                            sample.id = as.list(samples)[[i]],
                            assembly=assembly, 
                            # treatment=treatment, 
                            context="CHH", 
                            pipeline=pipeline, 
                            header=FALSE, 
                            mincov=5)
                          )
                        },
                        error = function(e) {
                          message(paste0("file ", file.list[[i]], " failed to load."))
                          return(NULL)
                        }
                      )

                    })

failed_samples <- sapply(meth_list, is.null)
meth_obj <- methylRawList(meth_list[!failed_samples], treatment = treatment[!failed_samples])

Best Alex

Ge0rges commented 11 months ago

Thanks for that solution.