YuLab-SMU / ChIPseeker

:dart: ChIP peak Annotation, Comparison and Visualization
https://onlinelibrary.wiley.com/share/author/GYJGUBYCTRMYJFN2JFZZ?target=10.1002/cpz1.585
224 stars 74 forks source link

support more kinds of bed files #173

Closed MingLi-929 closed 2 years ago

MingLi-929 commented 2 years ago

This pr is to fix a bug, according to #170. Here is the comfirmation. Download the narrowPeak file,broadPeak and gappedPeak file from MACS. And read these files by readPeakFile() and direactly way.

library(ChIPseeker)

narrow <- readPeakFile("run_callpeak_narrow0_peaks.narrowPeak")
narrow_test <- read.delim("run_callpeak_narrow0_peaks.narrowPeak",
                          comment.char = "#",header = F)

broad <- readPeakFile("run_callpeak_broad_peaks.broadPeak")
broad_test <- read.delim("run_callpeak_broad_peaks.broadPeak",
                         comment.char = "#",header = F)

gapped <- readPeakFile("run_callpeak_broad_peaks.gappedPeak")
gapped_test <- read.delim("run_callpeak_broad_peaks.gappedPeak",
                          comment.char = "#",header = F)

Here is the comparison of narrowPeak file. 730 items in Granges obj and matrix.

> narrow
GRanges object with 730 ranges and 7 metadata columns:
        seqnames            ranges strand |                     V4
           <Rle>         <IRanges>  <Rle> |            <character>
    [1]    chr22 17255467-17255847      * | run_callpeak_narrow0..
    [2]    chr22 17372665-17373263      * | run_callpeak_narrow0..
    [3]    chr22 17392236-17392676      * | run_callpeak_narrow0..
    [4]    chr22 17398274-17398699      * | run_callpeak_narrow0..
    [5]    chr22 17538969-17539492      * | run_callpeak_narrow0..
    ...      ...               ...    ... .                    ...
  [726]    chr22 51060642-51061034      * | run_callpeak_narrow0..
  [727]    chr22 51129862-51130373      * | run_callpeak_narrow0..
  [728]    chr22 51170497-51171010      * | run_callpeak_narrow0..
  [729]    chr22 51195415-51195798      * | run_callpeak_narrow0..
  [730]    chr22 51213629-51213946      * | run_callpeak_narrow0..
               V5          V6        V7        V8        V9       V10
        <integer> <character> <numeric> <numeric> <numeric> <integer>
    [1]       214           .   13.2158   24.2291   21.4687       166
    [2]       408           .   21.3866   44.0149   40.8560       283
    [3]       866           .   38.6603   90.8119   86.6351       217
    [4]       508           .   22.0952   54.2169   50.8226       203
    [5]       428           .   19.5260   46.0327   42.8229       282
    ...       ...         ...       ...       ...       ...       ...
  [726]       157           .   6.10754   18.3857   15.7468       184
  [727]       346           .  13.83010   37.6338   34.6136       255
  [728]       192           .  11.30450   21.9726   19.2590       226
  [729]       103           .   7.70763   12.8618   10.3401       162
  [730]       214           .  13.27340   24.1997   21.4401       141
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

  ===========================================================================

  > dim(narrow_test)
[1] 730  10

Here is the comparison in broadPeak file. 745 items in Granges obj but 746 in matrix. Here comes the bug.

> broad
GRanges object with 745 ranges and 6 metadata columns:
        seqnames            ranges strand | run_callpeak_broad_peak_1
           <Rle>         <IRanges>  <Rle> |               <character>
    [1]    chr22 17372665-17373263      * |    run_callpeak_broad_p..
    [2]    chr22 17392235-17392676      * |    run_callpeak_broad_p..
    [3]    chr22 17398271-17398728      * |    run_callpeak_broad_p..
    [4]    chr22 17538929-17539492      * |    run_callpeak_broad_p..
    [5]    chr22 17555766-17556245      * |    run_callpeak_broad_p..
    ...      ...               ...    ... .                       ...
  [741]    chr22 51060337-51061034      * |    run_callpeak_broad_p..
  [742]    chr22 51129862-51130373      * |    run_callpeak_broad_p..
  [743]    chr22 51170497-51171010      * |    run_callpeak_broad_p..
  [744]    chr22 51195404-51195812      * |    run_callpeak_broad_p..
  [745]    chr22 51213626-51213946      * |    run_callpeak_broad_p..
             X110           .  X8.20787  X13.5941  X11.0752
        <integer> <character> <numeric> <numeric> <numeric>
    [1]       175           .   11.0340   20.2366   17.5880
    [2]       412           .   20.7017   44.4668   41.2894
    [3]       246           .   12.9593   27.4385   24.6290
    [4]       200           .   11.5619   22.7066   20.0028
    [5]       321           .   15.8014   35.1451   32.1652
    ...       ...         ...       ...       ...       ...
  [741]        75           .   4.84068   9.96309   7.55174
  [742]       197           .  10.23950  22.40530  19.70920
  [743]       116           .   8.21861  14.12390  11.60230
  [744]        47           .   4.86105   7.13422   4.78776
  [745]       139           .   9.69485  16.53970  13.95900
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

  ============================================================================

  > dim(broad_test)
[1] 746   9

Here is the comparison in gappedPeak file. The same as the situation in broadPeak file. 745 items in Granges but 746 in matrix.

> gapped
GRanges object with 745 ranges and 12 metadata columns:
        seqnames            ranges strand | run_callpeak_broad_peak_1
           <Rle>         <IRanges>  <Rle> |               <character>
    [1]    chr22 17372665-17373263      * |    run_callpeak_broad_p..
    [2]    chr22 17392235-17392676      * |    run_callpeak_broad_p..
    [3]    chr22 17398271-17398728      * |    run_callpeak_broad_p..
    [4]    chr22 17538929-17539492      * |    run_callpeak_broad_p..
    [5]    chr22 17555766-17556245      * |    run_callpeak_broad_p..
    ...      ...               ...    ... .                       ...
  [741]    chr22 51060337-51061034      * |    run_callpeak_broad_p..
  [742]    chr22 51129862-51130373      * |    run_callpeak_broad_p..
  [743]    chr22 51170497-51171010      * |    run_callpeak_broad_p..
  [744]    chr22 51195404-51195812      * |    run_callpeak_broad_p..
  [745]    chr22 51213626-51213946      * |    run_callpeak_broad_p..
             X110           . X17255466.1 X17255847.1        X0
        <integer> <character>   <integer>   <integer> <integer>
    [1]       175           .    17372664    17373263         0
    [2]       412           .    17392234    17392676         0
    [3]       246           .    17398270    17398728         0
    [4]       200           .    17538928    17539492         0
    [5]       321           .    17555765    17556245         0
    ...       ...         ...         ...         ...       ...
  [741]        75           .    51060336    51061034         0
  [742]       197           .    51129861    51130373         0
  [743]       116           .    51170496    51171010         0
  [744]        47           .    51195403    51195812         0
  [745]       139           .    51213625    51213946         0
               X1        X381        X0.1  X8.20787  X13.5941  X11.0752
        <integer> <character> <character> <numeric> <numeric> <numeric>
    [1]         1         599           0   11.0340   20.2366   17.5880
    [2]         2       1,441         0,1   20.7017   44.4668   41.2894
    [3]         3     1,426,1     0,3,457   12.9593   27.4385   24.6290
    [4]         2       1,524        0,40   11.5619   22.7066   20.0028
    [5]         2       1,476         0,4   15.8014   35.1451   32.1652
    ...       ...         ...         ...       ...       ...       ...
  [741]         2       1,393       0,305   4.84068   9.96309   7.55174
  [742]         1         512           0  10.23950  22.40530  19.70920
  [743]         1         514           0   8.21861  14.12390  11.60230
  [744]         3     1,384,1    0,11,408   4.86105   7.13422   4.78776
  [745]         2       1,318         0,3   9.69485  16.53970  13.95900
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

  ====================================================================================

  > dim(gapped_test)
[1] 746  15