DoaneAS / rseqc

Automatically exported from code.google.com/p/rseqc

BED file format for geneBody_coverage.py #8

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
I installed the program OK and got the bam_stat.py script to work, but:
1. For the geneBody_coverage.py script, I have a BED file like this:
PGSC0003DMB000000001    73068   73525   exon    .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73068   75744   gene    .   .   0   0   0   0   0   0
PGSC0003DMB000000001    73068   75744   mRNA    .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73305   73525   CDS .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73525   73673   intron  .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73673   73777   exon    .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73673   73777   CDS .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73777   73854   intron  .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73854   73973   exon    .   +   0   0   0   0   0   0
PGSC0003DMB000000001    73854   73973   CDS .   +   0   0   0   0   0   0

2. I only want to plot gene body coverage from a BAM file.

I get an error, and the graph looks like this (attached).

How should the BED file look?

I am running Ubuntu 11.10 with this version:
$ geneBody_coverage.py --version
geneBody_coverage.py 2.3

Any ideas on what needs correcting in the BED file? Is column 3 OK?
Thanks a lot!
Cheers,
maximo

Original issue reported on code.google.com by rivabr...@gmail.com on 19 Jul 2012 at 9:16


GoogleCodeExporter commented 9 years ago
This is the error:

$ geneBody_coverage.py -r PGSC_DM_v3.4_mRNAonly_sort.bed12 -i lib1_novoalignCS_sort.bam -o lib1_gene_cov2
Load BAM file ...  Done
calculating coverage over gene body ...
null device 
          1 

Original comment by rivabr...@gmail.com on 19 Jul 2012 at 9:49

GoogleCodeExporter commented 9 years ago
Sorry, one more thing: I also tried this BED file:
PGSC0003DMB000000001    73068   75744   mRNA    0   +   0   0   0   0   0   0
PGSC0003DMB000000001    75358   75744   mRNA    0   -   0   0   0   0   0   0
PGSC0003DMB000000001    216860  217247  mRNA    0   -   0   0   0   0   0   0
PGSC0003DMB000000001    279843  280527  mRNA    0   +   0   0   0   0   0   0
PGSC0003DMB000000001    280128  280658  mRNA    0   +   0   0   0   0   0   0
PGSC0003DMB000000001    287750  288163  mRNA    0   -   0   0   0   0   0   0
PGSC0003DMB000000001    294225  295562  mRNA    0   +   0   0   0   0   0   0
PGSC0003DMB000000001    300843  305520  mRNA    0   +   0   0   0   0   0   0
PGSC0003DMB000000001    309678  310038  mRNA    0   +   0   0   0   0   0   0
PGSC0003DMB000000001    313400  316135  mRNA    0   -   0   0   0   0   0   0

Original comment by rivabr...@gmail.com on 19 Jul 2012 at 9:51

GoogleCodeExporter commented 9 years ago
Hi Maximo,

We were having a similar problem and got the same graph as our output. After
quite a bit of work, we finally figured out the cause. In our case, the
alignment software stored each chromosome in the BAM header under one name,
while our reference BED file used a different name. We used samtools to view
the BAM header and saw that its chromosome names did not match those in the
reference BED file. We fixed it by opening the BED file in a text editor and
using find/replace so that the chromosome names matched between the BED file
and the BAM header. This solved the problem, and we now have beautiful data.
I hope this helps.
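A minimal Python sketch of that check, assuming you have saved the header text printed by `samtools view -H your.bam`: it lists BED chromosome names that never appear in an @SQ line of the header, and applies the same find/replace to column 1 only. The helper names and the example mapping are hypothetical and not part of RSeQC; the header below is illustrative.

```python
def sq_names(header_text):
    """Reference (chromosome) names from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.append(tag[3:])
    return names


def unmatched_bed_chroms(bed_lines, header_text):
    """Chromosome names used in column 1 of the BED file but absent from the BAM header."""
    known = set(sq_names(header_text))
    used = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and track/browser lines
        used.add(line.split()[0])
    return sorted(used - known)


def rename_chroms(bed_lines, mapping):
    """Find/replace on column 1 only, e.g. mapping = {"1": "chr1"} (hypothetical)."""
    renamed = []
    for line in bed_lines:
        fields = line.split("\t")
        fields[0] = mapping.get(fields[0], fields[0])
        renamed.append("\t".join(fields))
    return renamed


header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:195471971\n"
bed = ["1\t134199214\t134235457\tNM_001039510\t0\t-\t134202950\t134234355\t0\t3\t4376,398,230,\t0,34800,36013,"]
print(unmatched_bed_chroms(bed, header))                                # ['1']
print(unmatched_bed_chroms(rename_chroms(bed, {"1": "chr1"}), header))  # []
```

Restricting the replacement to column 1 avoids accidentally rewriting numbers or transcript IDs that happen to contain the same text, which a blind editor find/replace could do.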

If you still haven't solved the problem, email me and maybe we can figure 
something out together.  I tried a lot of things before finding a solution (and 
now of course it seems obvious), and maybe we can compare notes.

Eric  

Original comment by eric.j.e...@gmail.com on 6 Sep 2012 at 1:07

GoogleCodeExporter commented 9 years ago
Thanks. I also managed to correct the BED files by hand. My problem seemed to
be in the last columns of my BED file.
Converting the merged transcripts output (GTF) from Cufflinks into a BED file
worked.
I could not get the GFF3-to-BED conversion right; I'm not sure why.
I should be able to use "." for the last 3 columns of the BED file, right?
Or did I need strict sorting on the 1st and then the 2nd column of the BED file?
Some info below.
Cheers and thanks!

Did NOT work:
$ head PGSC_DM_v3.4_mRNAonly_sort2.bed12
PGSC0003DMB000000010    1242813 1245444 PGSC0003DMT400000001    0   +   1242813 1245444 0   0   0   0
PGSC0003DMB000000010    1242813 1245444 PGSC0003DMT400000002    0   +   1242813 1245444 0   0   0   0
PGSC0003DMB000000010    1243970 1245448 PGSC0003DMT400000003    0   +   1243970 1245448 0   0   0   0
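In these lines the last three columns (blockCount 0, blockSizes 0, blockStarts 0) describe a transcript with no exons, which fits the observation that the problem was in the last columns. The sketch below is a minimal BED12 sanity check written against the rules in the UCSC BED format description: the first blockStart is 0, the last block ends at chromEnd, and blocks do not overlap. It is not RSeQC's own parser, and it splits on any whitespace for convenience. It also shows why "." cannot stand in for those columns: they must be integers or comma-separated integer lists.

```python
def bed12_problems(line):
    """Return a list of problems in one BED12 line; an empty list means it looks OK."""
    f = line.split()
    if len(f) != 12:
        return [f"expected 12 columns, found {len(f)}"]
    try:
        start, end = int(f[1]), int(f[2])
        thick_start, thick_end = int(f[6]), int(f[7])
        count = int(f[9])  # blockCount: number of exons
        # blockSizes / blockStarts; a trailing comma is allowed, so skip empty items
        sizes = [int(x) for x in f[10].split(",") if x]
        starts = [int(x) for x in f[11].split(",") if x]
    except ValueError as err:
        return [f"non-integer value ({err})"]
    problems = []
    if not start <= thick_start <= thick_end <= end:
        problems.append("thickStart/thickEnd must lie within chromStart..chromEnd")
    if count < 1:
        problems.append("blockCount must be at least 1")
    if len(sizes) != count or len(starts) != count:
        problems.append("blockSizes and blockStarts must each have blockCount entries")
    elif count >= 1:
        if starts[0] != 0:
            problems.append("first blockStart must be 0")
        if start + starts[-1] + sizes[-1] != end:
            problems.append("last block must end exactly at chromEnd")
        for i in range(1, count):
            if starts[i] < starts[i - 1] + sizes[i - 1]:
                problems.append("blocks must be in order and must not overlap")
                break
    return problems


bad = "PGSC0003DMB000000010\t1242813\t1245444\tPGSC0003DMT400000001\t0\t+\t1242813\t1245444\t0\t0\t0\t0"
for problem in bed12_problems(bad):
    print(problem)
```

Running every line of a converted file through a check like this before calling geneBody_coverage.py makes an empty or inconsistent exon structure visible immediately instead of only as a flat plot.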

Yes worked:
$ head cufflinks_transcripts_all.bed12
PGSC0003DMB000000001    73068   75744   PGSC0003DMT400084231    0   +   73068   75744   0   4   457,104,119,238   0,605,786,2438
PGSC0003DMB000000001    75358   75744   PGSC0003DMT400084232    0   -   75358   75744   0   1   386 0
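The working line can be rebuilt from exon coordinates, which is essentially what a GTF/GFF3-to-BED12 conversion has to do. A minimal Python sketch, assuming the exons are already in BED coordinates (0-based start, end-exclusive) and do not overlap; GTF and GFF3 are 1-based and inclusive, so subtract 1 from each start first. The first three exons are the ones listed in the first BED file of this thread; the fourth is read off the working line. thickStart/thickEnd are simply set to the transcript bounds and itemRgb to 0, as in the Cufflinks-derived file. `to_bed12` is a hypothetical helper.

```python
def to_bed12(chrom, name, strand, exons, score=0, item_rgb="0"):
    """Build one BED12 line from a transcript's exons.

    exons: (start, end) pairs in BED coordinates (0-based start, end-exclusive),
    assumed non-overlapping. For GTF/GFF3 input use (gff_start - 1, gff_end).
    """
    exons = sorted(exons)
    chrom_start, chrom_end = exons[0][0], exons[-1][1]
    block_sizes = [end - start for start, end in exons]
    block_starts = [start - chrom_start for start, _ in exons]  # relative to chromStart
    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        chrom_start, chrom_end,  # thickStart/thickEnd: whole transcript, no CDS used
        item_rgb, len(exons),
        ",".join(map(str, block_sizes)),
        ",".join(map(str, block_starts)),
    ]
    return "\t".join(map(str, fields))


print(to_bed12("PGSC0003DMB000000001", "PGSC0003DMT400084231", "+",
               [(73068, 73525), (73673, 73777), (73854, 73973), (75506, 75744)]))
# prints (tab-separated): PGSC0003DMB000000001 73068 75744 PGSC0003DMT400084231 0 + 73068 75744 0 4 457,104,119,238 0,605,786,2438
```

Grouping GFF3 exon rows by their Parent transcript before calling a helper like this is the step that a row-per-feature file, such as the first one in this thread, does not do.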

Original comment by rivabr...@gmail.com on 9 Sep 2012 at 1:59

DawnEve commented 3 years ago

Perhaps you should try a BED file like this:

$ head /home/wangjl/data/ref/mouse/UCSC/mm10_refseq_whole_gene_noChr.bed
1       134199214       134235457       NM_001039510    0       -       134202950       134234355       0       3       4376,398,230,   0,34800,36013,
1       134199214       134235457       NM_001282945    0       -       134202950       134234355       0       3       4376,432,230,   0,34800,36013,
1       134199214       134235457       NM_001008533    0       -       134202950       134234355       0       2       4376,1443,      0,34800,
1       134199214       134235457       NM_001291930    0       -       134202950       134203505       0       2       4376,230,       0,36013,
1       134199214       134234856       NM_001291928    0       -       134202950       134234733       0       2       4376,194,       0,35448,
1       58713285        58733227        NR_149255       0       +       58733227        58733227        0       4       374,427,136,975,        0,13020,17866,18967,
1       25067475        25829707        NM_175642       0       -       25068167        25826760        0       31      881,105,36,644,40,96,172,99,86,67,70,151,103,104,127,44,103,76,150,109,69,195,107,102,165,165,165,162,111,772,246,      0,7209,16768,26289,31400,33957,44216,44701,49813,59165,61343,63775,154284,159244,160945,328960,353082,363947,364951,389516,393267,420449,436957,437437,464394,464999,479940,486199,492279,758528,761986,
1       58711490        58758882        NM_001293804    0       +       58732272        58753922        0       10      102,89,427,136,74,55,50,82,508,5099,    0,2080,14815,19661,20762,24167,28772,29474,40848,42293,
1       58711490        58758882        NM_001289704    0       +       58726436        58753922        0       10      102,427,106,136,74,55,50,82,508,5099,   0,14815,17565,19661,20762,24167,28772,29474,40848,42293,
1       8359738 9299877 NM_027671       0       -       8363474 8803943 0       21      3895,111,93,153,72,117,39,130,131,83,103,42,44,58,57,135,54,54,64,82,1745,      0,54464,85288,88241,103655,105698,194986,223466,235673,247331,265040,318068,318198,319334,322234,423024,444178,522422,639447,843440,938394,
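The trailing commas in columns 11 and 12 of this file follow the style of the examples in the UCSC BED description and are valid, so a reader simply skips the empty item after the last comma. A minimal Python sketch, assuming whitespace-separated columns, that expands such a line back into genomic exon intervals; this is handy for spot-checking a converted file against the original annotation. `bed12_exons` is a hypothetical helper, not an RSeQC function.

```python
def bed12_exons(line):
    """Expand one BED12 line into absolute (start, end) exon intervals."""
    f = line.split()
    chrom_start = int(f[1])
    # Trailing commas (e.g. "4376,230,") leave an empty last item; skip it.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


line = "1\t134199214\t134235457\tNM_001291930\t0\t-\t134202950\t134203505\t0\t2\t4376,230,\t0,36013,"
print(bed12_exons(line))  # [(134199214, 134203590), (134235227, 134235457)]
```

The last interval ending exactly at chromEnd (134235457) is a quick sign that blockSizes and blockStarts are consistent with columns 2 and 3.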