brentp / combined-pvalues

combining p-values using modified stouffer-liptak for spatially correlated results (probes)
MIT License
44 stars 21 forks source link

regions-t.bed left empty #19

Closed hhx037 closed 4 years ago

hhx037 commented 5 years ago

Hi,

I'm running the pipeline, and though there are five regions that pass the thresholds (pvalue and number of probes), the regions-t file is left empty, which also leads to an empty annotated file. Here is the screen output (I replaced the regions start and end by '...' for this post):

comb-p pipeline -c P.Value --seed 1e-3 --dist 200 -p test --region-filter-p 0.1 --annotate hg19 A_T6_vs_B_T6.bed

setting --acf-dist to 0.33 * --dist == 70 calculated stepsize as: 70 ACF:

lag_min lag_max correlation N p

1 71 0.05502 241121 5.617e-161 wrote: test.acf.txt

original lambda: 1.00

wrote: test.slk.bed.gz with lambda: 0.99 wrote: test.fdr.bed.gz wrote: test.regions.bed.gz (5 regions)

read 5 regions from test.regions.bed.gz

calculating ACF out to: 105

with 3 lags: [1, 71, 141]

Done with one-time ACF calculation

865859 bases used as coverage for sidak correction wrote: test.regions-p.bed.gz, (regions with corrected-p < 0.05: 5)

chrom start end min_p n_probes z_p z_sidak_p

chr6 ... ... 9.626e-08 9 4.421e-12 3.645e-08 chr6 ... ... 5.267e-12 4 2.17e-17 2.043e-13 chr6 ... ... 0.0001773 3 4.096e-09 6.011e-05 chr19 ... ... 2.156e-06 5 2.889e-11 2.579e-07 chr6 ... ... 9.289e-05 4 1.502e-09 2.205e-05 wrote: test.regions-t.bed, (regions with region-p < 0.100 and n-probes >= 0: 5) Bonferonni-corrected p-value for 865859 rows: 5.77e-08 values less than Bonferonni-corrected p-value: 0 saving to: test.manhattan.png wrote: test.anno.hg19.bed annotated with hg19

-rw-r--r-- 1 hhx037 qmul 84 Sep 27 16:51 test.acf.txt -rw-r--r-- 1 hhx037 qmul 1 Sep 27 16:58 test.anno.hg19.bed -rw-r--r-- 1 hhx037 qmul 145 Sep 27 16:51 test.args.txt -rw-r--r-- 1 hhx037 qmul 9.6M Sep 27 16:58 test.fdr.bed.gz -rw-r--r-- 1 hhx037 qmul 355K Sep 27 16:58 test.manhattan.png -rw-r--r-- 1 hhx037 qmul 139 Sep 27 16:58 test.regions.bed.gz -rw-r--r-- 1 hhx037 qmul 307 Sep 27 16:58 test.regions-p.bed.gz -rw-r--r-- 1 hhx037 qmul 0 Sep 27 16:58 test.regions-t.bed -rw-r--r-- 1 hhx037 qmul 9.1M Sep 27 16:57 test.slk.bed.gz

Any idea what's going on?

By the way, I'm running this in python 2.7, as I was getting an error in python 3 to do with it not able to import _common

Kind regards,

Stephane

brentp commented 5 years ago

what are the first few lines of A_T6_vs_B_T6.bed ?

hhx037 commented 5 years ago

chrom start end name P.Value chr1 10525 10526 cg14817997 0.0128365482685615 chr1 10848 10849 cg26928153 0.258364992831499 chr1 10850 10851 cg16269199 0.0596191980601117 chr1 15865 15866 cg13869341 0.352568310661498

brentp commented 5 years ago

so there's no t, or fold-change column so that step can' work. you can annotate the test.regions-p.bed.gz if that's what you're interested in.

hhx037 commented 5 years ago

Oh, I see! Sorry about that. Thanks for your help.

Steph

m-nabais commented 5 years ago

Hi,

I'm encountering the same problem: the regions-t.bed and the annotation files are empty, no red dots in the manhattan plot and the regions-p.bed file show all the regions, ignoring the filters (attached here) RA_combp_p1e-3_dist300_filter0.01.regions-p.bed.gz . Also running on Python 2.7.10.

This is my initial pipeline:

comb-p pipeline \ -c 4 \ --seed 1e-3 \ --dist 300 \ -p RA_combp_p1e-3_dist300_filter0.01 \ --region-filter-p 0.01 \ --region-filter-n 2 \ --annot hg38 \ RA_GSE42861_qced_normalized_DNAm_autosomes-no-X-react-no-SNP-sd0.02.linear_for_combp_sorted.bed

The in initial .bed file, looks like this: chrom start end p t chr1 898803 898804 0.6461426 -0.113236 chr1 898915 898916 0.207442 0.388866 chr1 898976 898977 0.9625391 -0.012214 chr1 902156 902157 0.05490661 1.30573 chr1 903106 903107 0.4281714 -0.563628 chr1 904055 904056 0.9896041 0.00405281

And the output looks like:

RA_GSE42861_qced_normalized_DNAm_autosomes-no-X-react-no-SNP-sd0.02.linear_for_combp_sorted.bed setting --acf-dist to 0.33 * --dist == 100 calculated stepsize as: 100 ACF:

lag_min lag_max correlation N p

1 101 0.197 51004 0 wrote: RA_combp_p1e-3_dist300_filter0.01.acf.txt

original lambda: 1.54

wrote: RA_combp_p1e-3_dist300_filter0.01.slk.bed.gz with lambda: 1.60 wrote: RA_combp_p1e-3_dist300_filter0.01.fdr.bed.gz wrote: RA_combp_p1e-3_dist300_filter0.01.regions.bed.gz (74 regions)

read 74 regions from RA_combp_p1e-3_dist300_filter0.01.regions.bed.gz

calculating ACF out to: 404

with 6 lags: [1, 101, 201, 301, 401, 501]

Done with one-time ACF calculation

234931 bases used as coverage for sidak correction wrote: RA_combp_p1e-3_dist300_filter0.01.regions-p.bed.gz, (regions with corrected-p < 0.05: 55)

chrom start end min_p n_probes z_p z_sidak_p t.pos/t.neg t.sum

chr6 31622863 31622898 0.0009924 2 3.103e-07 0.00208 2/0 4.58361 chr1 248739999 248740025 0.0003462 3 1.238e-07 0.001118 3/0 6.79411 chr20 37529175 37529274 0.0002944 3 9.527e-08 0.0002261 3/0 7.73842 chr6 32326693 32326801 1.918e-07 3 2.836e-11 6.168e-08 0/3 -3.435862 chr21 30600594 30600605 5.912e-09 2 7.549e-14 1.612e-09 0/2 -3.31257 chr20 45405607 45405619 0.0002486 2 7.409e-08 0.00145 2/0 7.04239 chr2 231528485 231528546 0.0002478 2 7.067e-08 0.0002721 2/0 3.68315 chr16 56989110 56989280 5.46e-09 2 1.955e-22 2.701e-19 0/2 -3.4313 chr10 18140714 18140775 6.118e-05 4 1.096e-08 4.221e-05 0/4 -11.48938 chr6 29680568 29680676 0.0006093 4 1.844e-06 0.004003 7/0 2.010867 chr6 167122568 167122697 1.016e-06 5 3.158e-10 5.752e-07 5/0 9.5357 chr3 18438750 18438806 5.67e-05 2 6.998e-09 2.936e-05 2/0 4.56867 chr14 80959568 80959672 4.608e-07 3 7.281e-11 1.645e-07 0/3 -3.85335 chr2 60536196 60536223 0.0009418 2 4.974e-07 0.004318 2/0 5.97652 chr21 38773437 38773481 0.0003248 2 1.104e-07 0.0005893 0/2 -5.23382 chr3 71584063 71584066 3.837e-05 2 3.593e-09 0.0002813 2/0 8.07873 chr3 49358760 49358800 0.000458 3 1.812e-07 0.001064 0/3 -8.61981 chr6 32815211 32815270 6.686e-09 2 1.424e-13 5.667e-10 2/0 5.20632 chr17 48621611 48621794 1.693e-05 3 2.832e-09 3.636e-06 0/3 -2.929638 chr11 134032213 134032304 0.0002478 2 7.002e-08 0.0001808 2/0 3.69179 chr17 14303457 14303585 0.0001743 5 1.086e-07 0.0001993 0/5 -6.95251 chr10 72089006 72089410 3.837e-05 9 1.34e-11 7.792e-09 9/0 19.751556 chr3 122563034 122563129 6.118e-05 3 1.232e-08 3.048e-05 0/3 -2.569347 chr7 137847045 137847056 0.0006911 2 3.236e-07 0.006888 0/2 -5.94503 chr16 54931175 54931231 0.0001706 4 3.993e-08 0.0001675 0/4 -12.56371 chr10 71712558 71712588 0.0009592 2 5.267e-07 0.004116 0/2 -4.62258 chr1 24920363 24920432 4.976e-05 2 5.395e-09 1.837e-05 0/2 -4.96877 chr8 23706412 23706513 0.0005175 4 2.268e-07 0.0005275 0/4 -11.0711 chr2 101699980 101700035 4.976e-05 2 5.719e-09 2.443e-05 2/0 4.72031 wrote: RA_combp_p1e-3_dist300_filter0.01.regions-t.bed, (regions with region-p < 0.010 and n-probes >= 2: 29) Bonferonni-corrected p-value for 234932 rows: 2.13e-07 values less than Bonferonni-corrected p-value: 34 saving to: RA_combp_p1e-3_dist300_filter0.01.manhattan.png wrote: RA_combp_p1e-3_dist300_filter0.01.anno.hg38.bed annotated with hg38

Not quite sure what I can be doing wrong?...

Thanks in advance.

Cheers

fchuffar commented 4 years ago

Dears,

Perhaps it's a maive question but why regions-t.bed needs t or fold-change column ?

brentp commented 4 years ago

it doesn't, unless you want to use pipeline to full effect.

fchuffar commented 4 years ago

I'm sorry but I don't understand your answer.

Actually, I'm using pipeline as you said, and I try to reproduce my own result obtained last year https://www.biorxiv.org/content/10.1101/852186v1 Unfortunately, the same script produces an empty regions-t.bed file.

Could you explain me what has changed ? And why now pipeline needs a t or fold-change column ?

brentp commented 4 years ago

Hi, I have just pushed a fix for this. In updating to support python3, I introduced a bug. This is not fixed in master and tagged as version 0.50.0. thanks for persisting.