Closed albmarch closed 7 years ago
I added three new examples that show the same problem. In each of them I find 1 region instead of 3, 4, or 5 regions.
The query is: Count_3 = SELECT() test_cover_count_3; Count_4 = SELECT() test_cover_count_4; Count_5 = SELECT() test_cover_count_5_1;
Count_3_c = COVER(1,ANY; groupby: target) Count_3; Count_4_c = COVER(1,ANY; groupby: target) Count_4; Count_5_c = COVER(1,ANY; groupby: target) Count_5;
H_3 = HISTOGRAM(1,ANY) Count_3_c; H_4 = HISTOGRAM(1,ANY) Count_4_c; H_5 = HISTOGRAM(1,ANY) Count_5_c;
C_3 = COVER(1,ANY) Count_3_c; C_4 = COVER(1,ANY) Count_4_c; C_5 = COVER(1,ANY) Count_5_c;
MATERIALIZE H_3 INTO H_3; MATERIALIZE H_4 INTO H_4; MATERIALIZE H_5 INTO H_5;
MATERIALIZE C_3 INTO C_3; MATERIALIZE C_4 INTO C_4; MATERIALIZE C_5 INTO C_5;
Input datasets: test_cover_count_3.zip, test_cover_count_4.zip, test_cover_count_5_1.zip
Histogram outputs: job_test_cover_guest_new1161_20171028_090109_H_3.zip, job_test_cover_guest_new1161_20171028_090109_H_4.zip, job_test_cover_guest_new1161_20171028_090109_H_5.zip
Cover outputs: job_test_cover_guest_new1161_20171028_090109_C_3.zip, job_test_cover_guest_new1161_20171028_090109_C_4.zip, job_test_cover_guest_new1161_20171028_090109_C_5.zip
The problem is partially fixed. There will still be a marginal error of "one base" for every region that starts or stops on a bin border (1K border).
@akaitoua Thank you for looking into this. What do you mean when you say there will still be an error of "one base"? Do you expect the obtained region to have one base more, or one base less, when it starts or stops on a bin border?
@marcomass, because of a technical issue in the binning algorithm, I had to fix this issue with the slight error mentioned above. This means that the missing region (17798000 17798017 3) will be shown as (17798001 17798017 3).
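To make the "bin border" condition concrete, here is a minimal Python sketch. It assumes that "1K border" means bins are 1000 bases wide and aligned at multiples of 1000; the constant `BIN_SIZE` and the helper `on_bin_border` are hypothetical names used only for illustration and are not part of GMQL.

```python
# Assumption: "1K border" means bins of 1000 bases aligned at multiples of 1000.
BIN_SIZE = 1000


def on_bin_border(coordinate, bin_size=BIN_SIZE):
    """Return True if the coordinate lies exactly on a bin border."""
    return coordinate % bin_size == 0


# The region from the comment above: (start, stop, accumulation count).
start, stop, count = 17798000, 17798017, 3

# A region is exposed to the "one base" error if either end is on a border.
start_on_border = on_bin_border(start)
stop_on_border = on_bin_border(stop)
affected = start_on_border or stop_on_border

print(start_on_border, stop_on_border, affected)
```

Under this assumption, only the start coordinate 17798000 sits on a border, which is consistent with the start being the coordinate that changes in the reported output.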
Hi, I obtained wrong results with this query:
data = SELECT() data; H = HISTOGRAM(1, ANY) data; C = COVER(1, ANY) data; MATERIALIZE H INTO H; MATERIALIZE C INTO C;
Looking at the input data, I expect to obtain two regions from the COVER, but the results contain only one region.
Furthermore, the output of Histogram contains a coordinate (17800000) that is not present in the input data. Conversely, an input region stop coordinate (17798000) is not present in the result. Could the problem be caused by an "exchange" of the values of the input coordinates?
The correct result of Histogram should be this:
17797617 17797619 1
17797619 17797673 2
17797673 17797704 3
17797704 17797772 4
17797772 17797927 5
17797927 17798000 4
17798000 17798017 3
17798017 17798123 2
17798123 17798216 1
17799230 17799404 1
17799404 17799534 2
17799534 17800080 1
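As a cross-check of the expected two COVER regions, the sketch below merges adjacent rows of the expected Histogram into contiguous regions. It assumes that COVER(1, ANY) on a single chromosome and strand is equivalent to joining touching histogram rows whose count is at least 1; `cover_from_histogram` is a hypothetical helper written for this illustration, not the GMQL implementation.

```python
# Expected HISTOGRAM rows from the report: (start, stop, accumulation count).
expected_histogram = [
    (17797617, 17797619, 1),
    (17797619, 17797673, 2),
    (17797673, 17797704, 3),
    (17797704, 17797772, 4),
    (17797772, 17797927, 5),
    (17797927, 17798000, 4),
    (17798000, 17798017, 3),
    (17798017, 17798123, 2),
    (17798123, 17798216, 1),
    (17799230, 17799404, 1),
    (17799404, 17799534, 2),
    (17799534, 17800080, 1),
]


def cover_from_histogram(rows, min_acc=1):
    """Merge touching rows with count >= min_acc into contiguous regions.

    Assumption: all rows belong to the same chromosome and strand.
    """
    regions = []
    for start, stop, count in sorted(rows):
        if count < min_acc:
            continue
        if regions and regions[-1][1] == start:
            # This row starts where the previous region ends: extend it.
            regions[-1][1] = stop
        else:
            regions.append([start, stop])
    return [tuple(region) for region in regions]


cover_regions = cover_from_histogram(expected_histogram)
print(cover_regions)
```

With these rows the sketch yields two regions, (17797617, 17798216) and (17799230, 17800080), which matches the expectation of two COVER regions stated above.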
The input dataset is: data.zip
The outputs of the query are: job_test_cover_alberto_marchesi_20171013_163613_H.zip and job_test_cover_alberto_marchesi_20171013_163613_C.zip