DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

JOIN( UPSTREAM) wrong output regions and score #92

Closed eirinistam closed 6 years ago

eirinistam commented 6 years ago

With the following query:

D1 = SELECT(region: chr == chr4) example_dataset_1;
D2 = SELECT(region: chr == chr4) example_dataset_2;
RES = JOIN(MD(1), UP; output: RIGHT) D1 D2;
MATERIALIZE RES INTO join_11;

I received wrong output regions and scores.

Attachments: join_11.zip, picture1

marcomass commented 6 years ago

Related to issue #86. Eirini, can you please also upload the screenshot?

pp86 commented 6 years ago

@marcomass @eirinistam

A score of 0 is correct, since the score field of the result is set to 0.0 by the GTF converter:

chr4 GMQL Region 180 240 0.0 - . D1.source "GMQL"; D1.feature "Region"; D1.score "0.9"; D1.frame "."; D1.name "."; D1.signal "15"; D1.pvalue "15.1452"; D1.qvalue "-1"; D1.peak "-1"; D2.source "GMQL"; D2.feature "Region"; D2.score "240"; D2.frame "."; D2.name "."; D2.signal "17"; D2.pvalue "11.1675"; D2.qvalue "-1"; D2.peak "-1";
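The per-dataset scores are still present in the attribute column of that line (D1.score and D2.score), so they can be read back from there. A minimal Python sketch, assuming the columns are whitespace-separated as pasted here (a real GTF file is tab-separated); the helper name parse_gtf_line is hypothetical and not part of GMQL:

```python
import re

# The result line from this comment, copied verbatim.
LINE = (
    'chr4 GMQL Region 180 240 0.0 - . '
    'D1.source "GMQL"; D1.feature "Region"; D1.score "0.9"; D1.frame "."; '
    'D1.name "."; D1.signal "15"; D1.pvalue "15.1452"; D1.qvalue "-1"; '
    'D1.peak "-1"; D2.source "GMQL"; D2.feature "Region"; D2.score "240"; '
    'D2.frame "."; D2.name "."; D2.signal "17"; D2.pvalue "11.1675"; '
    'D2.qvalue "-1"; D2.peak "-1";'
)


def parse_gtf_line(line):
    """Split a GTF-style line into its 8 fixed columns plus an attribute dict."""
    # The first 8 whitespace-separated tokens are the fixed GTF columns;
    # everything after them is the attribute column.
    chrom, source, feature, start, end, score, strand, frame, attrs = line.split(None, 8)
    # Attributes have the form: key "value";
    attributes = dict(re.findall(r'(\S+) "([^"]*)";', attrs))
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "score": float(score),  # 0.0 here: set by the GTF converter
        "strand": strand,
        "attributes": attributes,
    }


record = parse_gtf_line(LINE)
# The GTF score column is 0.0, while the joined datasets keep their own scores.
print(record["score"], record["attributes"]["D1.score"], record["attributes"]["D2.score"])
```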

Is the problem with the first green region, which should not be in the output?

marcomass commented 6 years ago

@pp86 The predicate must be applied only upstream of the anchor region. Only one copy of the green region on the right should be in the output (it is adjacent to the second blue region). Here is the definition:

● in the positive strand (or when the strand is unknown), UP is true for those regions of the experiment whose right-end is lower than, or equal to, the left-end of the anchor, and DOWN is true for those regions of the experiment whose left-end is higher than, or equal to, the right-end of the anchor;
● in the negative strand, the inequalities are exchanged;
● remaining regions of the experiment must be overlapping with the anchor region.
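The first two rules of this definition can be sketched in Python as follows. This is only an illustration and not the GMQL implementation: the Region type, the function names, and the example coordinates are hypothetical, and it assumes that the strand referred to is the anchor's strand.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    left: int
    right: int
    strand: str = "*"  # "+", "-", or "*" when the strand is unknown


def is_upstream(anchor: Region, exp: Region) -> bool:
    """UP: the experiment region lies entirely before the anchor in strand direction."""
    if anchor.chrom != exp.chrom:
        return False
    if anchor.strand == "-":
        # Negative strand: inequalities are exchanged.
        return exp.left >= anchor.right
    # Positive or unknown strand.
    return exp.right <= anchor.left


def is_downstream(anchor: Region, exp: Region) -> bool:
    """DOWN: the experiment region lies entirely after the anchor in strand direction."""
    if anchor.chrom != exp.chrom:
        return False
    if anchor.strand == "-":
        return exp.right <= anchor.left
    return exp.left >= anchor.right


# Hypothetical coordinates, chosen only to exercise the rules.
anchor_plus = Region("chr4", 100, 200, "+")
anchor_minus = Region("chr4", 100, 200, "-")
before = Region("chr4", 50, 100)   # adjacent, to the left of the anchor
after = Region("chr4", 200, 250)   # adjacent, to the right of the anchor

print(is_upstream(anchor_plus, before), is_upstream(anchor_plus, after))
print(is_upstream(anchor_minus, before), is_upstream(anchor_minus, after))
```

With a positive-strand anchor only the region to its left is upstream; with a negative-strand anchor only the region to its right is.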