DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

ORDER region_topg and meta_topg working as region_top and meta_top #45

Closed marcomass closed 7 years ago

marcomass commented 7 years ago

Unlike before, the ORDER clauses region_topg and meta_topg no longer work correctly: they now behave like region_top and meta_top.

Expected behavior: From the documentation “The clause meta_topg (region_topg) implicitly considers the grouping by identical values of the first n − 1 ordering attributes and then selects the first k samples (meta_topg) or regions (region_topg) of each group.”

Using the following query:

DATA = SELECT(cell == "Urothelia") HG19_ENCODE_NARROW;
THETOP = ORDER(composite, treatment_type; meta_topg: 1) DATA;
MATERIALIZE THETOP into res;

I expect that samples are grouped by 'composite', whose values are:

- wgEncodeAwgDnaseUniPk
- wgEncodeOpenChromDnase
- wgEncodeOpenChromFaire

and then the first sample (by treatment_type) in each group is selected. Thus, I expect 3 samples in output:

- ID = 966/968 for composite=wgEncodeOpenChromDnase
- ID = 1006 for composite=wgEncodeOpenChromFaire
- ID = 39 for composite=wgEncodeAwgDnaseUniPk

Yet, currently only a single sample is returned (as it would be with meta_top).
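To make the difference between the two clauses concrete, here is a minimal Python sketch of the semantics quoted from the documentation. It is not GMQL's implementation: the function names `meta_top` and `meta_topg`, the sample IDs, and the attribute values are all hypothetical, and each metadata attribute is assumed to hold a single value.

```python
from itertools import groupby

# Hypothetical sample metadata; IDs and values are illustrative only.
samples = [
    {"id": "s1", "composite": "A", "treatment_type": "x"},
    {"id": "s2", "composite": "A", "treatment_type": "y"},
    {"id": "s3", "composite": "B", "treatment_type": "x"},
    {"id": "s4", "composite": "C", "treatment_type": "z"},
    {"id": "s5", "composite": "C", "treatment_type": "y"},
]


def meta_top(samples, attrs, k):
    """Sort by all ordering attributes and keep the first k samples overall."""
    ordered = sorted(samples, key=lambda s: tuple(s[a] for a in attrs))
    return ordered[:k]


def meta_topg(samples, attrs, k):
    """Group by the first n-1 ordering attributes, keep the first k of each group."""
    ordered = sorted(samples, key=lambda s: tuple(s[a] for a in attrs))

    def group_key(s):
        return tuple(s[a] for a in attrs[:-1])

    result = []
    # groupby needs its input sorted by the group key, which the full sort ensures.
    for _, group in groupby(ordered, key=group_key):
        result.extend(list(group)[:k])
    return result


attrs = ["composite", "treatment_type"]
top_ids = [s["id"] for s in meta_top(samples, attrs, 1)]
topg_ids = [s["id"] for s in meta_topg(samples, attrs, 1)]
print(top_ids)   # one sample in total
print(topg_ids)  # one sample per distinct 'composite' value
```

With three distinct `composite` values, `meta_topg: 1` should yield three samples, whereas `meta_top: 1` yields one; the reported bug corresponds to getting the second result from the first clause.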

akaitoua commented 7 years ago

Fixed in 718eb77. Sorry for the delay; it took me a while to find the error by debugging the code.

marcomass commented 7 years ago

@akaitoua I tested it and it now works fine, thanks! However, the DESC / ASC modifiers do not seem to be taken into account: the result is the same with or without DESC / ASC. Please check and fix this too.

marcomass commented 7 years ago

@akaitoua Actually, I'm ordering by two columns. Using the attached testing dataset, in both the following queries:

1)
DATASET = SELECT() TOPG_TEST;
RESULT = ORDER(biosample_term_name, region_count DESC; meta_topg: 1) DATASET;
MATERIALIZE RESULT INTO RESULT;

2)
DATASET = SELECT() TOPG_TEST;
RESULT = ORDER(biosample_term_name, region_count ASC; meta_topg: 1) DATASET;
MATERIALIZE RESULT INTO RESULT;

I obtain the same result:

But I would expect instead, in one of the two cases:

Could you please check? group_test.zip
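The expectation behind the two queries above can be sketched in Python as follows. This is only an illustration under stated assumptions, not GMQL's code: the helper `topg_last_attr`, the sample IDs, and the metadata values are hypothetical and are not taken from group_test.zip. The sketch groups by the first attribute and applies the ASC / DESC direction to the last one.

```python
from itertools import groupby

# Hypothetical samples; names and counts are illustrative only.
samples = [
    {"id": "a1", "biosample_term_name": "cell_A", "region_count": 10},
    {"id": "a2", "biosample_term_name": "cell_A", "region_count": 30},
    {"id": "b1", "biosample_term_name": "cell_B", "region_count": 20},
    {"id": "b2", "biosample_term_name": "cell_B", "region_count": 5},
]


def topg_last_attr(samples, group_attrs, order_attr, descending, k):
    """Keep the first k samples per group, ordering each group by order_attr."""

    def group_key(s):
        return tuple(s[a] for a in group_attrs)

    # Sort by the last attribute in the requested direction, then rely on the
    # stability of sorted() to arrange groups without disturbing that order.
    ordered = sorted(samples, key=lambda s: s[order_attr], reverse=descending)
    ordered = sorted(ordered, key=group_key)

    result = []
    for _, group in groupby(ordered, key=group_key):
        result.extend(list(group)[:k])
    return result


desc_ids = [s["id"] for s in topg_last_attr(
    samples, ["biosample_term_name"], "region_count", True, 1)]
asc_ids = [s["id"] for s in topg_last_attr(
    samples, ["biosample_term_name"], "region_count", False, 1)]
print(desc_ids)  # sample with the largest region_count in each group
print(asc_ids)   # sample with the smallest region_count in each group
```

Whenever a group holds samples with different `region_count` values, the two directions should select different samples, which is why identical output for both queries points to the modifier being ignored.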

akaitoua commented 7 years ago

fixed