DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

ORDER - add top k % in both meta_top and region_top #44

Closed marcomass closed 7 years ago

marcomass commented 7 years ago

Add the possibility to specify, in the ORDER meta_top and region_top parameters, whether the requested top k means the first k elements (according to the defined order) or the first k percent of the total number of elements, and to select the corresponding top k elements or top k % of elements in the output.
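To make the two requested semantics concrete, here is a minimal Python sketch, not GMQL's implementation. The function names order_top and order_top_percent are hypothetical. Rounding a fractional count down is an assumption, since the request does not say how a percentage that does not divide the input evenly should be handled.

```python
import math
from typing import Any, Callable, List, Sequence, TypeVar

T = TypeVar("T")


def order_top(items: Sequence[T], key: Callable[[T], Any], k: int) -> List[T]:
    """Top k: keep the first k elements after ordering by key (ascending)."""
    return sorted(items, key=key)[:k]


def order_top_percent(items: Sequence[T], key: Callable[[T], Any], p: float) -> List[T]:
    """Top k %: keep the first p percent of all elements after ordering by key.

    Assumption (not stated in the issue): the resulting count is rounded down.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be a percentage between 0 and 100")
    count = math.floor(len(items) * p / 100)
    return sorted(items, key=key)[:count]
```

For example, with the input ["HepG2", "A549", "K562", "HCT116"] and key=str, both order_top(..., k=2) and order_top_percent(..., p=50) return ["A549", "HCT116"]. The two differ as the input grows: k stays a fixed count, while the percentage scales with the number of elements.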

akaitoua commented 7 years ago

TopP (Top Percent) is now supported at the DAG level. The documentation has been added too.

marcomass commented 7 years ago

OK, good. However, it must also be possible to specify it at the compiler level, so I am reopening this issue and adding Pietro to it. @pp86 Can you do it?

pp86 commented 7 years ago

I added the option. It works like Top and TopG, so it has to be passed as a named parameter:

marcomass commented 7 years ago

@akaitoua Unfortunately, I have to reopen this issue to fix the following 2 aspects:

As a test for meta_top, the following query should return 8 samples (which it does), but only samples with the following biosample_term_name values: A549, H1-hESC, HCT116, HepG2 (in this order):

TEAD4_rep_broad_all = SELECT(project == "ENCODE" AND assembly == "hg19" AND assay == "ChIP-seq" AND output_type == "peaks" AND experiment_target == "TEAD4-human") HG19_ENCODE_BROAD_AUG_2017;

MATERIALIZE TEAD4_rep_broad_all into TEAD4_rep_broad_all; # 16 samples

D1 = ORDER(biosample_term_name; meta_topp: 50) TEAD4_rep_broad_all; # Expected 8 samples

MATERIALIZE D1 into D1;
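The expectation in this test can be written as a check: with meta_topp: 50 on 16 input samples, 16 × 50 / 100 = 8 samples should come back, and their biosample_term_name values should be exactly the first 8 values of the input once it is ordered on that attribute. The helper below is a hypothetical Python sketch of that check, not part of GMQL. It assumes ascending order and rounding down, and it represents each sample's metadata as a plain dict.

```python
import math
from typing import Dict, List


def meta_topp_is_correct(
    input_samples: List[Dict[str, str]],
    output_samples: List[Dict[str, str]],
    attr: str,
    p: float,
) -> bool:
    """Check a hypothetical ORDER(attr; meta_topp: p) result against its input.

    Assumptions: ascending order on attr, and the sample count is rounded down.
    """
    expected_count = math.floor(len(input_samples) * p / 100)
    if len(output_samples) != expected_count:
        return False
    # Values that the first p% of samples should carry after ordering on attr.
    expected_values = sorted(s[attr] for s in input_samples)[:expected_count]
    # Only which values were selected is compared, not the output order.
    actual_values = sorted(s[attr] for s in output_samples)
    return actual_values == expected_values
```

A check like this catches both failure modes: the wrong number of samples, and the right number of samples carrying the wrong attribute values. Plain Python string sorting puts the four names from this test in the same order the issue lists them: A549, H1-hESC, HCT116, HepG2.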

akaitoua commented 7 years ago

Solved. Please let me know if any changes are required.