marcomass closed this issue 7 years ago
TopP (Top Percent) is now supported at the DAG level. The documentation has been added too.
OK, good. However, it must also be possible to specify it at the compiler level, so I am reopening this issue and adding Pietro. @pp86 Can you do it?
I added the option. It works like Top and TopG, so it has to be passed as a named parameter:
@akaitoua Unfortunately I have to reopen this issue in order to fix the following 2 aspects:
meta_topp: N actually returns N percent of the samples in the input dataset, but not the top ones. It seems that the ordering step before selecting the top N% is missing.
region_topp: N actually returns N percent of the regions of each input sample, but not the top ones. It seems that the ordering step before selecting the top N% is missing.
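To make the expected behaviour concrete, here is a minimal Python sketch of "order first, then take the top N%", applied per sample as region_topp should do. It is not the GMQL implementation: the function names, the `score` attribute, and the toy data are hypothetical, and rounding the cutoff down is an assumption, since the thread does not say how fractional counts are handled.

```python
def top_percent(items, percent, key, reverse=False):
    """Sort items by key, then keep the leading `percent` percent of them."""
    # The ordering step must come first; slicing an unordered list
    # returns N% of the items, but not the top ones.
    ordered = sorted(items, key=key, reverse=reverse)
    # Assumption: the cutoff is rounded down (integer division).
    count = len(ordered) * percent // 100
    return ordered[:count]


def region_top_percent(samples, percent, key, reverse=False):
    """Apply top_percent independently to the regions of every sample."""
    return {
        sample_id: top_percent(regions, percent, key, reverse)
        for sample_id, regions in samples.items()
    }


# Hypothetical toy dataset: two samples with unordered region scores.
samples = {
    "s1": [{"score": 5}, {"score": 1}, {"score": 9}, {"score": 3}],
    "s2": [{"score": 7}, {"score": 2}],
}
result = region_top_percent(samples, 50, key=lambda r: r["score"])
```

With ascending order on `score`, each sample keeps only the lowest-scoring half of its own regions, regardless of the order in which the regions were stored.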
As a test for meta_topp, the following query should return 8 samples (as it does), but only those with the following biosample_term_name values: A549, H1-hESC, HCT116, HepG2 (in this order):
TEAD4_rep_broad_all = SELECT(project == "ENCODE" AND assembly == "hg19" AND assay == "ChIP-seq" AND output_type == "peaks" AND experiment_target == "TEAD4-human") HG19_ENCODE_BROAD_AUG_2017;
D1 = ORDER(biosample_term_name; meta_topp: 50) TEAD4_rep_broad_all; # Expected 8 samples
MATERIALIZE D1 into D1;
Solved. Please let me know if any changes are required.
Add the possibility to specify, for meta_top and region_top in ORDER, whether the requested top k means the first k elements (based on the defined order) or the first k percent of the total number of elements, and select the output elements accordingly.
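The difference between the two modes can be sketched in Python as follows. This is only an illustration of the requested semantics, not GMQL code: `select_top` and its `as_percent` flag are hypothetical names, the list is toy data, and rounding the percentage cutoff down is an assumption.

```python
def select_top(ordered, k, as_percent=False):
    """Return the leading elements of an already ordered list.

    With as_percent=False, k is an absolute count (top k).
    With as_percent=True, k is a percentage of the total (top k%).
    """
    if as_percent:
        # Assumption: fractional cutoffs are rounded down.
        k = len(ordered) * k // 100
    return ordered[:k]


# Toy list, already sorted by the chosen ordering attribute.
names = sorted(["HepG2", "A549", "HCT116", "H1-hESC"])

first_one = select_top(names, 1)                    # top 1 element
first_half = select_top(names, 50, as_percent=True)  # top 50% of 4 elements
```

Here `first_one` holds a single name while `first_half` holds two, which shows that the same numeric argument is interpreted against the total size only in percent mode.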