Closed marcomass closed 7 years ago
In particular, test that "null" values, e.g. those introduced by the UNION() operation when applied to two datasets with different schemas, are managed correctly.
Verify that the presence of null values (i.e. values "null" or empty - "") does not alter the calculation of aggregate functions (e.g. AVERAGE) on numeric region attributes that include such null values. Provide here the GMQL query(s) used for this testing.
@OlgaGorlova, please merge your branch and label this as test and close it.
@OlgaGorlova Hi Olya, is the average issue regarding null values fixed now? If yes, please confirm; if not, please reopen this issue.
Hi @marcomass, yes, it is fixed now.
Tested and successfully fixed.
Tested by Stefano P (Erlaad) with the following query:
RAW = SELECT(clinical_follow_uptumor_status == 'with tumor' AND manually_curateddataType == 'dnamethylation27' AND clinical_follow_up__new_tumor_event_type == 'distant metastasis') HG19_TCGA_dnamethylation;
TEST = COVER(1,ANY; aggregate: new_beta_value AS AVG(beta_value)) RAW;
MATERIALIZE RAW into raw;
MATERIALIZE TEST into test;
Correction: the query above does not work because of an issue related to #61. However, I have a new query, which I have tested and which works:
A = SELECT(assay == "ChIP-seq" AND biosample_term_name == "HepG2" AND experiment_target == "CEBPZ-human") HG19_ENCODE_NARROW_AUG_2017;
A1 = PROJECT(region_update: new_field AS null(INTEGER)) A;
A2 = PROJECT(region_update: new_field AS 2) A;
B = UNION() A1 A2;
B1 = COVER(1,ANY; aggregate: new_field_avg AS AVG(new_field)) B;
MATERIALIZE B into raw;
MATERIALIZE B1 into test;
Manage presence of null values in numeric region attributes.
o Region fields (both string and numeric) may be NULL. NULL values are not considered by aggregate or tuple functions; Boolean predicates on NULL fields are always false.
o Implementation: GMQL has the GNull data type; the implementation of nodes where the computation may have to deal with NULL values should be changed (e.g., by adding a pre-filtering step).
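
The semantics stated above (nulls are skipped by aggregates, and predicates on null fields are false) can be illustrated with a small Python sketch. This is not GMQL's actual implementation, which uses the GNull type internally; the function names `parse_numeric`, `avg_ignoring_nulls` and `null_safe_predicate` are hypothetical, and returning a null average when every input is null is an assumption, not something confirmed in this thread.

```python
from typing import Callable, Iterable, Optional


def parse_numeric(raw: str) -> Optional[float]:
    """Map the textual null forms mentioned in this issue ("null" or "") to None."""
    if raw == "" or raw == "null":
        return None
    return float(raw)


def avg_ignoring_nulls(values: Iterable[Optional[float]]) -> Optional[float]:
    """Pre-filter nulls, then average what is left.

    Assumption: if every value is null, the result is null (None).
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


def null_safe_predicate(value: Optional[float],
                        test: Callable[[float], bool]) -> bool:
    """A Boolean predicate on a null field is always false."""
    return value is not None and test(value)


# Mimics the UNION test case: regions from one dataset carry a null
# new_field, regions from the other carry the value 2.
new_field = [parse_numeric(s) for s in ["null", "", "2", "2"]]
print(avg_ignoring_nulls(new_field))  # 2.0
```

With this behaviour, the average over the union is 2, the value contributed by the non-null regions, rather than a value pulled toward zero by the nulls.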