DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

Problem with assignment of null and constant negative values #4

Closed: pp86 closed this issue 7 years ago

pp86 commented 7 years ago

In PROJECT, allow defining a new region attribute with a fixed value that may also be negative. For example, the following currently does not compile:

    S2 = PROJECT(region_update: signal2 AS -999999) S1;

marcomass commented 7 years ago

Also enable the possibility of defining a new numeric region attribute with a "null" value.

This is important to allow a subsequent UNION() with another dataset that has such a numeric region attribute, without losing that attribute in the UNION() result. For example, see the following query:

    S1 = SELECT(cell == 'K562' AND antibody == 'c-Jun' AND ID == '1217') HG19_ENCODE_BED;
    S2 = SELECT(ID == '1011') HG19_ENCODE_NARROW;
    S1_B = PROJECT(region_update: signal AS null, pvalue AS null, qvalue AS null, peak AS null) S1;
    S3_B = UNION() S1_B S2;
    MATERIALIZE S3_B INTO S3_B;

marcomass commented 7 years ago

@pp86 What syntax should be used in PROJECT to define a new numeric region attribute with a "null" value (see the application example above)? The following does not compile:

    S1_B = PROJECT(region_update: signal AS null, pvalue AS null, qvalue AS null, peak AS null) S1;

pp86 commented 7 years ago

The syntax for creating a new attribute with a NULL value is:

attribute_name AS NULL(TYPE)

where TYPE may be one of the following: INTEGER, DOUBLE, or STRING.

NB: for the time being, both INTEGER and DOUBLE generate a DOUBLE field, since DOUBLE is better supported.
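As an illustration only, here is the UNION() composition from earlier in this thread rewritten with the NULL(TYPE) form. This sketch assumes NULL(DOUBLE) is accepted as described above; the follow-up comment in this thread reports that, at the time, only NULL(INTEGER) compiled, and per the NB both produce a DOUBLE field anyway.

```
S1 = SELECT(cell == 'K562' AND antibody == 'c-Jun' AND ID == '1217') HG19_ENCODE_BED;
S2 = SELECT(ID == '1011') HG19_ENCODE_NARROW;
S1_B = PROJECT(region_update: signal AS NULL(DOUBLE), pvalue AS NULL(DOUBLE), qvalue AS NULL(DOUBLE), peak AS NULL(DOUBLE)) S1;
S3_B = UNION() S1_B S2;
MATERIALIZE S3_B INTO S3_B;
```

The only change from the original query is the S1_B line, where each bare `null` is replaced by `NULL(DOUBLE)`.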

marcomass commented 7 years ago

@pp86 Unfortunately, I need to reopen this issue because of the following two problems:

1) Only attribute_name AS NULL(INTEGER) seems to work; the other mentioned types, DOUBLE and STRING, are not recognized by the compiler (see the example query at the end of this comment). Please also add the DOUBLE option, i.e., attribute_name AS NULL(DOUBLE). In my opinion, attribute_name AS NULL(STRING) should not be allowed, since the null value for the string type should be the empty string "", which should be set through attribute_name AS "" (see issue 16).

2) As in issue 15, here too the generated XML schema included in the output dataset declares the newly created null-valued region attribute as DOUBLE, even when it was created as type INTEGER (see attribute mynull in the example query below):

    S = SELECT(annotation_type == "promoter") HG19_BED_ANNOTATION;
    S0 = PROJECT(region_update: mynull AS NULL(INTEGER), mynullD AS NULL(DOUBLE), mystring AS "") S;
    MATERIALIZE S0 INTO S0;

marcomass commented 7 years ago

@pp86 Please remove the STRING type option from the compiler (i.e., attribute_name AS NULL(STRING)), since we defined the empty string "" as the null value for string attributes (i.e., attribute_name AS ""). Shall I reopen the issue for this aspect?