DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

GROUP #57

Closed marcomass closed 7 years ago

marcomass commented 7 years ago

Completely define and implement the GROUP operator for cross-samples aggregations.

akaitoua commented 7 years ago

The GROUP operator is implemented at the API level.

sunbrn commented 7 years ago

The query: GROUPS = GROUP(cell,cell_sex; meta_aggregates: MScore AS avg(score)) DATA_SET_VAR;

    • calculates the SUM of scores instead of the AVG
    • outputs a new metadata attribute _group with very strange values (e.g., -2369154366923451895)
    • the parameter region_aggregates does not compile (message: None.get)
    • What is the default? According to the documentation, meta_group_name (I took the name from the Scala code) should be mandatory, yet GROUP() without arguments compiles.
    • From the doc: "Samples having missing values for any of the grouping metadata attributes are discarded." This is not working; those samples are still in the output.
    • From the doc: "When multiple regions in the same sample have the same coordinates; these regions are collapsed into a single one." It seems to me this is not working.
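
For reference, the behavior I would expect from the documentation can be sketched in plain Python. This is not GMQL code: the sample dictionaries, the `group_samples` helper, and the treatment of `score` as a single per-sample value are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_samples(samples, keys, attribute):
    """Group samples by metadata keys and average one attribute per group.

    Hypothetical helper: samples missing any grouping key are discarded,
    as the documentation quoted above states.
    """
    groups = defaultdict(list)
    for sample in samples:
        if any(sample.get(k) is None for k in keys):
            continue  # missing grouping attribute: the sample is dropped
        groups[tuple(sample[k] for k in keys)].append(sample[attribute])
    # AVG, not SUM: the total is divided by the number of samples in the group
    return {group: mean(values) for group, values in groups.items()}


# Made-up samples; the last one has no cell_sex and should be discarded
samples = [
    {"cell": "HeLa", "cell_sex": "F", "score": 2.0},
    {"cell": "HeLa", "cell_sex": "F", "score": 4.0},
    {"cell": "K562", "cell_sex": "F", "score": 5.0},
    {"cell": "K562", "score": 9.0},
]

result = group_samples(samples, ["cell", "cell_sex"], "score")
```

With these made-up samples, the HeLa group should report 3.0 (the average), not 6.0 (the sum), and the sample without cell_sex should not contribute to any group.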

pp86 commented 7 years ago

@akaitoua @OlgaGorlova should be involved here.

  1. There was a small mistake in the implementation, which I corrected.
  2. The strange number comes from the hash function, which computes it from the ids of the samples in the group.
  3. This was a silly error on my part; I fixed it (thanks for catching it).
  4. The default has not been defined yet. I did not put a check on GROUP(), so it just does nothing.
  5. This is on @akaitoua @OlgaGorlova
  6. This is on @akaitoua @OlgaGorlova
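
To illustrate point 2, here is a minimal Python sketch of how hashing the ids in a group can yield a large, possibly negative number. The actual hash function used in GMQL is not shown in this thread; SHA-256 and the `group_id` helper are stand-ins chosen only for the example.

```python
import hashlib


def group_id(sample_ids):
    """Derive a signed 64-bit group identifier from the sample ids in a group.

    Hypothetical helper: SHA-256 is an assumption, not the GMQL hash function.
    """
    # Sort so the identifier does not depend on the order of the ids
    joined = ",".join(str(i) for i in sorted(sample_ids))
    digest = hashlib.sha256(joined.encode("utf-8")).digest()
    # Read the first 8 bytes as a signed 64-bit integer, which can be negative
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


gid = group_id([3, 1, 2])
```

Any identifier built this way falls anywhere in the signed 64-bit range, which would explain values that look arbitrary to the user.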

sunbrn commented 7 years ago

Currently, I cannot specify the region_aggregates parameter without also specifying at least one region_key. I should be able to: with no region_key, the aggregates would be calculated over duplicate regions, i.e., regions with the same coordinates. @pp86
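
A minimal Python sketch of what I mean, assuming regions are plain tuples of (chromosome, start, stop, strand, score). The tuple layout and the `collapse_duplicates` helper are made up for illustration and are not part of GMQL.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(regions):
    """Collapse regions with identical coordinates, averaging their scores.

    Hypothetical helper: no extra region key is used, only the coordinates.
    """
    by_coords = defaultdict(list)
    for chrom, start, stop, strand, score in regions:
        by_coords[(chrom, start, stop, strand)].append(score)
    # One output region per distinct coordinate tuple, with the AVG of its scores
    return [coords + (mean(scores),) for coords, scores in sorted(by_coords.items())]


# Made-up regions; the first two share the same coordinates
regions = [
    ("chr1", 100, 200, "+", 1.0),
    ("chr1", 100, 200, "+", 3.0),
    ("chr2", 50, 80, "-", 7.0),
]

collapsed = collapse_duplicates(regions)
```

Here the two chr1 regions should collapse into a single region with score 2.0, and the chr2 region should pass through unchanged.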

sunbrn commented 7 years ago

Also, region_aggregates calculates the SUM of scores instead of the AVG (meta_aggregates now works correctly). @pp86

sunbrn commented 7 years ago

From the documentation: "In the output dataset, storing the results of aggregate function evaluations over metadata and/or region attributes in each group of samples and/or regions, respectively." It seems to me that aggregates inside the meta_aggregates option currently operate on region attributes. For example, if I try meta_aggregates: ids AS BAG(metadata_attribute), I get the COMPILE_FAILED message "Avalilable fields are { name, score ...}". Is this the intended behavior? @pp86

OlgaGorlova commented 7 years ago

@pp86 @sunbrn I have fixed the meta and region aggregations (SUM, AVG, etc.).

sunbrn commented 7 years ago

This bug should be closed, since I opened a more up-to-date one: #75 @pp86 @OlgaGorlova