DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

Sample scalability #13

Closed marcomass closed 7 years ago

marcomass commented 7 years ago

Make the system truly scalable, even when many samples are processed, possibly by automatically defining the query parameter settings to obtain top efficiency regardless of the amount of data the query involves.

Use the following query to test, since it currently crashes at runtime or does not terminate:

EXPRESSED_GENE = SELECT(dataType == 'rnaseq' AND tumor_tag == 'hnsc') HG19_TCGA_RnaSeq_Gene_V2;
METHYLATION = SELECT(dataType == 'dnamethylation' AND tumor_tag == 'hnsc') HG19_TCGA_Dnamethylation_V2;
MUTATION = SELECT(data_type == 'dnaseq' AND tumor_tag == 'hnsc') HG19_TCGA_DnaSeq_V2;

GENE_METHYL_0 = MAP(joinby: bcr_sample_barcode) EXPRESSED_GENE METHYLATION;
GENE_METHYL = SELECT(region: count_EXPRESSED_GENE_METHYLATION > 0) GENE_METHYL_0;

GENE_METHYL1 = COVER(1, ANY) GENE_METHYL;

MUTATION_GENE = JOIN(DISTANCE < 2000, DISTANCE > 0; output: left) MUTATION GENE_METHYL1;
MUTATION_GENE_count = EXTEND(mutation_count AS COUNT()) MUTATION_GENE;
MUTATION_GENE_top = ORDER(mutation_count DESC; META_TOP: 3) MUTATION_GENE_count;

MATERIALIZE MUTATION_GENE_top INTO MUTATION_GENE_top;

andreagulino commented 7 years ago

The query is now running.

I monitored resource consumption throughout the execution and everything ran without problems, i.e.:

Since the last time the query was run: