DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

NOT IN (in semi-join of select) [and ALLBUT (in project) also in API] #32

Closed marcomass closed 7 years ago

marcomass commented 7 years ago

Currently, some features of the GMQL language are available at compiler level but not for API usage: ALLBUT in PROJECT, and NOT IN in the semi-join of SELECT.

Make them available also for API usage.

akaitoua commented 7 years ago

@marcomass, I implemented ALLBUT in the region Project API, but I did not understand the BUT for the semi-join. Could you provide an example?

API example for Project is:

DS1.PROJECT(projected_meta = Some(List("filename")),
            extended_meta = None, 
            all_but = List("score"),
            extended_values = None)

marcomass commented 7 years ago

The example is:

S0 = SELECT() dataset;
S2 = SELECT( semijoin cell NOT IN S0) S1;

From what I understand from Simone, this is possible only using the language, but not the API.
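
For readers unfamiliar with the semi-join, the intended IN / NOT IN behaviour can be sketched in plain Scala. This is only an illustration under simplifying assumptions, not GMQL code: `Sample`, `semiJoin`, and the object name are hypothetical, metadata is modelled as a map from attribute to a set of values, and samples lacking the attribute are simply kept under NOT IN, which may not match what the engine does.

```scala
object SemiJoinSketch {
  // Hypothetical, simplified model: a sample is an id plus metadata
  // (attribute name -> set of values). Not the GMQL data model.
  final case class Sample(id: String, meta: Map[String, Set[String]])

  // True when `s` shares at least one value of `attribute`
  // with some sample of `other`.
  private def matches(s: Sample, attribute: String, other: Seq[Sample]): Boolean = {
    val values = s.meta.getOrElse(attribute, Set.empty[String])
    other.exists(o => o.meta.getOrElse(attribute, Set.empty[String]).exists(values.contains))
  }

  // negation = false models "attribute IN other";
  // negation = true models "attribute NOT IN other".
  def semiJoin(input: Seq[Sample],
               attribute: String,
               other: Seq[Sample],
               negation: Boolean = false): Seq[Sample] =
    input.filter(s => matches(s, attribute, other) != negation)
}
```

With S0 holding one sample whose cell is K562, and S1 holding samples x (K562) and y (HeLa), `semiJoin(S1, "cell", S0)` keeps x, while `semiJoin(S1, "cell", S0, negation = true)` keeps y.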

akaitoua commented 7 years ago

@marcomass, I checked the code. I have no idea how to implement the semijoin negation without causing problems for the compiler.

marcomass commented 7 years ago

@akaitoua
@pp86 Pietro, can you check and suggest how to implement the NOT IN option of the SELECT semijoin for the API?

pp86 commented 7 years ago

I checked the code of the compiler and the NOT IN for the semijoin IS NOT implemented. I do not think we even have the data structure at core level to support such a feature.

akaitoua commented 7 years ago

"NOT IN" for the semijoin is implemented at the DAG level. The current code (including the compiler) keeps working with the new modifications because I only added a negation flag, with a default value of false, to the semijoin condition:

case class MetaJoinCondition(attributes : List[AttributeEvaluationStrategy], negation:Boolean = false)
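
A minimal, self-contained sketch of why a defaulted flag leaves existing call sites untouched. `AttributeEvaluationStrategy` and `Default` below are simplified stand-ins written for this illustration, not the actual GMQL definitions, and the object and value names are hypothetical.

```scala
object MetaJoinSketch {
  // Simplified stand-ins for the GMQL types named above;
  // the real definitions may differ.
  sealed trait AttributeEvaluationStrategy
  final case class Default(attribute: String) extends AttributeEvaluationStrategy

  // Same shape as the case class quoted above.
  final case class MetaJoinCondition(
      attributes: List[AttributeEvaluationStrategy],
      negation: Boolean = false)

  // A call site written before the flag existed still compiles
  // and keeps the old "IN" meaning.
  val inCondition: MetaJoinCondition =
    MetaJoinCondition(List(Default("att")))

  // A new call site opts in to "NOT IN" explicitly.
  val notInCondition: MetaJoinCondition =
    MetaJoinCondition(List(Default("att")), negation = true)
}
```

Constructor calls are unaffected by the new parameter. Code that destructures `MetaJoinCondition` positionally in a pattern match would likely need updating, since the case class now has two fields.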

The use case will be in the documentation.

val outputDS = DS1.SELECT(
      semi_con = MetaJoinCondition(
                          attributes = List(Default("att")),
                          negation = true),
      meta_join_variable = DS2
    )

marcomass commented 7 years ago

@akaitoua @pp86 Thank you, Abdulrahman. Does this require Pietro to work on the compiler (to recognize the NOT IN syntax), or have you already done that? (In the current version on the web, updated yesterday, NOT IN is not recognized by the compiler.)

marcomass commented 7 years ago

Unfortunately I need to reopen this issue since the NOT IN clause is not yet recognized at compiler level. Use the following query to test it.

TEAD4_rep_broad = SELECT(project == "ENCODE" AND assembly == "hg19" AND assay == "ChIP-seq" AND output_type == "peaks" AND experiment_target == "TEAD4-human" AND biosample_term_name == "Ishikawa") HG19_ENCODE_BROAD_AUG_2017;
MATERIALIZE TEAD4_rep_broad into TEAD4_rep_broad;
HM_TF_rep_broad = SELECT(project == "ENCODE" AND assembly == "hg19" AND assay == "ChIP-seq" AND output_type == "peaks" AND experiment_target == "TEAD4-human"; semijoin: biosample_term_name NOT IN TEAD4_rep_broad) HG19_ENCODE_BROAD_MAY_2017;
MATERIALIZE HM_TF_rep_broad into HM_TF_rep_broad;