marcomass closed this issue 7 years ago.
@marcomass, I implemented ALLBUT in the region PROJECT API, but I did not understand the BUT for the semi-join. Could you provide an example?
An API example for PROJECT:
DS1.PROJECT(projected_meta = Some(List("filename")),
            extended_meta = None,
            all_but = List("score"),
            extended_values = None)
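To make the intended behavior of `all_but` concrete, here is a minimal Scala sketch, assuming that ALLBUT keeps every region attribute of the schema except the listed ones, in schema order. The object `AllButSketch`, the helper `allBut` and the sample schema are hypothetical and not part of the GMQL API; rejecting excluded names that are absent from the schema is also an assumption.

```scala
object AllButSketch {
  // Hypothetical helper, not part of the GMQL API: returns the region
  // attributes that ALLBUT would keep, preserving schema order.
  def allBut(schema: List[String], excluded: List[String]): List[String] = {
    // Assumption: excluding an attribute that is not in the schema is an error.
    val unknown = excluded.filterNot(schema.contains)
    require(unknown.isEmpty, s"Unknown attributes: ${unknown.mkString(", ")}")
    schema.filterNot(excluded.contains)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical variable-attribute schema of a peak dataset.
    val schema = List("name", "score", "signal", "pValue")
    // Mirrors all_but = List("score") in the PROJECT call.
    println(allBut(schema, List("score"))) // List(name, signal, pValue)
  }
}
```

Keeping schema order (rather than the order of the exclusion list) means the output schema is predictable regardless of how the caller lists the excluded attributes.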
The example is:
S0 = SELECT() dataset;
S2 = SELECT(semijoin: cell NOT IN S0) S1;
From what I understand from Simone, this is currently possible only in the language, not in the API.
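As an illustration of what the NOT IN semi-join is expected to compute, the sketch below models each sample's metadata as a map from attribute to a set of values. It assumes these semantics: with IN, a sample of S1 is kept if some sample of S0 shares at least one value with it for every listed attribute; NOT IN keeps exactly the samples that IN would discard. `SemiJoinSketch`, `Meta` and `semiJoin` are hypothetical names, not the GMQL implementation.

```scala
object SemiJoinSketch {
  // Hypothetical model: attribute -> set of values (metadata may be multi-valued).
  type Meta = Map[String, Set[String]]

  // Assumption: two samples match if, for every join attribute,
  // they share at least one value for it.
  private def matches(a: Meta, b: Meta, attributes: List[String]): Boolean =
    attributes.forall { att =>
      val bValues = b.getOrElse(att, Set.empty[String])
      a.getOrElse(att, Set.empty[String]).exists(v => bValues.contains(v))
    }

  // IN keeps samples with at least one matching sample in `other`;
  // NOT IN (negation = true) keeps the complement.
  def semiJoin(input: List[Meta], other: List[Meta],
               attributes: List[String], negation: Boolean): List[Meta] =
    input.filter { s =>
      val found = other.exists(o => matches(s, o, attributes))
      found != negation
    }

  def main(args: Array[String]): Unit = {
    val s0: List[Meta] = List(Map("cell" -> Set("HeLa")))
    val s1: List[Meta] = List(
      Map("cell" -> Set("HeLa"), "id" -> Set("1")),
      Map("cell" -> Set("K562"), "id" -> Set("2")))
    // S2 = SELECT(semijoin: cell NOT IN S0) S1;
    val s2 = semiJoin(s1, s0, List("cell"), negation = true)
    println(s2.map(_("id"))) // List(Set(2))
  }
}
```

Expressing NOT IN as `found != negation` shows why a single boolean flag is enough: the matching logic is shared, and only the final keep/discard decision is inverted.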
@marcomass, I checked the code. I have no idea how to implement the semi-join negation without causing problems for the compiler.
@akaitoua @pp86 Pietro, can you check and suggest how to implement, in the API, the NOT IN option of the semi-join in SELECT?
I checked the compiler code, and NOT IN for the semi-join is NOT implemented. I do not think we even have the data structure at the core level to support such a feature.
NOT IN for the semi-join is implemented at the DAG level. The current code (including the compiler) keeps working with the new modifications, because I only added a negation flag, with a default value of false, to the join condition:
case class MetaJoinCondition(attributes: List[AttributeEvaluationStrategy], negation: Boolean = false)
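Existing call sites keep compiling because of Scala's default arguments: code that builds `MetaJoinCondition(attributes)` without the new flag still gets `negation = false`. The sketch below reproduces this with hypothetical stand-ins for `AttributeEvaluationStrategy` and `Default` (the real definitions live in the GMQL code base and are not reproduced here); only the shape of `MetaJoinCondition` is taken from the comment.

```scala
object NegationFlagSketch {
  // Hypothetical stand-ins for the GMQL types, for illustration only.
  sealed trait AttributeEvaluationStrategy
  final case class Default(attribute: String) extends AttributeEvaluationStrategy

  // Same shape as the quoted case class: the new flag has a default value.
  final case class MetaJoinCondition(
      attributes: List[AttributeEvaluationStrategy],
      negation: Boolean = false)

  def main(args: Array[String]): Unit = {
    // A call site written before the flag existed still compiles unchanged.
    val legacy = MetaJoinCondition(List(Default("cell")))
    // A new call site requesting NOT IN semantics.
    val notIn = MetaJoinCondition(attributes = List(Default("cell")), negation = true)
    println(legacy.negation) // false
    println(notIn.negation)  // true
  }
}
```

Note that a defaulted parameter gives source compatibility only: the generated constructor and `apply` signatures change, so code compiled against the old class must be recompiled.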
The use case will be in the documentation:
val outputDS = DS1.SELECT(
  semi_con = MetaJoinCondition(
    attributes = List(Default("att")),
    negation = true),
  meta_join_variable = DS2
)
@akaitoua @pp86 Thank you, Abdulrahman. Does this require Pietro to work on the compiler (to recognize the NOT IN syntax), or did you already do it? In the current version on the web (updated yesterday), NOT IN is not recognized by the compiler.
Unfortunately, I need to reopen this issue, since the NOT IN clause is not yet recognized at the compiler level. Use the following query to test it:
TEAD4_rep_broad = SELECT(project == "ENCODE" AND assembly == "hg19" AND assay == "ChIP-seq" AND output_type == "peaks" AND experiment_target == "TEAD4-human" AND biosample_term_name == "Ishikawa") HG19_ENCODE_BROAD_AUG_2017;
MATERIALIZE TEAD4_rep_broad into TEAD4_rep_broad;
HM_TF_rep_broad = SELECT(project == "ENCODE" AND assembly == "hg19" AND assay == "ChIP-seq" AND output_type == "peaks" AND experiment_target == "TEAD4-human"; semijoin: biosample_term_name NOT IN TEAD4_rep_broad) HG19_ENCODE_BROAD_MAY_2017;
MATERIALIZE HM_TF_rep_broad into HM_TF_rep_broad;
Currently, some features of the GMQL language are available at the compiler level but not through the API. They are ALLBUT in PROJECT and NOT IN in the semi-join of SELECT.
Make them available for API usage as well.