LiUSemWeb / HeFQUIN

HeFQUIN is a query federation engine for heterogeneous federations of graph data sources, including federations of knowledge graphs.
https://liusemweb.github.io/HeFQUIN/
Apache License 2.0
19 stars 3 forks source link

Re implement logical plan printer #349

Closed huanyu-li closed 3 months ago

hartig commented 3 months ago

I will wait with taking a closer look until after you have resolved the compilation error that makes the CI build of the PR fail.

Notice, however, that the idea should not be to add printing-related functionality directly into the operators (as you seem to have done with the newly added toStringArray function). Instead, all printing-related functionality should be implemented in the TextBasedLogicalPlanPrinterImpl2 class. The reason why I am insisting of taking this functionality out of the operators themselves is that this way of printing may be only one of several possible ways that we are going to implement, and we should not overload the classes of the operators with all the code for all these ways of printing.

huanyu-li commented 3 months ago

I will wait with taking a closer look until after you have resolved the compilation error that makes the CI build of the PR fail.

Notice, however, that the idea should not be to add printing-related functionality directly into the operators (as you seem to have done with the newly added toStringArray function). Instead, all printing-related functionality should be implemented in the TextBasedLogicalPlanPrinterImpl2 class.

Hi Olaf, as you mentioned yesterday, I should not anymore use toString function for implementing. I am bit confused with how functions in TextBasedLogicalPlanPrinterImpl2 get operators' information. Or do you mean using toString in a different way, for instance, for req operator, split a string based on some separators or move what toString function does to TextBasedLogicalPlanPrinterImpl2? Best, Huanyu

hartig commented 3 months ago

Within the code in TextBasedLogicalPlanPrinterImpl2, you explicit check what class the root operator of the current (sub)plan is; e.g.,

final LogicalOperator rootOp = plan.getRootOperator();
if ( rootOp instanceof LogicalOpTPAdd ) {
   ...
}
else if ( rootOp instanceof LogicalOpBGPAdd ) {
   ...
}
else if ( rootOp instanceof ...

Then, for each such class you add a separate method that takes care of the printing of operators of that class, e.g.,

protected void printOperatorInfo( final LogicalOpTPAdd op, final PrintStream out ) {
    final FederationMember fm = op.getFederationMember();

    out.append( "req" + " (" + getID() + ")" );
    out.append( System.lineSeparator() );
// TODO: print the correct indentation string here
    out.append( "- fm (" + op. fm.getInterface().toString() + ")" );
  ...
}

When you call these methods from the if-branches mentioned above, you need to cast the rootOp variable to the correct class; e.g.,

if ( rootOp instanceof LogicalOpTPAdd ) {
   printOperatorInfo( (LogicalOpTPAdd) rootOp, out );
}
   ...
huanyu-li commented 3 months ago

Within the code in TextBasedLogicalPlanPrinterImpl2, you explicit check what class the root operator of the current (sub)plan is; e.g.,

final LogicalOperator rootOp = plan.getRootOperator();
if ( rootOp instanceof LogicalOpTPAdd ) {
   ...
}
else if ( rootOp instanceof LogicalOpBGPAdd ) {
   ...
}
else if ( rootOp instanceof ...

Then, for each such class you add a separate method that takes care of the printing of operators of that class, e.g.,

protected void printOperatorInfo( final LogicalOpTPAdd op, final PrintStream out ) {
    final FederationMember fm = op.getFederationMember();

    out.append( "req" + " (" + getID() + ")" );
    out.append( System.lineSeparator() );
// TODO: print the correct indentation string here
    out.append( "- fm (" + op. fm.getInterface().toString() + ")" );
  ...
}

When you call these methods from the if-branches mentioned above, you need to cast the rootOp variable to the correct class; e.g.,

if ( rootOp instanceof LogicalOpTPAdd ) {
   printOperatorInfo( (LogicalOpTPAdd) rootOp, out );
}
   ...

I see, I have updates for mu/mj/req now in TextBasedLogicalPlanPrinterImpl2.

huanyu-li commented 3 months ago

Just take bgpAdd as an example,

return "> bgpAdd" +
        "[" + codeOfBGP + ", "+ codeOfFm + "]"+
        " ( "
        + bgp.toString()
        + ", "
        + fm.getInterface().toString()
        + " )";

Should we print like the following way?

├bgpAdd
  - codeOfBGP, codeOfFm [-1530180337, -34331270] 
  - fm (SPARQL endpoint at https://query.wikidata.org/sparql )
  - pattern (bgp ?cc  <http://www.w3.org/2000/01/rdf-schema#label>  ?o . )

I think [codeOfBGP, codeOfFm], [codeOfPattern, codeOfFm], [codeOfTP, codeofFM] are also shared by some operators.

hartig commented 3 months ago

Should we print like the following way?

├bgpAdd
  - codeOfBGP, codeOfFm [-1530180337, -34331270] 
  - fm (SPARQL endpoint at https://query.wikidata.org/sparql )
  - pattern (bgp ?cc  <http://www.w3.org/2000/01/rdf-schema#label>  ?o . )

Hence, overall it should look as follows for bgpAdd.

|
└── bgpAdd
    |  - fm (-1530180337) SPARQL endpoint at https://query.wikidata.org/sparql
    |  - pattern (-34331270) bgp ?cc  <http://www.w3.org/2000/01/rdf-schema#label>  ?o .
    |
    └── ...
huanyu-li commented 3 months ago

Do we have example queries somewhere that cover all operators?

hartig commented 3 months ago

Do we have example queries somewhere that cover all operators?

No, unfortunately not. Why?

huanyu-li commented 3 months ago

Do we have example queries somewhere that cover all operators?

No, unfortunately not. Why?

I was just thinking test on a query needs unary operator like bgpAdd to see the current implementation works.

hartig commented 3 months ago

I was just thinking test on a query needs unary operator like bgpAdd to see the current implementation works.

Are you talking just about some adhoc test that you wanted to do (simply running the CLI with a query) or unit tests?

I assume the former. In this case, it is a bit tricky if the default setting of the logical optimizer is not rewriting the given source assignment (i.e., initial logical plan) into a logical plan that uses tpAdd, which I am not sure at the moment whether it does.

As an alternative it is always possible to quickly write a small Java program like the hello.java that we have, where this program simply creates exactly the logical plan that you want and then uses the plan printer to print out this plan.

huanyu-li commented 3 months ago

I found an error in TextBasedLogicalPlanPrinterImpl2 when testing filter operator. I have new updates now. Should I restore the branch in this PR or just create a new PR?

hartig commented 3 months ago

Please create a new PR.

hartig commented 3 months ago

I just found another two problems with the current implementation of TextBasedLogicalPlanPrinterImpl2: I simply tried to print the logical plan for the example query in the (ExampleQuery.rq). That is, after doing a mvn package, I run the following command:

java -cp target/HeFQUIN-0.0.3-SNAPSHOT.jar se.liu.ida.hefquin.cli.RunQueryWithoutSrcSel --federationDescription=ExampleFederation.ttl --query=ExampleQuery.rq --skipExecution --printLogicalPlan

The output is:

--------- Logical Plan ---------
mj (2) 
├──req (0)
     - fm (1126119123) SPARQL endpoint at http://dbpedia.org/sparql
     - pattern (1143051543) (bgp ?c  <http://www.w3.org/2002/07/owl#sameAs>  ?cc .<http://dbpedia.org/resource/Berlin>
          <http://dbpedia.org/ontology/country>  ?c . )

└──req (1)
     - fm (-70922805) SPARQL endpoint at https://query.wikidata.org/sparql
     - pattern (984506574) (bgp ?cc  <http://www.w3.org/2000/01/rdf-schema#label>  ?o . )

--------------
| c | cc | o |
==============
--------------

As you can see, the | line from mj to the second req is missing. Additionally, there is no space before each of the two req.

In other words, I would expect the printout to look as follows (but it doesn't).

--------- Logical Plan ---------
mj (2) 
├── req (0)
|    - fm (1126119123) SPARQL endpoint at http://dbpedia.org/sparql
|    - pattern (1143051543) (bgp ?c  <http://www.w3.org/2002/07/owl#sameAs>  ?cc .<http://dbpedia.org/resource/Berlin>
          <http://dbpedia.org/ontology/country>  ?c . )
|
└── req (1)
     - fm (-70922805) SPARQL endpoint at https://query.wikidata.org/sparql
     - pattern (984506574) (bgp ?cc  <http://www.w3.org/2000/01/rdf-schema#label>  ?o . )

--------------
| c | cc | o |
==============
--------------
huanyu-li commented 3 months ago

The missing |s are included in my new code. I just created a new PR. For the extra space for req, I just added. Do you think we need that for all operators, or just some unary operators in addition?

hartig commented 3 months ago

For the extra space for req, I just added. Do you think we need that for all operators, or just some unary operators in addition?

All operators should have a space between the ├── and their name.