boostorg / spirit

Boost.org spirit module
http://boost.org/libs/spirit
392 stars 161 forks source link

Any need for rule_definition? #647

Closed cppljevans closed 3 years ago

cppljevans commented 3 years ago

The PR:

https://github.com/boostorg/spirit/pull/646

will reduce the use of rule_definition (because it's no longer used in the BOOST_SPIRIT_DEFINE). The remaining uses, when used with larger recursive grammars, will suffer the compile time explosion reported here:

https://sourceforge.net/p/spirit/mailman/message/37194823/

The compile time explosion is caused by rule_definition's storing, when BOOST_SPIRIT_DEFINE is not used, the rule_id->RHS mapping in the context.

https://github.com/boostorg/spirit/blob/master/include/boost/spirit/home/x3/nonterminal/detail/rule.hpp#L211

Since BOOST_SPIRIT_DEFINE avoids this compile time explosion, there's no reason, AFAICT, to keep rule_definition around. Why not remove it?

However, this removal will require a slight modification of the default parse_rule in nonterminal/rule.hpp; however, I think that modification would be fairly easy. Actually, the following prototype:

//Purpose:
//  Provide a simplified analog to:
//    [rule_separate_tu]
/*
  https://github.com/boostorg/spirit/blob/master/test/x3/rule_separate_tu_grammar.hpp
  https://github.com/boostorg/spirit/blob/master/test/x3/rule_separate_tu_grammar.cpp
 */
//  which focuses solely on demonstrating how recursive Rule's are implemented.
//References:
//  [ADL] argument dependent lookup [https://en.cppreference.com/w/cpp/language/adl]
//=======================
#include <boost/preprocessor/stringize.hpp>
constexpr unsigned max_calls=10;
constexpr unsigned num_parsers=3;
#include <iostream>
namespace x3
  {
    template
    < typename Rule
    >
    constexpr bool
  defined_rule
    /**@brief
     *  Only used to avoid always firing static_assert in default parse_rule below
     */
    (
    )
    { return false;
    }
    template
    < typename Rule
    >
    bool
  parse_rule //default parse_rule
    ( Rule
    , unsigned calls
    )
    {
        std::cout<<"parse_rule(***Rule UNDEFINED***),calls="<<calls<<".\n";
        static_assert(defined_rule<Rule>(), "BOOST_SPIRIT_DEFINE undefined for this rule.");
        return false;
    }
    template
    < typename ID 
      //to enable [ADL] to work, must be defined in namespace 
      //where BOOST_SPIRIT_DEFINE is called.
    >
    struct
  rule
    {
        bool 
      parse
        ( unsigned calls=0
        )const
        {   
          std::cout<<"rule::parse(defined):calls="<<calls<<".\n";
          if(calls>max_calls) return true;
          return parse_rule(*this, calls+1);
        }
    };
  }//x3 namespace
#include <boost/preprocessor/cat.hpp>
#define BOOST_SPIRIT_DEFINE(rule_name)                          \
  bool                                                          \
parse_rule                                                      \
  ( decltype(rule_name)                                         \
  , unsigned calls                                              \
  )                                                             \
  {                                                             \
    std::cout                                                   \
      <<"parse_rule("                                           \
      <<BOOST_PP_STRINGIZE(rule_name)                           \
      <<", calls="<<calls<<").\n";                              \
    return BOOST_PP_CAT(rule_name, _def).parse(calls); \
  }                                                             \
/***/
namespace unused_attr
  {
  using namespace x3;
  //{{local struct id's 
  //  Purpose:
  //    These declarations, and their use as the rule<ID>'s
  //    in the following rule declarations, forces [ADL]
  //    to search this namespace for matching parse_rule.  These matching
  //    parse_rule's are generated by the calls to BOOST_SPIRIT_DEFINE below.
  struct id0;
  struct id1;
  struct id2;
  //}}local struct id's 
  //{{analog in [rule_separate_tu]
  //  are the declarations such as:
  //    using skipper_type = x3::rule<class skipper_r>;
  //    const skipper_type skipper;
  //  within:
  //    rule_separate_tu_grammar.hpp
  const rule<id0> rule0;
  const rule<id1> rule1;
  const rule<id2> rule2;
  //}}analog in [rule_separate_tu]
  //{{analog in [rule_separate_tu]
  //  are the declarations such as:
  //    const auto skipper_def = x3::lit('*');
  //  within:
  //    rule_separate_tu_grammar.cpp
  const auto rule0_def=rule1;
  const auto rule1_def=rule2;
  const auto rule2_def=rule0;
  //}}analog in [rule_separate_tu] 
  //{{analog in [rule_separate_tu] 
  //  is the invocations of BOOST_SPIRIT_DEFINE
  //  within:
  //    rule_separate_tu_grammar.cpp
  BOOST_SPIRIT_DEFINE(rule0)
  BOOST_SPIRIT_DEFINE(rule1)
//#define DEFINE_RULE2
#ifdef DEFINE_RULE2
  BOOST_SPIRIT_DEFINE(rule2)
#endif  
  //}}analog in [rule_separate_tu] 
  }
  int 
main()
  { unused_attr::rule0.parse(0);
    return 0;
  }

shows, "prototypically", how to do that. That prototype certainly doesn't have all of the details, such as Attribute and Iterator template args, but that's OK because it's purpose is only to show the "essense" of how spirit's BOOST_SPIRIT_DEFINE implements recursive parsers.

OTOH, if the prototype is correct, then this prototype would help, IMHO, greatly, future maintainers understand how BOOST_SPIRIT_DEFINE and the like implement mutually recursive parsers. I found it very frustrating that the prototype didn't work when, instead of the 'unused_attr::id{0-2}', a simple 'unsigned ID' was used as the rule ID. Only after vaguely remembering something about how ADL works did I mirror what spirit actually does and added the 'struct unused_attr::id{0-2}', afterwhich, it "miraculously" worked. I suspect future maintainers, unless they happen to be devoted c++ language lawyers, would appreciate this prototype to show how spirit implements mutually recursive parsers using ADL. That's the purpose of the code comments referencing [ADL]. The macro, DEFINE_RULE2, can be undefined to see what happens when a rule is not BOOST_SPIRIT_DEFINE'ed. Quite frankly, I don't understand why the defined_rule() is needed within the default parse_rule. I just added it because, without it, the static_assertion always fired. I only mirrorwed what spirit did to avoid the always firing. Maybe someone could provide some analysis of why this defined_rule() is needed?

-regards, Larry

cppljevans commented 3 years ago

The prototype code shown below is coded as close as possible to the above prototype to enable easy comparison of 2 methods of finding the definition of a rule. In short, the above uses ADL whereas the below uses specializations of a template class which is made the template argument to the now templated parse function. The code below seems easier to understand to me, but YMMV.

//Purpose:
//  Provide a simplified analog to:
//    [rule_separate_tu]
/*
  https://github.com/boostorg/spirit/blob/master/test/x3/rule_separate_tu_grammar.hpp
  https://github.com/boostorg/spirit/blob/master/test/x3/rule_separate_tu_grammar.cpp
 */
//  which focuses solely on demonstrating how recursive Rule's are implemented.
//References:
//  [ADL] argument dependent lookup [https://en.cppreference.com/w/cpp/language/adl]
//  [RULE_DEF] https://github.com/boostorg/spirit/issues/643#issuecomment-773308349
//Method:
//  In contrast to spirit's use of [ADL] to find the definition of a rule,
//  this uses specializations of a template class, rule_def<RuleId>, to find
//  the **TYPE** of the definition, rule_def<RuleId>::type.
//  
//  This was the method outlined in [RULE_DEF]. 
//=======================
constexpr unsigned max_calls=10;
#include <iostream>
namespace x3
  {
    template
    < unsigned RuleId
    >
    struct
  rule
    {
        template
        < template<unsigned>typename RuleDef
        >
        bool 
      parse
        ( unsigned calls=0
        )const
        {   
          std::cout<<"rule<"<<RuleId<<">::parse:calls="<<calls<<".\n";
          if(calls>max_calls) return true;
          using rhs_type=typename RuleDef<RuleId>::type;
          rhs_type rhs_valu;
          return rhs_valu.template parse<RuleDef>(calls+1);
        }
    };
  }//x3 namespace
#define BOOST_SPIRIT_RULE_DEF \
  template<unsigned RuleId>   \
  struct                      \
rule_def                      \
  ;                           \
/***/  
#define BOOST_SPIRIT_DEFINE(RuleId,RuleDef) \
  template<>                                \
  struct                                    \
rule_def                                    \
  < RuleId>                                 \
  {                                         \
    using type = decltype(RuleDef);         \
  };                                        \
/***/
  namespace
unused_attr
  {
  using namespace x3;
  //{{local struct id's 
  //  Purpose:
  //    These enums replace the RuleId's in [rule_separate_tu]
  //    NOTE, no ADL is needed, as in Spirit, to find the corresponding definition.
  //    instead, the template class, rule_def<RuleId>::type, 
  //    generated by the BOOST_SPIRIT_DEFINE's below, finds the corresponding definition.
  enum:unsigned{id0,id1,id2};
  //}}local struct id's 
  //{{analog in [rule_separate_tu]
  //  are the declarations such as:
  //    using skipper_type = x3::rule<class skipper_r>;
  //    const skipper_type skipper;
  //  within:
  //    rule_separate_tu_grammar.hpp
  constexpr rule<id0> rule0;
  constexpr rule<id1> rule1;
  constexpr rule<id2> rule2;
  //}}analog in [rule_separate_tu]
  //{{analog in [rule_separate_tu]
  //  are the declarations such as:
  //    const auto skipper_def = x3::lit('*');
  //  within:
  //    rule_separate_tu_grammar.cpp
  constexpr auto rule0_def=rule1;
  constexpr auto rule1_def=rule2;
  constexpr auto rule2_def=rule0;
  //}}analog in [rule_separate_tu] 
  //{{analog in [rule_separate_tu] 
  //  is the invocations of BOOST_SPIRIT_DEFINE
  //  within:
  //    rule_separate_tu_grammar.cpp
  BOOST_SPIRIT_RULE_DEF
  BOOST_SPIRIT_DEFINE(id0,rule0_def)
  BOOST_SPIRIT_DEFINE(id1,rule1_def)
//#define DEFINE_RULE2
#ifdef DEFINE_RULE2
  BOOST_SPIRIT_DEFINE(id2,rule2_def)
#endif  
  //}}analog in [rule_separate_tu] 
  }
  int 
main()
  { unused_attr::rule0.parse<unused_attr::rule_def>(0);
    return 0;
  }
Kojoley commented 3 years ago

I am not a fan of removing something without even trying to fix it first. Especially when it will cause backward compatibility issues, and in this case we cannot remove it because there is no alternatives to it (immediate rules could be, and certainly are, used instead of as/attr_cast directives). The issue would be much more helpful if it had a reproducer, I assume this is what you are talking about https://stackoverflow.com/questions/37230653/compile-times-with-boost-spirit-x3/37328539#37328539, and if so I might have a fix for that.

cppljevans commented 3 years ago

I am not a fan of removing something without even trying to fix it first. Especially when it will cause backward compatibility issues, and in this case we cannot remove it because there is no alternatives to it (immediate rules could be, and certainly are, used instead of as/attr_cast directives).

Could you please provide example?

The issue would be much more helpful if it had a reproducer, I assume this is what you are talking about https://stackoverflow.com/questions/37230653/compile-times-with-boost-spirit-x3/37328539#37328539, and if so I might have a fix for that.

Yes, that would be a real world test case; however, there's a simple way to generate any depth of recursion using this code:

//Purpose:
//  Generate grammar for benchmark.
//
#include <iostream>
#include <fstream>
#include <iomanip>
struct osi//indented out stream.
{
private:
  std::ostream& _os;
  int _indent;//current indentation.
  static unsigned const margin=2;
public:
  osi(std::ostream& os):_os(os),_indent(0){}
  std::ostream& os(){ return _os;}
  std::ostream& operator()(){ _os<<std::setw(_indent)<<""; return _os;
    }
  std::ostream& operator++(){ _indent+=margin; return _os;}
  std::ostream& operator--(){ _indent-=margin; return _os;}
};
struct osb
  //indented out stream block.
  //CTOR indents
  //DTOR dedents
{
private:
  osi& _os;
public:
  osb(osi& os):_os(os)
  { ++_os;}
  ~osb()
  { --_os;}
};
#include <string>

#define DEFAULT_NUM_RULES 11
std::string r_(int ndx_rule)
{
  return std::string("r_")+std::to_string(ndx_rule);
}
std::string d_(int ndx_rule)
{
  return r_(ndx_rule)+std::string("_def");
}
enum rref_enum
{ rul
, def
};
template< rref_enum Rref>
std::string r_lhs(int ndx_rule)
{
  return std::string("r_")+std::to_string(ndx_rule);
}
template<>
std::string r_lhs<def>(int ndx_rule)
{
  return std::string("auto const ")+d_(ndx_rule)+"="+r_(ndx_rule);
}
template< rref_enum Rref>
std::string r_rhs(int ndx_rule)
{
  return std::string("r_")+std::to_string(ndx_rule);
}
template<>
std::string r_rhs< def>(int ndx_rule)
{
  return d_(ndx_rule);
}
std::string op(int ndx_rule)
{
  return std::string("string_(\"")+std::to_string(ndx_rule)+"\")";
}
template< rref_enum Rref>
std::string rhs(int ndx_rule,int num_rules)
{
  if(ndx_rule)
      return r_rhs<Rref>(ndx_rule-1)+">>*("+op(ndx_rule)+">>"+r_rhs<Rref>(ndx_rule-1)+")";
  else
    return "char_('_')|char_('[')>>"+r_(num_rules-1)+">>char_(']')";
}
void rule_decls( osi&gfi, int num_rules)
{
  for(int i_rule=0; i_rule<num_rules; ++i_rule)
    gfi()<<"rule<id<"<<i_rule<<">> const "<<r_(i_rule)<<";\n";
}
int main(int argc, char *argv[])
{
  int num_rules=DEFAULT_NUM_RULES;
#if 0
  std::cout<<":argc="<<argc<<"\n";
  for(int i_arg; i_arg<argc; ++i_arg)
    std::cout<<"argv["<<i_arg<<"]="<<argv[i_arg]<<"\n";
#endif    
  if(argc>1) num_rules=std::stoi(argv[1]);
  std::cout<<argv[0]<<":num_rules="<<num_rules<<"\n";
  int i_rule;
  std::ofstream gf;
  gf.open("gram_file.hpp");
  {
    osi gfi(gf);
    osb gfb(gfi);
    { osb gfb(gfi);
      rule_decls( gfi, num_rules);
    }
    { osb gfb(gfi);
      for(i_rule=0;i_rule<num_rules; ++i_rule)
        gfi()<<r_lhs<def>(i_rule)+"="+rhs<def>(i_rule,num_rules)+";\n";
      gfi()<<"auto const& start="<<d_(num_rules-1)<<";\n";
    }
  }
  gf.close();
  return 0;
}

This generates, when arg is 3, output in gram_file.hpp:

    rule<id<0>> const r_0;
    rule<id<1>> const r_1;
    rule<id<2>> const r_2;
    auto const r_0_def=r_0=char_('_')|char_('[')>>r_2>>char_(']');
    auto const r_1_def=r_1=r_0_def>>*(string_("1")>>r_0_def);
    auto const r_2_def=r_2=r_1_def>>*(string_("2")>>r_1_def);
    auto const& start=r_2_def;

which, when #included in:

#include <boost/spirit/home/x3.hpp>
using namespace boost::spirit::x3;
#include <string>
auto string_(char const* s){ return standard::string(s);}
template<unsigned ID>
struct id;
#include "gram_file.hpp"
#include <iostream>
int main()
{
  std::string input="_2_1_2_2_";
  using iter_t=std::string::iterator;
  iter_t first=input.begin();  
  iter_t const last=input.end();
  bool result=parse(first,last,start);
  std::cout<<"result="<<result<<";\n";
  return 0;
}

compiles and runs OK.

The basic pattern of the grammar is just like arithmetic expressions where:

You've actually seen something similar to this before back in 2017.

Kojoley commented 3 years ago

Could you please provide example?

... Yes, that would be a real world test case;

That was not convincing? I am too lazy to search links, but here is another one https://stackoverflow.com/questions/43526708/optimizing-the-grammar

however, there's a simple way to generate any depth of recursion using this code:

Your example is too synthetic, why are there all these rules if they are not used except for the single one and r_*_def used directly instead. I understand that you can fix your generator, but it will not change much because IIUC the feature was implemented to allow writing inline self recursing parser, and that is not what you are doing. I do care that the feature might be misused by users, but unless it is a common pitfall I care about backward compatibility much more.