andre-simon / highlight

Source code to formatted text converter
GNU General Public License v3.0

Some Default LangDef Definitions Don't Show Up in Verbose Debugging #33

Open tajmone opened 6 years ago

tajmone commented 6 years ago

Hi Andre,

it came up in a previous issue that some default values for language definitions don't show up when debugging with --verbose:

I've tried defining Escape as an empty string, but this caused bad parsing behavior. It seems that Highlight has some internal default definition of Escape string, but it doesn't show up in the --verbose output.

There is a default regex (will check why it is not displayed).

Today I was doing some tests with a bare-bones langDef (just Description and a Keywords entry) to check which elements have defaults and what these are. As it turned out, the only visible defaults I can see via --verbose are these two:

I have the impression that the debug output is not showing all the defaults: Escape, as mentioned in the quote, but possibly others.

Also, boolean definitions don't show up (IgnoreCase, EnableIndentation, etc.). Are these hidden because they are false or null by default?

The documentation only mentions two default settings:

Global variables:

The following variables are available within a language definition:

HL_LANG_DIR: path of language definition directory (use with Lua dofile function)

Identifiers: Default regex for identifiers

Digits: Default regex for numbers

... so it might be correct. But, as mentioned in issue #23, even if I don't define Escape, Highlight seems to catch C-style escape sequences nonetheless (so I guess there is a hidden default string somewhere).

NOTE: This came up while I was working on PR #34

tajmone commented 6 years ago

Identifiers Defaults in HL Source

I've found the references in HL source code:

//default expressions, can be overridden by syntax definition
const string SyntaxReader::REGEX_IDENTIFIER =
    "[a-zA-Z_]\\w*";

const string SyntaxReader::REGEX_NUMBER =
    "(?:0x|0X)[0-9a-fA-F]+|\\d*[\\.]?\\d+(?:[eE][\\-\\+]\\d+)?[lLuU]*";

const string SyntaxReader::REGEX_ESCSEQ =
    "\\\\u[[:xdigit:]]{4}|\\\\\\d{3}|\\\\x[[:xdigit:]]{2}|\\\\[ntvbrfa\\\\\\?'\"]";

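To get a feel for what these three defaults accept, the patterns can be tried against a few sample tokens. The following is only a sketch under assumptions: it uses std::regex with the ECMAScript grammar, which may not behave identically to the regex engine Highlight uses internally, and matchesWhole is a hypothetical helper that is not part of Highlight.

```cpp
#include <regex>
#include <string>

// Default expressions, copied from the SyntaxReader snippet above.
const std::string REGEX_IDENTIFIER = "[a-zA-Z_]\\w*";
const std::string REGEX_NUMBER =
    "(?:0x|0X)[0-9a-fA-F]+|\\d*[\\.]?\\d+(?:[eE][\\-\\+]\\d+)?[lLuU]*";
const std::string REGEX_ESCSEQ =
    "\\\\u[[:xdigit:]]{4}|\\\\\\d{3}|\\\\x[[:xdigit:]]{2}|\\\\[ntvbrfa\\\\\\?'\"]";

// Hypothetical helper (not part of Highlight): returns true if the whole
// token matches the pattern under std::regex's ECMAScript grammar.
// Assumption: Highlight's own regex engine may treat some edge cases differently.
bool matchesWhole(const std::string& pattern, const std::string& token) {
    return std::regex_match(token, std::regex(pattern));
}
```

Under these assumptions, matchesWhole(REGEX_ESCSEQ, "\\n") holds for the two-character sequence backslash-n, as it does for backslash-x followed by two hex digits, which is consistent with C-style escapes being caught even when a language definition sets no Escape value.
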
And this is the code that handles the debugging info:

void HLCmdLineApp::printDebugInfo ( const highlight::SyntaxReader *lang,
                                    const string & langDefPath )
{
    if (!lang) return;
    cerr << "\nLoading language definition:\n" << langDefPath;
    cerr << "\n\nDescription: " << lang->getDescription();

    Diluculum::LuaState* luaState=lang->getLuaState();
    if (luaState) {
        cerr << "\n\nLUA GLOBALS:\n" ;
        Diluculum::LuaValueMap::iterator it;
        Diluculum::LuaValueMap glob =luaState->globals();
        for(it = glob.begin(); it != glob.end(); it++) {
            Diluculum::LuaValue first = it->first;
            Diluculum::LuaValue second = it->second;
            std::cerr << first.asString()<<": ";
            switch (second.type()) {
            case  LUA_TSTRING:
                cerr << "string [ "<<second.asString()<<" ]";
                break;
            case  LUA_TNUMBER:
                cerr << "number [ "<<second.asNumber()<<" ]";
                break;
            case  LUA_TBOOLEAN:
                cerr << "boolean [ "<<second.asBoolean()<<" ]";
                break;
            default:
                cerr << second.typeName();
            }
            cerr << endl;
        }

    }

It looks like the iteration is missing the Escape definition.

Back to syntaxreader.cpp (LL 91–), we notice that Escape is not among the initialized globals (unlike Identifiers and Digits, which can be seen at lines 100–101):

void  SyntaxReader::initLuaState(Diluculum::LuaState& ls, const string& langDefPath, const string& pluginParameter, OutputType type )
{
    // initialize Lua state with variables which can be used within scripts
    string::size_type Pos = langDefPath.find_last_of ( Platform::pathSeparator );
    ls["HL_LANG_DIR"] =langDefPath.substr ( 0, Pos+1 );

    ls["HL_INPUT_FILE"] = ls["HL_PLUGIN_PARAM"] = pluginParameter;
    ls["HL_OUTPUT"] = type;

    ls["Identifiers"]=REGEX_IDENTIFIER;
    ls["Digits"]=REGEX_NUMBER;

    // initialize environment for hook functions
    ls["HL_STANDARD"]=STANDARD;

Maybe this is the reason why Escape's default value doesn't show up in --verbose debugging?

I've also found an explicit reference to --verbose output at LL 312–:

            string escRegex;
            if (ls["Strings"]["Escape"].value()==Diluculum::Nil){
                escRegex=REGEX_ESCSEQ;
                ls["Strings[Escape]"] = escRegex; //for --verbose output
            } else {
                escRegex=ls["Strings"]["Escape"].value().asString();
            }
            regex.push_back ( new RegexElement ( ESC_CHAR,ESC_CHAR_END, StringTools::trim(escRegex), 0, -1 ) );

... but somehow the value still doesn't appear in the output (perhaps it is not seen as a global?).

andre-simon commented 6 years ago

The default Escape sequence regex is output as Strings[Escape]. The code in printDebugInfo does not iterate over arrays like Strings, so the value was not shown without the quick fix. It is correct that default (false) values are not shown either, as they are not defined as explicit Lua values. To show them all, they would need to be added as Lua state values.
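
The difference between a flat pass over the globals and a pass that descends into nested tables can be illustrated with a small model. This is a sketch only: Value, Table, dumpFlat and dumpDeep are hypothetical names invented for illustration, and the model stands in for the Lua state rather than using the Diluculum API.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <variant>

// Simplified stand-in for a Lua value: nil, boolean, string, or nested table.
// Assumption: this is NOT the Diluculum API, only a model of the data it exposes.
struct Value;
using Table = std::map<std::string, Value>;
struct Value {
    std::variant<std::monostate, bool, std::string, std::shared_ptr<Table>> data;
};

// Mirrors the loop in printDebugInfo: one line per top-level global;
// a nested table is reported by its type name only.
std::string dumpFlat(const Table& globals) {
    std::ostringstream out;
    for (const auto& [key, value] : globals) {
        out << key << ": ";
        if (auto s = std::get_if<std::string>(&value.data))
            out << "string [ " << *s << " ]";
        else if (auto b = std::get_if<bool>(&value.data))
            out << "boolean [ " << *b << " ]";
        else if (std::holds_alternative<std::shared_ptr<Table>>(value.data))
            out << "table";
        else
            out << "nil";
        out << '\n';
    }
    return out.str();
}

// Alternative: descend into tables and print a dotted path for each leaf.
void dumpDeep(const Table& table, const std::string& prefix, std::ostream& out) {
    for (const auto& [key, value] : table) {
        const std::string path = prefix.empty() ? key : prefix + "." + key;
        if (auto t = std::get_if<std::shared_ptr<Table>>(&value.data)) {
            if (*t) dumpDeep(**t, path, out);  // recurse into the nested table
        } else if (auto s = std::get_if<std::string>(&value.data)) {
            out << path << ": string [ " << *s << " ]\n";
        } else if (auto b = std::get_if<bool>(&value.data)) {
            out << path << ": boolean [ " << *b << " ]\n";
        } else {
            out << path << ": nil\n";
        }
    }
}
```

In this model, a Strings table holding an Escape entry shows up in dumpFlat only as "Strings: table", whereas dumpDeep prints a "Strings.Escape" line. A key that was never stored, such as an unset boolean, reaches neither loop, which matches the point that defaults must be added as explicit values before they can be listed.
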