amplify-education / python-hcl2


Grammar Breaks between versions for TF and HCL specifically #163

Open DevGumbo opened 5 months ago

DevGumbo commented 5 months ago

I have been trying to create a tool that will crawl all the HCL in our Terragrunt directories and ran into an interesting issue. When I update to version 4.3.3, the HCL parser throws an error as follows.

I also understand that I am doing odd things by swapping out the question mark; that was part of my attempt to figure out what was going on with the parser.

```shell
terragrunt.hcl: Unexpected token Token('__ANON_3', 'protocol') at line 31, column 7.
Expected one of: 
        * MORETHAN
        * __ANON_4
        * RBRACE
        * __ANON_1
        * __ANON_9
        * LESSTHAN
        * PERCENT
        * STAR
        * SLASH
        * __ANON_7
        * __ANON_0
        * COMMA
        * QMARK
        * __ANON_5
        * __ANON_6
        * __ANON_8
        * __ANON_2
        * MINUS
        * PLUS
```

This is from code like:

```hcl
locals {
  common_vars  = read_terragrunt_config(find_in_parent_folders("index1.hcl"))
  account_vars = read_terragrunt_config(find_in_parent_folders("index2.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("index3.hcl"))
  vpc_vars     = read_terragrunt_config(find_in_parent_folders("index4.hcl"))

  defaults = local.common_vars.locals.defaults

  # Get the rules from defaults.yaml
  default_inbound_nacl_rules = {
    for index, rule in local.defaults.vpc_app_inbound_nacl_rules : "defaultrule${index}" => {
      client_cidr_block = rule["client_cidr_block"]
      rule_number       = 10 + index
      protocol          = rule["protocol"]
      from_port         = rule["from_port"]
      to_port           = rule["to_port"]
      icmp_code         = rule["icmp_code"]
      icmp_type         = rule["icmp_type"]
    }
  }

  default_outbound_nacl_rules = {
    for index, rule in local.defaults.vpc_app_outbound_nacl_rules : "defaultrule${index}" => {
      client_cidr_block = rule["client_cidr_block"]
      rule_number       = 10 + index
      protocol          = rule["protocol"]
      from_port         = rule["from_port"]
      to_port           = rule["to_port"]
      icmp_code         = rule["icmp_code"]
      icmp_type         = rule["icmp_type"]
    }
  }
}
```

However, Terraform's own parser handles this file without issue.

When I downgrade to 4.3.0 to make the HCL work, I get this error for the Terraform file:

```shell
test_vpc.tf after replacement: Unexpected token Token('DECIMAL', '1') at line 129, column 71.
Expected one of: 
        * /[a-zA-Z_][a-zA-Z0-9_-]*/
        * EQUAL
        * LBRACE
        * STRING_LIT
Previous tokens: [Token('__ANON_3', '_QUESTION_MARK_')]
```

The Terraform is as follows:

```hcl
locals {
  destination_route_tables = compact(concat(
    [module.vpc.public_subnet_route_table_id],
    module.vpc.private_app_subnet_route_table_ids,
    module.vpc.private_persistence_route_table_ids,
  ))

  # Ideally, this will be length(local.destination_route_tables) but terraform has restrictions around count depending
  # on resources that don't exist, so we have to rely on summation logic using information that is available at plan
  # time. Note that this requires some knowledge of route table logic in the vpc module.
  # TODO: expose additional outputs in vpc module to help simplify this.
  num_destination_route_tables = (
    local.num_public_route_tables + local.num_private_app_route_tables + local.num_private_persistence_route_tables
  )
  # 1 route table for all public subnets
  num_public_route_tables = var.create_public_subnets ? 1 : 0   # << line 129 in the error above
  # 1 route table for each AZ for private app and private persistence subnet tiers
  num_private_app_route_tables         = var.create_private_app_subnets ? module.vpc.num_availability_zones : 0
  num_private_persistence_route_tables = var.create_private_persistence_subnets ? module.vpc.num_availability_zones : 0

}
```
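A reduced probe of just the flagged line might help narrow this down; whether this snippet alone reproduces the 4.3.0 failure is an assumption on my part:

```python
# Reduced probe: feed only the reported failing construct (a ternary) to
# hcl2.loads and see whether the installed python-hcl2 version accepts it.
import hcl2

SNIPPET = """
locals {
  num_public_route_tables = var.create_public_subnets ? 1 : 0
}
"""

# Raises lark.exceptions.UnexpectedToken if the grammar rejects the ternary.
print(hcl2.loads(SNIPPET))
```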

Here is the code for the TF parser:

```python
import hcl2
import os
import lark

def parse_tf_file(file_path):
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return hcl2.loads(content)
    except lark.exceptions.UnexpectedToken as e:
        if '?' in content:
            # Replace '?' with a placeholder
            content = content.replace('?', '_QUESTION_MARK_')
            try:
                return hcl2.loads(content)
            except Exception as inner_e:
                print(f"Error parsing Terraform file {file_path} after replacement: {inner_e}")
                return None
        else:
            print(f"Error parsing Terraform file {file_path}: {e}")
            return None
    except Exception as e:
        print(f"Error parsing Terraform file {file_path}: {e}")
        return None

def main():
    tf_file_path = './test_vpc.tf'  # Change this to the path of your .tf file
    parsed_data = parse_tf_file(tf_file_path)
    if parsed_data:
        print("Parsed data successfully:")
        print(parsed_data)
    else:
        print("Failed to parse the file.")

if __name__ == "__main__":
    main()
```

Here is the code for the HCL parser:

```python
import hcl2
import lark

def parse_hcl_file(file_path):
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return hcl2.loads(content)
    except lark.exceptions.UnexpectedToken as e:
        print(f"Error parsing HCL file {file_path}: {e}")
        return None
    except Exception as e:
        print(f"Error parsing HCL file {file_path}: {e}")
        return None

def main():
    hcl_file_path = 'terragrunt.hcl'  # Change this to the path of your .hcl file
    parsed_data = parse_hcl_file(hcl_file_path)
    if parsed_data:
        print("Parsed data successfully:")
        print(parsed_data)
    else:
        print("Failed to parse the file.")

if __name__ == "__main__":
    main()
```

DevGumbo commented 5 months ago

So I am comparing the lark file and going back and forth between the two functions. If I change line 16 in the lark file from

```
binary_op : expression binary_term new_line_or_comment?
```

to

```
binary_op : expression binary_term
```

the Terraform-specific parsing works but the HCL-specific parsing does not.

If I leave both of those settings as they are on the newest version, the HCL-proper parsing works but the Terraform parsing breaks.

DevGumbo commented 5 months ago

So in order to make it work for me, I just forked the repo and instantiated different modules, one for TF and one for HCL (a sketch of that setup appears after the grammar listing below).

This gets me where I want to go, and you guys are doing great work. There is something unique in the lark file on the lines noted below.

```
start : body
body : (new_line_or_comment? (attribute | block))* new_line_or_comment?
attribute : identifier "=" expression
```

Lines 3/4 change between versions:

```
block : identifier (identifier | STRING_LIT)* new_line_or_comment? "{" body "}"   ## << 4.3.3 WORKS FOR HCL PROPER ##
block : identifier (identifier | STRING_LIT)* "{" body "}"                        ## << 4.3.0 DOESN'T WORK FOR TF PROPER FILES ##
```

```
new_line_and_or_comma: new_line_or_comment | "," | "," new_line_or_comment
new_line_or_comment: ( /\n/ | /#.*\n/ | /\/\/.*\n/ )+

identifier : /[a-zA-Z_][a-zA-Z0-9_-]*/

?expression : expr_term | operation | conditional

conditional : expression "?" new_line_or_comment? expression new_line_or_comment? ":" new_line_or_comment? expression

?operation : unary_op | binary_op
!unary_op : ("-" | "!") expr_term
```

Lines 16/15 change between versions:

```
binary_op : expression binary_term new_line_or_comment?   ## << 4.3.3 WORKS FOR HCL PROPER ##
binary_op : expression binary_term                         ## << 4.3.0 DOESN'T WORK FOR TF PROPER FILES ##
```

```
!binary_operator : "==" | "!=" | "<" | ">" | "<=" | ">=" | "-" | "*" | "/" | "%" | "&&" | "||" | "+"
binary_term : binary_operator new_line_or_comment? expression

expr_term : "(" new_line_or_comment? expression new_line_or_comment? ")"
          | float_lit
          | int_lit
          | STRING_LIT
          | tuple
          | object
          | function_call
          | index_expr_term
          | get_attr_expr_term
          | identifier
          | heredoc_template
          | heredoc_template_trim
          | attr_splat_expr_term
          | full_splat_expr_term
          | for_tuple_expr
          | for_object_expr

STRING_LIT : "\"" (STRING_CHARS | INTERPOLATION)* "\""
STRING_CHARS : /(?:(?!\${)([^"\\]|\\.))+/+ // any character except '"' unless inside an interpolation string
NESTED_INTERPOLATION : "${" /[^}]+/ "}"
INTERPOLATION : "${" (/(?:(?!\${)([^}]))+/ | NESTED_INTERPOLATION)+ "}"

int_lit : DECIMAL+
!float_lit: DECIMAL+ "." DECIMAL+ (EXP_MARK DECIMAL+)? | DECIMAL+ ("." DECIMAL+)? EXP_MARK DECIMAL+
DECIMAL : "0".."9"
EXP_MARK : ("e" | "E") ("+" | "-")?

tuple : "[" (new_line_or_comment* expression new_line_or_comment* ",")* (new_line_or_comment* expression)? new_line_or_comment* "]"
object : "{" new_line_or_comment? (object_elem (new_line_and_or_comma object_elem)* new_line_and_or_comma?)? "}"
object_elem : (identifier | expression) ("=" | ":") expression

heredoc_template : /<<(?P<heredoc>[a-zA-Z][a-zA-Z0-9._-]+)\n(?:.|\n)*?(?P=heredoc)/
heredoc_template_trim : /<<-(?P<heredoc_trim>[a-zA-Z][a-zA-Z0-9._-]+)\n(?:.|\n)*?(?P=heredoc_trim)/

function_call : identifier "(" new_line_or_comment? arguments? new_line_or_comment? ")"
arguments : (expression (new_line_or_comment* "," new_line_or_comment* expression)* ("," | "...")? new_line_or_comment*)

index_expr_term : expr_term index
get_attr_expr_term : expr_term get_attr
attr_splat_expr_term : expr_term attr_splat
full_splat_expr_term : expr_term full_splat
index : "[" new_line_or_comment? expression new_line_or_comment? "]" | "." DECIMAL+
get_attr : "." identifier
attr_splat : ".*" get_attr*
full_splat : "[*]" (get_attr | index)*

!for_tuple_expr : "[" new_line_or_comment? for_intro new_line_or_comment? expression new_line_or_comment? for_cond? new_line_or_comment? "]"
!for_object_expr : "{" new_line_or_comment? for_intro new_line_or_comment? expression "=>" new_line_or_comment? expression "..."? new_line_or_comment? for_cond? new_line_or_comment? "}"
!for_intro : "for" new_line_or_comment? identifier ("," identifier new_line_or_comment?)? new_line_or_comment? "in" new_line_or_comment? expression new_line_or_comment? ":" new_line_or_comment?
!for_cond : "if" new_line_or_comment? expression

%ignore /[ \t]+/
%ignore /\/\*(.|\n)*?(\*\/)/
```
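For reference, the fork-and-split workaround mentioned above boils down to building two Lark parsers, one per grammar variant. A rough sketch, assuming the two variants are saved locally as `hcl2_tf.lark` and `hcl2_terragrunt.lark` (those file names are made up for illustration; python-hcl2 itself builds a single parser from its bundled `hcl2.lark`):

```python
# Sketch of the two-parser workaround: one Lark parser per grammar variant.
# File names and the flavor switch are illustrative assumptions, not the
# actual layout of the forked modules.
from lark import Lark

# Grammar with the 4.3.3-style block/binary_op rules
tf_parser = Lark.open("hcl2_tf.lark", parser="lalr")
# Grammar with the 4.3.0-style block/binary_op rules
terragrunt_parser = Lark.open("hcl2_terragrunt.lark", parser="lalr")

def parse(text: str, flavor: str = "tf"):
    """Return a raw Lark parse tree; python-hcl2 normally post-processes
    this tree into Python dicts with its own transformer."""
    parser = tf_parser if flavor == "tf" else terragrunt_parser
    # The grammar treats newlines as separators, so a trailing newline helps.
    return parser.parse(text if text.endswith("\n") else text + "\n")
```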

kkozik-amplify commented 5 months ago

What do you mean by terraform specific parser and hcl specific parser? The library uses only one parser:

https://github.com/amplify-education/python-hcl2/blob/fc8805e8ee0a72e6c4db9feea711ead4391171fa/hcl2/parser.py#L10-L16

.tf files are written in the HCL2 language.
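To make that concrete: both file types go through the same `hcl2.load`/`hcl2.loads` entry points and the single Lark grammar behind them, so splitting the script into a "TF parser" and an "HCL parser" only changes the error handling, not the parsing. A minimal illustration (paths are placeholders):

```python
# Both .tf and .hcl content are parsed by the same grammar in python-hcl2;
# the file extension makes no difference to the parser. Paths are placeholders.
import hcl2

with open("test_vpc.tf") as f:
    tf_doc = hcl2.load(f)        # same parser...

with open("terragrunt.hcl") as f:
    hcl_doc = hcl2.load(f)       # ...as this one

print(tf_doc, hcl_doc)
```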

RPitt commented 4 months ago

Just to note that we also recently ran into problems parsing Terraform projects after upgrading from 4.3.0 to 4.3.4.

The problem can be reproduced in 4.3.4 using a simple HCL-format data file like so:

```hcl
somedata = {
  number = 8 * 1024
  number2 = 4
}
```

which results in: `Unexpected token Token('__ANON_3', 'number2') at line 3, column 3.`

So presumably the issue is that recent changes broke the parsing of arithmetic expressions such as `8 * 1024`.

update: it's not just arithmetic expressions, see link added below
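The same reproduction as a self-contained script, with the version behavior as reported above (fails on 4.3.4, parses on 4.3.0):

```python
# Self-contained reproduction of the sample above: an object whose first
# attribute is an arithmetic expression, followed by a second attribute.
import hcl2

SAMPLE = """
somedata = {
  number = 8 * 1024
  number2 = 4
}
"""

# Reported behavior: 4.3.4 raises lark.exceptions.UnexpectedToken on 'number2',
# 4.3.0 parses and returns a dict.
print(hcl2.loads(SAMPLE))
```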

DevGumbo commented 4 months ago

> What do you mean by terraform specific parser and hcl specific parser? The library uses only one parser:
>
> https://github.com/amplify-education/python-hcl2/blob/fc8805e8ee0a72e6c4db9feea711ead4391171fa/hcl2/parser.py#L10-L16
>
> .tf files are written in HCL2 language.

I mean that reading the HCL within Terraform and its objects breaks on the specified version, but that version makes the Terragrunt hybrid HCL work.

So on the older versions of this language pack, I am able to work with the Terragrunt hybrid HCL/Go components with no issue, but the TF HCL has problems with question marks and ternaries.

If I upgrade, the newer version handles the hybrid Terraform HCL better, but then the Terragrunt hybrid HCL/Go object blocks won't parse and they throw the errors stated above.
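A small probe that feeds both problematic constructs (reduced from the examples earlier in this thread) to whichever python-hcl2 version is installed makes the trade-off concrete; the version boundaries in the comments are as reported above, not independently verified:

```python
# Probe both failure modes reported in this issue against the installed
# python-hcl2 version. Snippets are reduced from the examples above.
import hcl2
import lark

CASES = {
    "ternary (reported broken on 4.3.0)":
        'locals {\n  x = var.flag ? 1 : 0\n}\n',
    "object with arithmetic (reported broken on 4.3.3/4.3.4)":
        'somedata = {\n  number = 8 * 1024\n  number2 = 4\n}\n',
}

for name, snippet in CASES.items():
    try:
        hcl2.loads(snippet)
        print(f"{name}: parses")
    except lark.exceptions.UnexpectedToken as exc:
        print(f"{name}: fails at line {exc.line}, column {exc.column}")
```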