Refactor: template parser answer list support

chrismclarke commented 7 months ago

PR Checklist

[x] PR title descriptive (can be used in release notes)

TODO

[x] If merging after #2157 will need to change sentry import accordingly

Description

Update parser to parse answer_lists (previously would happen at runtime)
Add spec tests for template parser
Improve methods for row name auto-generation
Update parser cache version (will force reprocess all sheets next sync)

Author Notes

Been a while since taking a closer look at the template parser. A few things noted while working on the PR:

Notes

answer_lists need to be defined as local variables with name including _list in them somewhere. Not sure if this was previously documented anywhere, I haven't changed the handling just noted that it is the case. See more info about special row name handling in questions section below
Following the change all content repos should see git diffs as answer_lists populated fully (not just string partial representation)
One limitation of parsing answer lists in the compiler is harder to determine whether a named list is actually a list or a reference to something else. E.g. {name: "my_answer_list", value: "@local.some_other_answer_list"}. In order to handle this case I've added a check explicitly for ; when specifying template lists. That may break content if a sheet defined a list with only one value and forgot to include, e.g. answer_list local variable value defined as:
```
"name: value_1 | text: my text"
```
as opposed to
```
"name: value_1 | text: my_text; name: value_2 | text: my text 2"
```
There is currently a default to handle rows without a name which is just setting the name as the row type. This vaguely works if there is just one text or image type row in a template and nothing needs to modify or refer back, however felt quite fragile in general. I've added a minor update so that by default we now also add a suffix to the name derived from the row number being processed, e.g. if the sheet has unnamed text on row 8 the generated name should be text_8. Is this something useful, or would it be better to remove entirely and replace with some quality-control check to ensure all rows have an author-defined name.
With the changes above there may be small content diffs when syncing repos if any do contain non-named rows

Questions

It appears that parameter lists only ever support simple key-value pairs during parsing (e.g. answer_list: @local.some_list), with additional support for key-only properties (e.g. input_allowed would be implied as input_allowed: 'true'). Do we have any runtime cases of more complex use cases that we might want to reconsider evaluating in the parser? E.g. parameter lists where the value isn't just a simple string but more complex object?
There seems support template_group row type, is this something we still use? (don't have strong memory of it)
We're currently passing the exclude_from_translation property explicitly with every row entry. Given that this info isn't ever used at runtime it might be worth considering exporting this separately (e.g. a single json file of excluded translations). Possibly a small follow-up issue (might be best to coordinate with translation workflow)
Templates have special handling for any variables named _list (convert to array) or _collection (convert to json key-value pairs). This is on top of specific handling for columns ending _list (e.g. condition_list), _collection, _date, or specifically action_list. Is this something documented somewhere? Do we know if things like local name and/or column name _collection handling is still used?

Dev Notes

I've left the runtime answer list item parse method in the code for now for any cases where the parsing may not have been done correctly in advance (I don't think it should happen but probably better to be safe for now.. maybe if read in from a global or cell within a data_list (as opposed to full data list row)). I've also added an explicit call to sentry to detect if the code is executed to track whether we can safely remove in the future

I've done my best to add spec tests to cover all the cases covered explicitly within the template parser methods in their current state. I'm not sure if it might make sense to further try to refine/improve as follow-up (e.g. remove template_group handling which I think must be deprecated by now). Spec tests can be run via:

yarn workspace scripts test

The code also includes a placeholder method to run specific QC checks on templates, however I stopped short of adding as the PR was already growing quickly. Might be worth reviewing in the future possible checks worth having (e.g. I had planned to include one to ensure all parameter lists that included answer lists pointed at local variables with correct _list in name)

Git Issues

Closes #2210

Screenshots/Videos

Tests added - see full test details in spec file

chrismclarke commented 6 months ago

It seems that the local variable needs to end in _list, containing it as a substring is not enough: A variable answer_list_1 is not parsed differently, but answer_1_list is.

@esmeetewinkel Thanks for looking through all these cases

You're right, I can see the default parser only converts special fields (here meaning sheet columns) depending on their name ending

  if (typeof value === "string") {
          if (field.endsWith("_list")) {
            this.row[field] = parseAppDataListString(value);
          }
          if (field.endsWith("_collection")) {
            this.row[field] = parseAppDataCollectionString(this.row[field]);
          }
          if (field.endsWith("action_list")) {
            this.row[field] = this.row[field]
              .map((actionString) => parseAppDataActionString(actionString))
              .filter((action) => action !== null);
          }
        }
        // convert google/excel number dates to dates (https://stackoverflow.com/questions/16229494/converting-excel-date-serial-number-to-date-using-javascript)
        if (field.endsWith("_date")) {
          if (typeof this.row[field] === "number") {
            this.row[field] = parseAppDateValue(this.row[field]);
          }
        }

It is then only at a later stage that the template parser handles based on row names, and supports the text included anywhere in the name

 if (row.name?.includes("_list")) {
        row.value = this.parseTemplateList(row.value);
      }
      if (row.name?.includes("_collection") && row.value && typeof row.value === "string") {
        row.value = parseAppDataCollectionString(row.value);
      }

So that means right now for any data_list the column must end in _list to be processed as expected, and for any template that defines inline then the row must contain _list somewhere in the name.

Do you think this distinction makes sense/is sensible, or better to also allow data_lists to convert if the column has _list anywhere in the name? I don't think it was considered to carefully when first implemented so open to suggestion

chrismclarke commented 6 months ago

This also applies to data_items. I'm understanding that a begin_data_items row "expects" a name column value (populating it with a unique default when missing). This is a bit confusing, since a begin_items row breaks when there is anything in the name column.

I'm not 100% sure if it breaks or not, because I think there's still fixes applied in #2220 that would need merge to work properly. So it might be worth retesting after - I actually think assigning a name to data items should be a requirement because otherwise it would be possible to have 2 sets of data items with children that have duplicate name references, but can see post #2220

I did pick up one extra issue while testing on the debug repo though related to navigation-bar component which I've pushed a fix for

chrismclarke commented 6 months ago

2. When rows sit nested in a display group the proposed default row names are numbered within the scope of the display group, which doesn't make sense to me. When referring to these rows in a parent template the display group is invisible (i.e. as opposed to begin_nested_template there is no begin_nested_display_group) so we need the default row names to be unique within the scope of the template, I'd say.

Yeah it's a bit of a tricky one. I (think) I can understand what you mean - when referring to the rows from within the same template they can be identified independent of the display group container (this is the case and what you mean, right??) - and so makes sense enforcing uniqueness across the template . The challenge here is that when it comes to the rendering process they are still rendered as child rows of the display group container (which handles the row layout logic).

So I think the anomaly here is being able to refer to the rows directly, say nested_text_1, instead of as my_display_group.nested_text_1. I think it probably wouldn't be very easy to adapt the case for display groups without diving deeper into the overall templating system, so might just be a limitation/bug/feature to be aware of for now...

I'm not sure whether it would be worth the time or not to create an example sheet with the sole purpose of reviewing generated row names or not... It's tricky added all these cases to the test specs as depends somewhat on different levels of processing (e.g. items and data_items apply certain manipulations at runtime so they can iterate over dynamic data while display_groups are state and managed more at compile time)

esmeetewinkel commented 6 months ago

So that means right now for any data_list the column must end in _list to be processed as expected, and for any template that defines inline then the row must contain _list somewhere in the name.

Do you think this distinction makes sense/is sensible, or better to also allow data_lists to convert if the column has _list anywhere in the name? I don't think it was considered to carefully when first implemented so open to suggestion

Okay, thanks for checking and clarifying. I can see reasons for either convention (it could be handy for list_1, list_2, etc still to be recognised as lists, but it would be annoying if listenor specialist would be recognised as lists). I'd definitely be in favour of handling the same way for sheet columns and template rows.

esmeetewinkel commented 6 months ago

When rows sit nested in a display group the proposed default row names are numbered within the scope of the display group, which doesn't make sense to me. When referring to these rows in a parent template the display group is invisible (i.e. as opposed to begin_nested_template there is no begin_nested_display_group) so we need the default row names to be unique within the scope of the template, I'd say.

Yeah it's a bit of a tricky one. I (think) I can understand what you mean - when referring to the rows from within the same template they can be identified independent of the display group container (this is the case and what you mean, right??) - and so makes sense enforcing uniqueness across the template . The challenge here is that when it comes to the rendering process they are still rendered as child rows of the display group container (which handles the row layout logic).

So I think the anomaly here is being able to refer to the rows directly, say nested_text_1, instead of as my_display_group.nested_text_1. I think it probably wouldn't be very easy to adapt the case for display groups without diving deeper into the overall templating system, so might just be a limitation/bug/feature to be aware of for now...

I'm not sure whether it would be worth the time or not to create an example sheet with the sole purpose of reviewing generated row names or not... It's tricky added all these cases to the test specs as depends somewhat on different levels of processing (e.g. items and data_items apply certain manipulations at runtime so they can iterate over dynamic data while display_groups are state and managed more at compile time)

Yes, that's exactly the case I mean. I'm happy to assume as a limitation for now. I have anyway been authoring display group row names as if they had to be unique within the scope of the template, so the new parsing won't affect such templates.

chrismclarke commented 6 months ago

So that means right now for any data_list the column must end in _list to be processed as expected, and for any template that defines inline then the row must contain _list somewhere in the name. Do you think this distinction makes sense/is sensible, or better to also allow data_lists to convert if the column has _list anywhere in the name? I don't think it was considered to carefully when first implemented so open to suggestion

Okay, thanks for checking and clarifying. I can see reasons for either convention (it could be handy for list_1, list_2, etc still to be recognised as lists, but it would be annoying if listenor specialist would be recognised as lists). I'd definitely be in favour of handling the same way for sheet columns and template rows.

It would still require an underscore, so listen wouldn't get marked as a list but you're right you would get false-positives with something like example_listen.

So how about we try to combine and change both checks as:

ends with _list or includes _list_

So something like example_listen_list1 would not parse but example_listen_list or example_list_1 would

chrismclarke commented 6 months ago

@esmeetewinkel I've updated both default and template-specific parsers to behave in the same way so that any sheet columns or template row names ending _list or including _list_ will be converted to array, and likewise ending _collection or including _collection_ will be converted to json object. So hopefully this should be a bit more consistent moving forwards.

Syncing the debug repo I took a quick look through the diffs and manually inspected the templates that look most affected by the changes (e.g. debug_answer_list_partial, feat_footer, feature_set_theme, comp_radio_group and a couple others), and all looking as-expected to me. I'm not sure if the test-screenshots action has been reconfigured since migrating to content repos, but if so it might be worth doing a cross-check of debug content repo before and after (and if not making a follow-up issue to fix)

I could see feat_footer wasn't displaying any visible content, however I think that is just because of the navigation_bar styling (can see both the icons and nav bar background set same as background) so I'm assuming this is just an old template and not a bug (??).

Otherwise hopefully should finally be good to go? (I also checked the issue mentioned in original description)

jfmcquade commented 6 months ago

I could see feat_footer wasn't displaying any visible content, however I think that is just because of the navigation_bar styling (can see both the icons and nav bar background set same as background) so I'm assuming this is just an old template and not a bug (??).

I got excited because I thought this might be related to #2243, which I'm struggling to resolve, and offer a new angle. But can confirm that this is not related and also not a new bug.

feat_footer is specifically designed to be a template in the main app footer (supplied to APP_FOOTER_DEFAULTS.templateName). As the main app footer is currently hardcoded to have a primary coloured background, using white icons works, but the template is not designed to be viewed independently. We could expose some more customisation for the footer, but I don't think this is a bug as such.

github-actions[bot] commented 5 months ago

Visual Test Summary new : 0 different : 6 same : 331

Largest Differences 1 | 19.5% | example_lang_template_2 2 | 0.4 % | user_actions 3 | 0.2 % | example_calc 4 | 0.2 % | example_calc_date 5 | 0.1 % | debug_conditional_texts 6 | 0 % | feature_parent_point_box

Download Link https://nightly.link/IDEMSInternational/parenting-app-ui/actions/runs/8738684195

Run Details https://github.com/IDEMSInternational/parenting-app-ui/actions/runs/8738684195

github-actions[bot] commented 5 months ago

Visit the preview URL for this PR (updated for commit 8752f73):

https://plh-teens-app1--pr2211-refactor-answer-list-imqmorht.web.app

_{(expires Thu, 02 May 2024 13:24:17 GMT)}

_{🔥 via Firebase Hosting GitHub Action 🌎}

_{Sign: e4c0bab6b08dd290fbf002fd6e07987fa4b5fce1}

chrismclarke commented 5 months ago

One issue I encountered: it seems that bumping the cacheVersion of the flowParser is not enough to prompt all sheets to be re-parsed on sync, only the handful of sheets that had new author edits were re-parsed for me. I tried bumping the number myself, which didn't seem to have an effect. What did work was bumping the cacheVersion number as specified in the base processor. After that, syncing makes changes to over a hundred parsed sheets, as expected. I believe this is because the cacheVersion property is used in the BaseProcessor constructor, so the derived class's property does not override it, see this discussion.

Thanks for catching, I've pushed 0089892 which should now force the inheritance correctly by passing to the constructor. Let me know if now working as expected.

Once happy this should be good to merge, I accidentally re-requested review from @esmeetewinkel however I don't think there should be any additional changes to look at

IDEMSInternational / open-app-builder