PLC-lang / rusty

Structured Text Parser and LLVM Frontend
GNU Lesser General Public License v3.0
183 stars 48 forks source link

Find and report recursive data structures (#686) #721

Closed volsa closed 1 year ago

volsa commented 1 year ago

Fixes #686, done in pair-programming sessions with @mhasel

TODOs:

volsa commented 1 year ago

As far as I'm concerned this PR is ready to be reviewed. The only issue currently still present is that recursive data structures will crash the compiler with an stack overflow error despite detecting them simply because we're not aborting at phase 3? I've talked with @99NIMI about this and afaik this will be handled once we refactor error handling.

riederm commented 1 year ago

I like your implementation, it is very sound and the readability and the documentation is awesome. :heart: I quickly scanned over the files and I think I could spot one or two minor issues - can you verify this? detailed review is coming a bit later

TYPE B STRUCT a : ARRAY[0..1] OF A; END_STRUCT END_TYPE



- VAR_IN_OUT and VAR_OUTPUT are pointers
var_in_out variables are internally treated as pointers, so you can have recursions via VAR_IN_OUTs. Is the same true for VAR_OUTPUT? I'm not entirely sure yet - we need to discuss this.
riederm commented 1 year ago

we should discuss the way you use the Cell here. We used it soooo rarely (or do we actually use it elsewhere?), I'm not sure if this is a typical case where a Cell makes things easier? I somehow have the feeling that you're hiding the mutation borrow checker - but as I said - I think I'm a bit old-fashioned when it comes to the smart pointers and cells and stuff. Can't wait to discuss this next week with everybody.

codecov-commenter commented 1 year ago

Codecov Report

Base: 94.44% // Head: 94.40% // Decreases project coverage by -0.05% :warning:

Coverage data is based on head (a908fd4) compared to base (164de1c). Patch coverage: 85.32% of modified lines in pull request are covered.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #721 +/- ## ========================================== - Coverage 94.44% 94.40% -0.05% ========================================== Files 46 47 +1 Lines 16303 16404 +101 ========================================== + Hits 15398 15486 +88 - Misses 905 918 +13 ``` | [Impacted Files](https://codecov.io/gh/PLC-lang/rusty/pull/721?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang) | Coverage Δ | | |---|---|---| | [src/diagnostics.rs](https://codecov.io/gh/PLC-lang/rusty/pull/721?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang#diff-c3JjL2RpYWdub3N0aWNzLnJz) | `82.73% <52.17%> (-0.78%)` | :arrow_down: | | [src/typesystem.rs](https://codecov.io/gh/PLC-lang/rusty/pull/721?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang#diff-c3JjL3R5cGVzeXN0ZW0ucnM=) | `97.29% <63.63%> (-0.52%)` | :arrow_down: | | [src/validation/recursive\_validator.rs](https://codecov.io/gh/PLC-lang/rusty/pull/721?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang#diff-c3JjL3ZhbGlkYXRpb24vcmVjdXJzaXZlX3ZhbGlkYXRvci5ycw==) | `98.61% <98.61%> (ø)` | | | [src/validation.rs](https://codecov.io/gh/PLC-lang/rusty/pull/721?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang#diff-c3JjL3ZhbGlkYXRpb24ucnM=) | `98.69% <100.00%> (+0.01%)` | :arrow_up: | | [src/parser/expressions\_parser.rs](https://codecov.io/gh/PLC-lang/rusty/pull/721?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang#diff-c3JjL3BhcnNlci9leHByZXNzaW9uc19wYXJzZXIucnM=) | `94.52% <0.00%> (+0.15%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=PLC-lang)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

volsa commented 1 year ago

ARRAY introduces a new Type (internally this creates a type TYPE ARRAY_TO_B : ARRAY [0..1] OF A; END_TYPE and re-declare B.a as this new type) which will not be detected by the validator. array edges should be treated as direct edges to the array-target. For Pointers we want the current behavior.

Are we sure arrays are a problem? My current approach for flagging something as a potential recursive datastructure was to compile a code snippet and if it fails at compile-time in a stack-overflow error I'd be certain we want this to be considered and reported. The given snippet doesn't seem to trigger anything.

var_in_out variables are internally treated as pointers, so you can have recursions via VAR_IN_OUTs. Is the same true for VAR_OUTPUT? I'm not entirely sure yet - we need to discuss this.

The following code triggers a stack-overflow so I guess recursions are possible?

FUNCTION_BLOCK A
    VAR_OUTPUT
      b : B;
    END_VAR
END_FUNCTION_BLOCK

FUNCTION_BLOCK B
    VAR_OUTPUT
        a : A;
    END_VAR
END_FUNCTION_BLOCK

we should discuss the way you use the Cell here. We used it soooo rarely (or do we actually use it elsewhere?), I'm not sure if this is a typical case where a Cell makes things easier?

A quick search tells me we only used (Ref)Cells once in the test_utils module, which is kind of surprising to me :P But from my POV at least this definitely makes things much easier as @mhasel and I previously fought the borrow checker for quite a while and the only options would have been to dig further and find a way around it possibly degrading the code readability or keep things simple with a smart pointer. At least I wasn't able to find a clean solution for the former case hence settled with Cell which retrospectively makes the code much more readable imho.

riederm commented 1 year ago

Are we sure arrays are a problem? My current approach for flagging something as a potential recursive datastructure was to compile a code snippet and if it fails at compile-time in a stack-overflow error I'd be certain we want this to be considered and reported. The given snippet doesn't seem to trigger anything.

I'm sorry, my snippet was probably a bad example. I guess the type-definitions of the structs are not a problem, it only becomes a problem if you really declare a variable of this type. So this should give you a stack-overflow:

TYPE A STRUCT
    b : B;
END_STRUCT
END_TYPE

TYPE B STRUCT
    a : ARRAY[0..1] OF A;
END_STRUCT
END_TYPE

VAR_GLOBAL
   g : A;
END_VAR

In theory this should be the same with function blocks. As long as you dont declare one, it should not be a problem. not 100% sure why FBs aparently fail with a stackoverflow right away. does the StackOverflow happen during codegen? or when executing the code? :thinking:

volsa commented 1 year ago

I gave it a try to avoid the Cells. I chose a black-list approach, check out branch https://github.com/PLC-lang/rusty/commit/3df003c0295b7e25f87a5b78e36ff9908b01fba0

Neat, thanks for sharing! I think both Nikola and Michael initially suggested a similar solution but I guess I was too stubborn trying to find an excuse to use Cell here :P As a side not I did some research out of curiosity as to what the runtime overhead of Cell would be and as far as I can tell there's almost none[1]. Furthermore while RefCell and Cell should be avoided when possible, Cell is the safer option since it can not panic[2]. Personally speaking I think this is an okay-ish case to use Cell, but obviously I'm open for both approaches; whichever one you prefer :)

I believe this introduces a problem because the nodes in find_cycles either contain all structs or all FBs, so the check in line 88 may not find the status if for example a FB has a field of type Struct and vice versa.

Yup, should be fixed with the latest commit. Thanks!

[1] https://godbolt.org/z/odx96rcxs [2] https://doc.rust-lang.org/std/cell/index.html

riederm commented 1 year ago

I gave it a try to avoid the Cells. I chose a black-list approach, check out branch 3df003c

Neat, thanks for sharing! I think both Nikola and Michael initially suggested a similar solution but I guess I was too stubborn trying to find an excuse to use Cell here :P As a side not I did some research out of curiosity as to what the runtime overhead of Cell would be and as far as I can tell there's almost none[1]. Furthermore while RefCell and Cell should be avoided when possible, Cell is the safer option since it can not panic[2]. Personally speaking I think this is an okay-ish case to use Cell, but obviously I'm open for both approaches; whichever one you prefer :)

I believe this introduces a problem because the nodes in find_cycles either contain all structs or all FBs, so the check in line 88 may not find the status if for example a FB has a field of type Struct and vice versa.

Yup, should be fixed with the latest commit. Thanks!

[1] https://godbolt.org/z/odx96rcxs [2] https://doc.rust-lang.org/std/cell/index.html

riederm commented 1 year ago

lol sorry ... this was an accident 😈

volsa commented 1 year ago

Alright, I think this PR is ready to be reviewed again @riederm :) I've attached a video demonstrating the current error output as well as the still present issue with the codegen stackoverflow which will be handled soon-ish afaik

https://user-images.githubusercontent.com/29666622/214855902-5094dee8-0644-4d1a-a8b5-2fa55388d1a6.mov