Closed GitMensch closed 6 years ago
Question to the C Parser (before I include this with a merge request): Are you OK with me changing
- final String[] exts = { "c" };
- interm = File.createTempFile("Structorizer", ".c");
+ final String[] exts = { "c", "h" };
+ interm = File.createTempFile("Structorizer", "." + getFileExtensions()[0]);
@GitMensch
Question to the C Parser (before I include this with a merge request): Are you OK with me changing
Seems sensible. Go ahead then.
Minimal C Parser changes added as part of https://github.com/codemanyak/Structorizer.Desktop/pull/77
CParser should cope now with the majority of usual typedefs.
Just did some parsing for checking the actual status. Some thing that should be easy which I not got to work with the first approach are nameless parameter in function prototypes. The current one:
<Func Proto> ::= <Func ID> '(' <Types> ')' ';'
| <Func ID> '(' <Params> ')' ';'
| <Func ID> '(' void ')' ';'
| <Func ID> '(' ')' ';'
<Params> ::= <Param> ',' <Params>
| <Param>
<Param> ::= <ConstType> ID <Array>
always needs an ID
which I think is not correct, if we want to support this we should support the version without ID
, too.
Creating and using a <ProtoParams>
without a ID leads to a Reduce-Reduce conflict.
Note: there's already a Shift-Reduce conflict in this area...
I think the mentioned issue is the reason for the following parser error (I may be wrong)
static int worldcities2_ (const int);
Found token '('
Expected: '[' | ',' | ';' | '='
Another question to the C parser: Do we want to add the token NULL
(currently not in and therefore tokenized as Id
) to <Value>
?
As we don't parse any system headers - do we want to add standard typedefs/structs "somewhere" (either preparser or grammar)?
Things like size_t
, time_t
, FILE
, ... otherwise we will have many files that cannot be parsed. Or make a brute-force grammar and define Id
as <Type>
... just seen: we already have <User Type>
in there - therefore I guess these things have to be added in the preparser, correct?
If we externalize the list of "standard typedefs/structs" to an option dialog we can ship a standard list and people can add "their" entries themselves.
Note: I'm doing some work in the C preparser (concerning C preprocessor parts).
@GitMensch
Some thing that should be easy which I not got to work with the first approach are nameless parameter in function prototypes. The current one always needs an ID which I think is not correct, if we want to support this we should support the version without ID, too.
Just formally: What do you think the first rule <Func Proto> ::= <Func ID> '(' <Types> ')' ';'
is good for? (Well, maybe it doesn't work due to some shift-reduce conflict....)
Besides this, I personally regard prototypes without parameter ids (like void xyz(int, int, int, int, int)
) as pretty useless, particularly in public header files. They convey poor intelligible information. (As a code-file internal forward declaration they may be acceptable.)
But nevertheless you are right saying that the C standard allows them. (And in some weird communities they are even the quasi standard.) But if I have to risk reduce conflicts then I tend to live with the gap.
Users might simply add ids or comment out the prototype for the import as it isn't used for building the diagram(s), anyway.
I think the mentioned issue is the reason for the following parser error (I may be wrong)
static int worldcities2_ (const int);
I'd assume you are wrong because the opening parenthesis is complained. This means the parameter list will hardly have been inspected (LALR(1) means single token lookahead, as you perfectly know).
Do we want to add the token NULL (currently not in and therefore tokenized as Id) to
?
I didn't regard NULL
as a problem because it is not a type identifier and passes the parsing process, therefore, or doesn't it? It would just be marked as no initialised by Structorizer. But an execution of a diagram requiring pointers would fail, anyway.
Things like size_t, time_t, FILE, ... otherwise we will have many files that cannot be parsed.
With the standard library types it's different, you are right. I think it will make sense to define them in the grammar (as I already did with ´wchar_t´, which is a quasi-reserved word, though).
I just tried to import a header file with function prototypes like void test(int, int, double, float*);
: no problem for the parser at all. So the grammar is fine with this respect.
EDIT: I added the incriminated line static int worldcities2_ (const int);
to my header file, too, and again it passed both parser and builder, not producing any elements, of course.
The offending source is the following: worldcities2.c.txt
Note: My C preparser changes are only rough tested and therefore I did no new pull request, yet. You can have a look at them if you like to (or even add the merge request/test yourself). https://github.com/GitMensch/Structorizer.Desktop/commit/da310f2f9cc31904e0c50206f6efc9cc8d3ed3f7 - please comment the commit as I've just started Java coding and code review is always nice
I'm away for some days and likely won't do something in the next days (maybe no coding for the next two weeks).
@GitMensch
At least one thing is clear now. In the following code snippet from worldcities2.c, it's not the second but te first line (the one starting with __declspec(dllexport)
), which fools the parser - a #define
macro again, it seems, the definition of which apparently proceeds from an #include
and hence is not visible. If you comment it out then you get past it. The next parse errors occurring are related to the type names cob_u8_t
etc., which will have to be mapped to user-defined type names by the preprocessor. Then the function prototypes all pass the grammar.
__declspec(dllexport) int worldcities2 (void);
static int worldcities2_ (const int);
static int checkfilestatus_0__ (cob_u8_t *, cob_u8_t *);
static int checkfilestatus_0_ (const int, cob_u8_t *, cob_u8_t *);
static int techtonics_0__ (cob_u8_t *, cob_u8_t *);
static int techtonics_0_ (const int, cob_u8_t *, cob_u8_t *);
I'll try to define the user_type_###
symbols in a more flexible way in the grammar (e.g. USER_TYPE_NAME = usertype[0-9][0-9][0-9]).
Your idea to allow a user configuration for type names is certainly a good workaround.
The __declspec(stuff)
is an attribute (in this case telling win32 linkers to export the symbol, dllimport
would be the other way around), telling the linker to search for the function in a dll. These are highly system specific...
A wide list of different attributes can be found in the Clang docs. The following layout is possible:
PRE-ATTRIBUTE1 PRE-ATTRIBUTE2 TYPE FUNC-NAME (PARAM1, PARAM2) POST-ATTRIBUTE1 POST-ATTRIBUTE2;
Two function definitions that are used in GnuCOBOL, first with the win32 definition, then the GCC one:
__declspec(noreturn) __declspec(dllexport) void cob_stop_run (const int);
void cob_stop_run (const int) __attribute__((noreturn));
extern void cob_runtime_error (const char *, ...);
extern __attribute__ ((visibility("hidden"))) void cob_runtime_error (const char *, ...) __attribute__((format(printf, 1, 2)));
For an NSD all these attributes can be considered noise - if possible: let the parser pick them and throw them away. Or maybe even better: let the preparser kick them out.
Did you had any chance to review https://github.com/GitMensch/Structorizer.Desktop/commit/da310f2f9cc31904e0c50206f6efc9cc8d3ed3f7 yet?
Am 25.05.2017 um 13:57 schrieb Simon Sobisch:
Did you had any chance to review https://github.com/GitMensch/Structorizer.Desktop/commit/da310f2f9cc31904e0c50206f6efc9cc8d3ed3f7 yet?
I had a quick look at it but for a review (even a quick one) I was lacking time.
@GitMensch Of course I respect your vacations. So if you don't answer immediately, no problem. I'm trying to understand your proposed code changes, though, and I've got some questions. So I better ask now while you may still know what you intended.
Your commit comment says:
fixed functions without parameters
func (void) {}
What exactly was the problem on parsing functions without parameters? I had already fixed it in the grammar itself, and all my tests had worked fine. Moreover, I didn't see anything in the code that might have to do with it, unless your modification of the voidCastPattern
was to address it? But I don't understand the way you changed it, either: You replaced the final non-greedy group (.*?)
by the greedy (.+)
. Why? Greedy groups used to be a problem for the replacement if the pattern occurs several times in a text. And why do you require at least one character after the cast? To make sure it's a cast? This is already ensured by the requirement of a non-identifier character left of it, I thought.
EDIT: O, now I see it clearly: You didn't refer to functions with empty parameter list at all but exactly to the void cast fix, which you think I had spoilt by putting the wrong group number in the replacement pattern? But no, there aren't three groups in it. My replacement pattern was correct. The apparent central group isn't a group but matches the parentheses of the cast.
Your modification of the definePattern
from "^#define\\s+([\\w].*)\\s+(.+)"
to "^#define\\s+([\\w].*)\\s*(.+)?"
also looks problematic (not to say dangerous) in my eyes: You turned the final group into an optional one, which makes "$2" in the replacement pattern a dangling thing, such that the replacement of constructive defines doesn't work anymore.
But never mind, I think I understood the overall sketch of your appraoch and should be able to fix it. Thank you for the effort.
You didn't refer to functions with empty parameter list at all but exactly to the void cast fix, which you think I had spoilt by putting the wrong group number in the replacement pattern? But no, there aren't three groups in it. My replacement pattern was correct. The apparent central group isn't a group but matches the parentheses of the cast.
Yes, this was what I referred to. the void casts fix has eaten the (void)
from func (void) {}
during debugging and therefore the grammar passed an error for the not expected {
: func >> {
. If you think it works with the next version I don't care if anything of my changes stay in :-)
The modification of definePattern looks correct for me as you can define something to nothing. It is just defined and will be replaced by nothing during the preparsing, this is actually often used when you have multi-system code. The original definition for cob_stop_run
mentioned above is
DECLNORET COB_EXPIMP void cob_stop_run (const int) COB_A_NORETURN;
and depending on the system type the defines get replaced (some always by "nothing").
@GitMensch
Yes, this was what I referred to. the void casts fix has eaten the (void) from func (void) {} during debugging
A, there is the rub! I hadn't tested with a space between function name and parenthesis. And my assumption that a greedy whitespace pattern \\s*
between the \\W+?
and the parenthesis would not fail to avert the matching with a preceding identifier was obviously wrong. Of course the matcher tries all possible matches and there is another match where \\W+?
is satisfied by a whitespace sequence as well. Your change doesn't actually help, either, by the way, if a space or the brace follows the parameter list in the same line. After some experiments I now found the correct pattern (thanks for poking your finger into the wound):
"(^\\s*|.*?[^\\w\\s]+\\s*)\\(\\s*void\\s*\\)(.*?)"
The modification of definePattern looks correct for me as you can define something to nothing.
No doubt about the define. I didn't question the attempt to detect and evaluate it. But the construction of the pattern isn't correct. Obviously an optional group doesn't work on replacement because the pattern also matches if the group is omitted, such that non-empty defines end up with an empty replacement. I'll find a better pattern. Thanks again.
Should I create a pull request with the current version I've committed to my branch and you merge it with maintainer edits as you did before? Or just forget about this branch and delete it?
I have already pulled it and begun to check with C sources.
@GitMensch Your differenciated #define evaluation was a really helpful appeoach I could accomplish now, function macros inclusively. With some minor manual preparation worldcities2.c is now importable. I provisionally added some COBOL compiler types like cob_8ut and other cob* something to the typedef list of CParser while the language-specific Parser configuration feature isn't ready.
Just four problems remain with worldcities2.c (which I circumvented by modifying the file): The Structorizer C grammar does not support the following constructs found in worldcities2.c:
frame_ptr->return_address_ptr = &&l_19;
b_1 = ((int (*)(void *, void *))call_techtonics.funcint) (b_14, b_15);
goto *frame_ptr->return_address_ptr;
h_CITY_FILE->select_name = /*(const char *)*/"city-file";
All this disabled or replaced, the file can be converted into a small bunch of huge diagrams... Here's the importable modified C file (disguised as txt file): worldcities2a.c.txt
Last points first: 1 and 3 are GNU extensions for variable goto
targets (something Structurizer can't support correctly anyway but it would be nice to don't let the grammar struggle there.
2 and 4: the pointer casts are only in there for removing compiler warnings, we just need to either suppress them from the parsing (or better: enhance the grammar and ignore them during the token processing).
4: yes, the grammar may be enhanced.
The usertype id change is marvelous - if we add a plugin option and/or parse #include "someinc"
files to put them in the list this topic can be seen as finally been fixed (or has enough work-around),
I wonder if we should do something similar for COBOL constants, too (differentiating between numeric and non-numeric literals) - this would clean up the grammar but the parsing must push all 78 and 01 CONSTANTs into this list. Opinions?
Just tried an import of a generated C source to the current version:
error.syntax in file "C:\Users\simon\Documents\CBL_OC_DUMP.c"
Preceding source context:
23: __declspec(dllexport) int CBL_OC_DUMP » (unsigned int *, unsigned int *);
Found token '('
Expected: '[' | ',' | ';' | '='
Note: While the #354 is much more important for me I thought it may be a good idea to note this. I guess it is still the function attribute causing issues.
Moved from #354.
Comments from 2017-03-06
@codemanyak:
@GitMensch:
@codemanyak:
@GitMensch:
More comments from https://github.com/fesch/Structorizer.Desktop/issues/354#issuecomment-284559244 on ...
Comments from 2017-03-10
@codemanyak:
@GitMensch:
@GitMensch:
More comments from https://github.com/fesch/Structorizer.Desktop/issues/354#issuecomment-285670846 on ...