Open radonnachie opened 4 years ago
pyVHDLParser/Token/Parser.py : 363-365
if ((buffer[0] in __ALPHA_CHARACTERS__) and (buffer[1] in __ALPHA_CHARACTERS__)):
tokenKind = cls.TokenKind.AlphaChars
elif ((buffer[0] in __WHITESPACE_CHARACTERS__) and (buffer[1] in __WHITESPACE_CHARACTERS__)):
...
So buffer[0] is the single quote... doesn't seem like any attribute access will survive this. Am I missing something?
I changed line 362 to
buffer = buffer[1:3]
from
buffer = buffer[:2]
Hi, sorry for the wrong issue link in the error message. It's a pyVHDLParser
error, not an pyIPCMI
error :).
Parsing attributes is very complex in VHDL and by nature ambiguous. Some parser (to be exact lexers/tokenizers) protect them selves by allowing only attribute names longer then 1 character, otherwise it might get mixed up with character literals 'c'
.
Examples:
a'b'c
=> character literal c
between a
and c
aa'bb'cc
=> chain of attribute names cc
applied to bb
to aa
Okay, nice! Only character literals have single quotes, (I believe for longer literals double quotes are required) right? If not one could exhaust the attribute list. Otherwise, perhaps the following is helpful in achieving that definition?
re.search("('\w(\w+))+", buffer)
>>> l = "a'b'c"
>>> a = "a'bb'cc"
>>> al = "a'bb'c"
>>> print(re.search("('\w(\w+))+", l))
None
>>> print(re.search("('\w(\w+))+", a))
<re.Match object; span=(1, 7), match="'bb'cc">
>>> print(re.search("('\w(\w+))+", al))
<re.Match object; span=(1, 4), match="'bb">
It would catch a chain of attributes as a single compound-attribute, which may be quite neat. The chain will extend until there is a character literal. It'd have to be used recursively if one wants the full tree of literal-attribute accesses.
It doesn't care if there are literals before an attribute:
>>> la = "a'c'bb"
>>> print(re.search("('\w(\w+))+", la))
<re.Match object; span=(3, 6), match="'bb">
Is ☝️ an issue?
The list of attributes is not limited. Users can define own attributes, thus comparing against such a list doesn't work.
Moreover, a tokenizer doesn't know these details. I just splits the input file into a stream of tokens and tries it's best to figure out what kind of token it is. The Tokenizer in pyVHDLParser already creates more token types then any other parser I know.
A literal is a class of base element in a language:
Thus, 125
, 45.975
, 'c'
, "hello world"
, x"0110"
are literals.
I extensively edited my last comment, primarily because I thought to research literals. Thanks for exhaustively listing them. I think, though, that you weren't answering the meat of what I was querying...
I think that, from the get-go, none of the conversation has moved towards a solution:
I am trying to share that the tokeniser failed on my file which used attributes, and I am looking commit the fix. I think 2. may be more immediate and I was looking to see if that was your original intention.
I feel that this repo is on hold, so I don't mind if your head is not in the right space to fully consider any fix whatsoever. Just say.
@RocketRoss yes development here is currently very slow due to other activities.
The repo has now >250 test cases and reaches 48% branch coverage. I'm working on improving test coverage and also documentation. While doing so, some bugs where discovered and fixed. Your issue is not yet investigated.
For a regexp solution: The Tokenizer
works without regexp to ensure a high performance.
My plan is to work more in Christmas holidays on this project.
Quite excited to see what I can make of the pyVHDLparser. In using it on a simple enough file, I encounter the following:
This seems to be triggered by the access of
'length
... I'll be diving into the Parser.py file to see if I can rectify.Thanks!