dkelosky / vscode-ibm-hlasm

IBM HLASM Highlighting
MIT License
11 stars 4 forks source link

Minor formatting issues #11

Closed mhammack closed 5 years ago

mhammack commented 5 years ago

I merged the current branch into my fork and installed it on my PC. Overall, the latest update is great! However, I found a few issues (none of which are show stoppers IMO):

Sequence numbers on full line comments show up as comments Capture1 L' isn't always a length attribute (and there are other attributes like K', N', T', etc.) Capture2 This also shows that sometimes the 'type' (CL8) doesn't always get highlighted correctly Another example of this: Capture3 (My guess, is that P (packed fields) isn't in the REGEX).

Macro Format (MF=) value has different syntax (E shows up as support.type, L shows up as keyword.control).

Multiple quoted values on a single line include the intervening data type in the string: Capture4

The datatype when there are multiple strings on a line isn't always right: Capture5

If the comment has a quote in it, it is being picked up as part of a string: Capture6

I have over a million lines of code so can usually find a nit to pick. I will look at some of these and see if I can modify the syntax without breaking something currently working.

mhammack commented 5 years ago

The line numbers on comments seems to be two entries for the lineComment block (54 and 172). Removed that and it highlights the sequence field correctly.

This might not be an issue in 0.4.0...

CoderCoco commented 5 years ago

I did explicitly test the comments so there might have been some sort of weird things with the merge. Let me know if I am incorrect. It should look like this in the new publish:

image

Please keep giving me these examples, they greatly help me understand the language better to see what I might need to regex 🐱

This also shows that sometimes the 'type' (CL8) doesn't always get highlighted correctly

I assume you meant PL8, yeah looking at this line, I am missing a P. https://github.com/dkelosky/vscode-ibm-hlasm/blob/59ec1b35f608b12cedd018f4e989f88c852fa9e4/syntaxes/hlasm.tmLanguage.json#L330

I think the fix for any missing types is as simple as adding it to that list there.

The datatype when there are multiple strings on a line isn't always right:

For this particular example, I believe the fix is simply changing this line https://github.com/dkelosky/vscode-ibm-hlasm/blob/59ec1b35f608b12cedd018f4e989f88c852fa9e4/syntaxes/hlasm.tmLanguage.json#L262 to "match": "(\\S*?)('.*')(.*)", (greedy regex → lazy regex). It's the same thing that I did to fix multiple L' on a line.

Multiple quoted values on a single line include the intervening data type in the string:

Might be a symptom of the above. May need to also add a ? for the .* in the middle of the string as well.

L' isn't always a length attribute (and there are other attributes like K', N', T', etc.)

In your example it's something like 'String ending with L', I did try to accommodate for that, seems I failed on there. I suppose it would also fail when something like LABEL'MYSTRING' exists but I don't know if that is even valid assembler. If it isn't then I won't bother handling it (idea is to error where possible but if not then bad syntax highlighting should inform that there is some kind of error). Would you be able to provide any examples where there is an L' where the L is not part of an opened string?

Also adding K', T', N' would be simple, I would just do it the same way I do the type regex. The hard part for me is I now know of 4 of theses, a list of all of them would be ideal. Also this would not increase the complexity of the fix for invalid detection above. All it would do is change the overall matcher for the group.

If the comment has a quote in it, it is being picked up as part of a string:

This is probably going to be the hardest one to fix. I don't really know where to begin that won't heavily break other things.

Macro Format (MF=) value has different syntax (E shows up as support.type, L shows up as keyword.control).

Yeah the E gets picked up as a type and L as a control. This is due to no special formatting based on the type of instruction. Maybe need to think about this a bit but at least it is isn't really broken.

CoderCoco commented 5 years ago

Actually the comment with a quote may be simpler than I thought. If we get the string detection better, the comment will just follow. So chuck this up with the string matcher being to greedy

mhammack commented 5 years ago

I did explicitly test the comments so there might have been some sort of weird things with the merge. Let me know if I am incorrect. It should look like this in the new publish:

When I pulled from the base code, something happened with the merge. Started over from scratch and the sequence numbers are OK.

I added a 'P' to the asmSpecialSTatements match and packed values show up like other datatypes now.

mhammack commented 5 years ago

FWIW, the assembler is case-insensitive so l' and L' are treated the same.

I was trying to come up with some kind of recursion algorithm based on: (\S?),(.). The first group would be a parameter that would get parsed for strings, types, etc. and then it would call itself to process the second group. But I couldn't figure out how to handle [,\s] well. Tried 'defaulting' to comment but that marked the first parameter as a comment (as well as anything inside parentheses).

I was also thinking about how to use a begin/end pair to look for strings (in case the string spanned lines) but not sure how to handle attributes in that case (L'). Here is another case for string handling:

    DC    C''' INFO=X'''

This includes a literal apostrophe in the string itself: "' INFO=X'"

CoderCoco commented 5 years ago

Oh so is that saying that in an assembler string a ' escapes a '. So '''Stuff''' would be like '\'Stuff\'' in Javascript

mhammack commented 5 years ago

Sorry for taking so long. Your comment is correct. To include a single quote in a string, it is escaped by doubling it (i.e. to create a string that just contains a single quote would be '''').

CoderCoco commented 5 years ago

The ' escape is causing a bit of problems for me. For now I have made a few improvements in issue/#11 if you want to take a look.

mhammack commented 5 years ago

It is looking real good (at least the few cases I have checked.

Capture 11-7

is perfectly fine with me and

Capture 11-8

seems to be working too as does

Capture 11-9

and Capture 11-10

I am closing since all identified issues seem to be resolved. I will open a new ticket for a couple of other things I saw.