Parser balks at long tokens (regression)

opeongo commented 1 year ago

The javacc parser (or more specifically the JavaCharStream class) has changed since bsh-2.0b5 and it appears that it can no longer handle arbitrarily sized input tokens. Previously the JavaCharStream implementation had working code that would expand the internal buffer as required, thus supporting arbitrarily sized tokens found in the input stream. It appears that in the switch from JavaCC to ParserGeneratorCC this capability no longer works. There is code in the class to increase the size of the internal buffer, however here is an example of a long string assignment that the current parser cannot handle: parser1.bsh.txt

opeongo commented 1 year ago

I found a fix for this problem. If consists of commenting out two lines in the AbstractCharStream class. See my comments on the ParserGeneratorCC site about this.

These changes can be made to the template in the upstream ParserGeneratorCC project, or they can be made to the generated AbstractCharStream.java file that shows up in the targets/generated-sources/javacc/bsh folder . I just do not know how to make these changes so that they persist. Right now any changes will get clobbered if javacc is rerun.

nickl- commented 1 year ago

I wonder if that wasn't a patch in the original javacc that beanshell was using. This is a difficult one, lets hope they fix it upstream.

revusky commented 1 year ago

Nick, it's really pretty safe to say that none of these things will ever be "fixed upstream". Well, not unless you finally pick up the ball and move over to the actively developed/maintained branch of development, which is CongoCC.

And you should really do that sooner rather than later, because it's the same amount of work to do it now as later, and you get the benefits of the transition sooner. Are your collaborators aware of this branch?

revusky commented 1 year ago

I found a fix for this problem. If consists of commenting out two lines in the AbstractCharStream class. See my comments on the ParserGeneratorCC site about this.

You know, quite frankly, I don't know how long it should take people to realize certain things. And don't take that as too much of a reproach. It has sometimes taken me an awfully long time to figure out things that were obvious in retrospect... But the cold, hard truth here is that there is really no point in talking to people like that. I just looked at the above-referenced issue and it's just typical. This bug was reported in late 2020. Well over 2 years ago! It's a very trivial bug surely and really, no self-respecting project maintainer should have a bug like that still in their system TWO YEARS after it is reported. But... look... I really suggest that you read this blog article. I told this Helger guy in email that he could fix a certain longstanding issue with a one-character edit and he wasn't interested!

But anyway, I wanted to ask you whether you are aware of this branch and if so, whether you have looked at it. In particular, this grammar parses every stable syntactic element in Java, up through the recently released JDK 20.

These changes can be made to the template in the upstream ParserGeneratorCC project, or they can be made to the generated AbstractCharStream.java file that shows up in the targets/generated-sources/javacc/bsh folder . I just do not know how to make these changes so that they persist. Right now any changes will get clobbered if javacc is rerun.

Well, obviously, you guys can do whatever you want, but I can tell you that it's really a waste of time to put any energy into that. You really should put your energy into migrating to the more advanced version of the tool. (Formerly called JavaCC 21, now is rebranded to CongoCC.) I mean, legacy JavaCC is fundamentally in so many different ways -- nested syntactic lookahead not working, the "code too large" issue never having been addressed, fundamental bugs relating to lexical states and lookahead.... And, aside from all that, just the fact that the legacy JavaCC only supports a very old version of the Java language. If you can migrate to the grammar I donated, you automatically can parse lambdas, method references, new-style switch with yield, and all the rest of it. It's a massive leg-up. So, properly understood, you should really expend your energy trying to get that grammar working for you. I donated that a couple of months ago. It requires you guys to do some incremental work to integrate it, but understand that I can't do any more on that because (a) I have so many other things I want to get to in my own projects and (b) I just don't have any clarity that you guys will ever pick up the ball.

But note that if you do start working with that, I (and the rest of our community, which is a bit more than just me!) will be quite helpful. If you run into a wall somewhere because you need a new feature, assuming that the idea is well-motivated, we'll implement it. I mean to say, you won't any longer have your cart hitched to a dead horse!

nickl- commented 1 year ago

Thanks for the input @revusky, based on my comment:

If this was working with the previous release then it may have been patched by beanshell, or it may be a new bug introduced by ParserGeneratorCC. Then so much for their zero regression it must remain the same so nothing can change policy.

Addressing our own regression is the complete scope of this issue and nothing more. The fact that it has got to do with the parser or that it carries third party liability is purely circumstance. Regardless of what probability of outcome exists, reporting bugs and fixes upstream is standard practice, futility be damned.

We are after all still using the javacc parser so it should come as no surprise that you will find issues related to that here. Turning each one into a sales pitch for congo is not helping. We were in fact already sold, and you may regard what is communicated at #685 as gospel, so no reason for any doubts. Lets continue this discussion there and refrain from further derailing this topic.

revusky commented 1 year ago

If this was working with the previous release then it may have been patched by beanshell, or it may be a new bug introduced by ParserGeneratorCC. Then so much for their zero regression it must remain the same so nothing can change policy.

Well, it appears that he (I mean Helger) tried to something and broke something or other. That is normal enough, but if you look at this situation, you see that the bug was reported in late 2020 and now it is early 2023!

Now, as for the rest of it, there are different ways of looking at it. You could think it is very presumptuous on my part to tell you that you are wasting your time doing whatever. Well, sure, the case could be made. But the fact does remain that you are wasting your time. But the problem is that, almost always, people have to figure this out for themselves and there is not any point in trying to tell them. You just get their backs up. (That I do this anyway is probably a strong sign that I'm getting somewhat exasperated.)

BUT... okay, notwithstanding the fact that people pretty much always have to figure out things for themselves, I would say that if you could buck that tendency and just consider what I'm saying, you could save yourself a whole lot of bother. I mean collectively... You see, properly understood, the point I was making doesn't even have that much to do with me or CongoCC. Let's say you decided that your way forward was to move to using ANTLR. Well, then quite obviously, what you should do is just buckle down and get the transition done -- to ANTLR, in that scenario. I mean, once you understand that legacy JavaCC is just broken abandonware and you need to get away from it, then any efforts spent trying to build on top of that is just a waste of time. That should be obvious, because you're doing things that will end up being thrown away.

The bottom line is that you really need to have the courage and purposefulness to move forward.

And I want to make another point about this. When I say that it is a total waste of time to go talk to any of those people, it's not really out of any particular malice. At a couple of points in the past, I tried to talk to that Philip Helger. Originally, I believed that he might be interested in doing something. To me, it was always kind of beyond belief that he would create a "fork" of JavaCC, give it separate branding and all that, if he is not even interested in the problem space! So I always thought he must be interested in doing something. But no. And actually, I'm pretty sure he does not really understand the internals of the thing -- I mean, how JavaCC works. But that also applies to everybody involved in the old legacy JavaCC thing. It's not just about that guy. I mean, I wasted plenty of time trying to talk to those people. Way back, in 2008 even. So, when I say it's a waste of time to try to talk to those people, I know what I'm talking about! Been there, done that. (Unfortunately.)

Addressing our own regression is the complete scope of this issue and nothing more.

Well, if the regression in question is due to picking up the "new, improved" version from either the legacy project, or from Helger, then you are definitely wasting your time. Though again, yeah, you do have the right to waste your own time, and yeah, you would typically have to figure this out for yourselves....

You see, obviousy you're quite free to waste your own time. And it gets recursive. I guess I'm free to waste my time by telling you that you're wasting your time. But still, the only way out of the impasse is for you to get more purposeful and put an end to the cycle. Stop wasting your time!

And, you know, even taking into account that you have the right to waste your own time, you don't have the right to waste my time, or other people's time generally. So, if I dropped all this work on you, as I did, and you don't do anything with it, you are effectively causing me to have wasted my time. And, in fact, this sitution has gone on long enough that finally I had to say something.

The fact that it has got to do with the parser or that it carries third party liability is purely circumstance. Regardless of what probability of outcome exists, reporting bugs and fixes upstream is standard practice, futility be damned.

Well, okay, but still, the value of that is pretty much entirely when you have a genuine, real software project. If it's just a bunch of nothingburger artists going through the motions, then...

But, you know, really, there is an overarching psycho-social-cultural sort of problem. I don't really understand why, but somehow, it is just about impossible for these people to discredit themselves. With the legacy JavaCC project, you have the ostensible lead developer of the thing (the Sreeni character) announcing that the core code is some kind of no-go zone. Like the South Bronx,I guess. Or it's Comanche territory, or some part of a map that says "Here lurk dragons."

I mean, in terms of figuring out that it's a waste of time to talk to those people, I have to think you'll all eventually figure that out. But then, hey, you might as well figure it out sooner rather than later!

We are after all still using the javacc parser so it should come as no surprise that you will find issues related to that here. Turning each one into a sales pitch for congo is not helping. We were in fact already sold, and you may regard what is communicated at #685 as gospel, so no reason for any doubts. Lets continue this discussion there and refrain from further derailing this topic.

"Derailing", eh? So there is a train, it's on the rails, going choo-choo, and we risk derailing that train... the little train that could, I guess.... never min, just thinking whimsically about the metaphor... Well, fine, but frankly, I don't believe that there is currently any train on any rails that is really going anywhere. Or hardly... when you start using the Congo-based parser, then you will be on the rails going somewhere, and I reckon that you'll be quite surprised, dazzled even, by how fast you start moving, but until then...

Okay, well, I did ask @opengo whether he was aware of this work I had done, and if so, whether he had looked at it. Maybe that slipped through the cracks, but I do feel I have to say that, in my world, if I address somebody and pose a straightforward question like that, I really expect the person to answer.

As for continuing the conversation, I don't think it matters all that much where the conversation is. But anyway, all this stuff about what should have been done and so on... there's not much point in any of that. (Maybe I started that, but it was not really my intention.) Well, I obviously think that you could have just got this transition done a long time ago, if you'd just concentrated on it, especially with the help of the author of CongoCC itself! But, be that as it may, wherever you are, there you are. It doesn't matter at all what should have, could have... we are where we are and we just move forward from this point. So, if you're ready to move forward, yes, I'm around. Any of you can ask me any focussed question and I'll answer.

nickl- commented 1 year ago

Lol the choo-choo is the topic, and it is off the track =)

This conversation actually already continued here ---> #685 <---

beanshell / beanshell

Parser balks at long tokens (regression) #734