doctrine / lexer

Base library for a lexer that can be used in Top-Down, Recursive Descent Parsers.
https://www.doctrine-project.org/projects/lexer.html
MIT License
11.07k stars 60 forks source link

Added isCaughtByPattern & caughtByPatterns #70

Closed AbdelkaderBah closed 1 year ago

AbdelkaderBah commented 1 year ago

usage: Checks if the given value was caught using getCatchablePatterns key

benefits: Might be useful to use within getType method

AbdelkaderBah commented 1 year ago

I have created this Pull request because it is totally different (improved*) from the first one. Also the new caughByPatters open the possibilty for getType to return array Do you consider adjusting return types to support arrays?

AbdelkaderBah commented 1 year ago

There are some issues Coding Standards and Static analysis issues that I can correct.

Can you please confirm that's all?

AbdelkaderBah commented 1 year ago

@greg0ire @derrabus

derrabus commented 1 year ago

Please don't ping people. We're doing this in our free-time and will perform a review when we have time to.

I can confirm that we do need a green CI for a merge and having all issues reported by our tooling fixed also helps with the review.

greg0ire commented 1 year ago

the first one

You should definitely link to it if you're not going to summarize what it was about.

AbdelkaderBah commented 1 year ago

I apologize for pinging you previously, I am new to open source...

AbdelkaderBah commented 1 year ago

P.S: This feature adds the possibility for isA & getType to return an array, although this is not yet supported by the return type.

Do you think it will be okay to extend/amend the parameter and return type of isA & getType or create a new public method(s)?

Example of possible usages:

carbon (8)

AbdelkaderBah commented 1 year ago

I have improved the types of the newly introduced code. The workflows should now pass.

Can I please know which phpstan analyze level/config are you using?

greg0ire commented 1 year ago

Yes, by taking a look at the configuration file:

https://github.com/doctrine/lexer/blob/6348c6170ec75073a5d97316e1da0a177ab49a61/phpstan.neon.dist#L2

derrabus commented 1 year ago

P.S: This feature adds the possibility for isA & getType to return an array, although this is not yet supported by the return type.

This is a problem because callers won't expect an array here. And I also fail to understand why we would want an array here: Does that mean a token can have multiple types?

greg0ire commented 1 year ago

Out of curiosity, is the cryptocurrency example you gave realistic? Unlike the ORM Lexer catchable patterns, all the patterns in the example match something that is grammatically similar: they are all addresses. Are you going to run into situations where replacing a bitcoin address with a litecoin address in a sentence will make the sentence syntactically invalid?

If the example is not realistic, can you let us know what you are going to build from this in your personal projects? I'd like to understand what need prompted you to contribute this.

AbdelkaderBah commented 1 year ago

Out of curiosity, is the cryptocurrency example you gave realistic? Unlike the ORM Lexer catchable patterns, all the patterns in the example match something that is grammatically similar: they are all addresses. Are you going to run into situations where replacing a bitcoin address with a litecoin address in a sentence will make the sentence syntactically invalid?

If the example is not realistic, can you let us know what you are going to build from this in your personal projects? I'd like to understand what need prompted you to contribute this.

A Bitcoin address in some cases can be a valid syntax for a Litecoin address and vice versa The example of regex I have provided is realistic but not accurate, The regex matches a sequence of characters but does not validate them. The example of addresses in the test provider is realistic and accurate.

The main reason that prompted me into contributing this feature, is that "getType" is not aware of the "pattern" that has been caught.

Declaration: at the moment I haven't used this library in any personal/work project, I am contributing for the sake of contributing to an open source.

AbdelkaderBah commented 1 year ago

P.S: This feature adds the possibility for isA & getType to return an array, although this is not yet supported by the return type.

This is a problem because callers won't expect an array here. And I also fail to understand why we would want an array here: Does that mean a token can have multiple types?

I think YES, this example of cryptocurrency addresses proves it to me (A bitcoin address might have the same syntax as a Litecoin address). Maybe I don't understand the principle lexer/tokenizer enough?

greg0ire commented 1 year ago

A Bitcoin address in some cases can be a valid syntax for a Litecoin address and vice versa

That's not really what I'm asking: I wasn't asking if a bitcoin address can be a valid litecoin address or vice versa, I already know that to be true from reading your tests.

This is a lexer, and my question is: will this extra information be useful to the parser? Does the parser need to be able to tell apart a litecoin address for a bitcoin address to tell if there is a syntax error or not?

Declaration: at the moment I haven't used this library in any personal/work project, I am contributing for the sake of contributing to an open source.

While I appreciate your selflessness, I think contributing features and not using them might just result in feature creep. We need more than a theoretical need for this. Let's not build something because people might use it please :pray:

AbdelkaderBah commented 1 year ago

A Bitcoin address in some cases can be a valid syntax for a Litecoin address and vice versa

That's not really what I'm asking: I wasn't asking if a bitcoin address can be a valid litecoin address or vice versa, I already know that to be true from reading your tests.

This is a lexer, and my question is: will this extra information be useful to the parser? Does the parser need to be able to tell apart a litecoin address for a bitcoin address to tell if there is a syntax error or not?

Declaration: at the moment I haven't used this library in any personal/work project, I am contributing for the sake of contributing to an open source.

While I appreciate your selflessness, I think contributing features and not using them might just result in feature creep. We need more than a theoretical need for this. Let's not build something because people might use it please 🙏

My bet was on this use case:

image

If it doesn't seem to describe the usefulness, Well, I guess I have failed to come up with a benefit that an end-user-land/parser might find useful.

greg0ire commented 1 year ago

Well if you could demonstrate how this would improve the lexer of the ORM in a way that would be a benefit to the parser of the ORM, I guess it would help:

Let's close this until you do.