Closed instabledesign closed 6 years ago
Ping @beberlei @stof just to have answer to process or close it..
I'm not maintainer on this project
ok sorry
Thanks for the change. I think this makes sense and it should be BC. Can you add a unit test?
Thanks for making this change and adding the tests! I will merge a little later and I am going to follow this PR up with another to add Travis CI.
Unfortunately this change is a BC break and needs to be reverted. :/ It has completely broken egeloen/ivory-serializer: https://gist.github.com/Majkl578/8c1500fd14884091e65e6af3ddef5c84
Thanks @goetas for spotting this!
@Majkl578 I reverted this here https://github.com/doctrine/lexer/pull/18
I kept the other changes from the PR and just reverted the changed functionality.
Looking at the code in https://github.com/egeloen/ivory-serializer/tree/master/src/Type and I don't immediately see what it was depending on that caused the break. I will look more later.
Hi, i think we get drop the
$this->tokens[$index] = array(
but we can keep
'index' => $index,
@instabledesign If you have time, can you look at https://github.com/egeloen/ivory-serializer and see why it broke after this change so that we can add tests to cover it?
Yes.
Investigation report:
'([a-z0-9\\\\]+)'
so the captured token is store twiceWorking solution : change the egeloen/ivory-serializer for(...)
for ($i = 0; ($i < $count) || ($token === $nextToken); ++$i) {
for ($i = 0; ($i < $count) || ($token['value'] === $nextToken['value'] && $token['type'] === $nextToken['type'] && $token['position'] === $nextToken['position']); ++$i) {
I'll try to fix the AbstractLexer::$index in order to increment only when the match is a not a capture of previous one but without succeed, and i dont think is a good solution.
I try to fix the 2 problem from above but theire is some logic to build the fixtures with some private method, so this is not easy to reproduce and fix correctly what is going on! I continue to work on it on my free time.
tests fixed i ping him in order to merge
egeloen/ivory-serializer look like not active anymore with only one release (from jan 2017)
@jwage did you plan to create new version with this modification?
Did we figure out a way to make the change in this repo so it doesn't break existing implementations? (even if their regex is "wrong")
First i try with group naming but the group naming doesn't work with preg_split
preg_split('/(?<FOO>=|>|<)|(?<BAR>[a-z]+)|(?<BAZ>\d+)/i', 'price>5', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
/*
array(3) {
[0]=>
array(2) {
[0]=>
string(5) "price"
[1]=>
int(0)
}
[1]=>
array(2) {
[0]=>
string(1) ">"
[1]=>
int(5)
}
[2]=>
array(2) {
[0]=>
string(1) "5"
[1]=>
int(6)
}
}
*/
The second way is to deduplicate the matched element with offset
$matches = preg_split('/((=|>|<)|([a-z]+)|(\d+))/i', 'price>5', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
$offset = null;
$matchesDeduplicate = array_filter($matches, function($item)use(&$offset){
if (null === $offset) {
$offset = $item[1];
return true;
}
$filter = $offset !== $item[1];
$offset = $item[1];
return $filter;
});
/*
array(3) {
[0]=>
array(2) {
[0]=>
string(5) "price"
[1]=>
int(0)
}
[2]=>
array(2) {
[0]=>
string(1) ">"
[1]=>
int(5)
}
[4]=>
array(2) {
[0]=>
string(1) "5"
[1]=>
int(6)
}
}
*/
With 300 tokens match 10 each (9000 tokens) we already have 1Mo memory more consuption preg_split without deduplicate preg_split with deduplicate
what did you think about it @jwage
I don't think we can make this change without breaking BC or increasing memory usage as you noted.
Hi can you consider apply this change on the v2 ? it was originally revert because of breaking unmaintained libs. @greg0ire
If there is a breaking change, then it should go into v4 I'm afraid.
Im not completely sure it was a BC because it only add a new value in the token details. Like a explain earlier, the egeloen/ivory-serializer not use properly the lexer, so my change alter his behavior. For me it was ok is this change go in V3 or at least in V4.
If it's not a breaking change then you should target v3.1
Hi, i recently work on new project Xpression
My need is to
resetPosition
at token index but the$token['position']
was the string position of this token in the input string.My actual workaround was to keep the lexerI index in my Parser code and reset it each time i need it.
So i think if the token index was store in the token i can get it easily with
$token['index']
Thank to read.