Open ericclaeren opened 3 years ago
Please tell me you value the Belgians and don't leave them waiting on their missing packages ... 😉
This is my attempt at a better-working regex for the [street] [num] [addition] strategy. I haven't attempted the [num] [street] pattern, but I'm guessing yours will suffice.
~^(?<street>\D+)(?:[ ]+)(?<number>\d{1,}[[[:space:]]*]?\p{Pd}[[[:space:]]*]?\d{1,}(?:(?![[:punct:]]))|\d{1,}\w{1}(?=[ \p{Pd}\/\\\\]\d+.{1,})|\d{1,})(?:[ -\/\\\\]+)?(?<ext>.*?)$~u
This doesn't cover the scenario king sir 1-laan 96B002, but otherwise it seems to work pretty well. You might want to look into it, and if you have any improvements, please let me know.
My full code:
// Match Unicode letters (u flag) to include the German ß or Polish letters, e.g. ćęł
$pattern = '~^(?<street>\D+)(?:[ ]+)(?<number>\d{1,}[[[:space:]]*]?\p{Pd}[[[:space:]]*]?\d{1,}(?:(?![[:punct:]]))|\d{1,}\w{1}(?=[ \p{Pd}\/\\\\]\d+.{1,})|\d{1,})(?:[ -\/\\\\]+)?(?<ext>.*?)$~u';
preg_match($pattern, $street, $matches);
// Normalizes spacing around punctuation: 'a / b' becomes 'a/b', while 'a bus a' is left untouched.
// Selects one or more whitespace characters followed or preceded by a punctuation mark.
$spacesPattern = '~(?:[[:space:]]+(?=[[:punct:]])|(?<=[[:punct:]])[[:space:]]+)~';
return [
'street' => trim($matches['street'] ?? ''),
'houseNumber' => preg_replace($spacesPattern,'', trim($matches['number'] ?? '')),
'houseNumberExtension' => preg_replace($spacesPattern,'', trim($matches['ext'] ?? '')),
];
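For anyone who wants to try the snippet above in isolation, here is a minimal sketch that wraps it in a function and runs it on the "street 10 bus 5" case from the unit test discussed in this thread. The function name parseBelgianStreet is hypothetical (the original snippet does not show its enclosing method), and the two patterns are copied verbatim from the comment.

```php
<?php
// Hypothetical wrapper: the name and signature are assumptions for illustration only.
function parseBelgianStreet(string $street): array
{
    // Pattern copied verbatim from the comment above ([street] [num] [addition]).
    $pattern = '~^(?<street>\D+)(?:[ ]+)(?<number>\d{1,}[[[:space:]]*]?\p{Pd}[[[:space:]]*]?\d{1,}(?:(?![[:punct:]]))|\d{1,}\w{1}(?=[ \p{Pd}\/\\\\]\d+.{1,})|\d{1,})(?:[ -\/\\\\]+)?(?<ext>.*?)$~u';
    // Removes whitespace that directly precedes or follows a punctuation mark.
    $spacesPattern = '~(?:[[:space:]]+(?=[[:punct:]])|(?<=[[:punct:]])[[:space:]]+)~';

    // If nothing matches, $matches stays empty and every field falls back to ''.
    preg_match($pattern, $street, $matches);

    return [
        'street' => trim($matches['street'] ?? ''),
        'houseNumber' => preg_replace($spacesPattern, '', trim($matches['number'] ?? '')),
        'houseNumberExtension' => preg_replace($spacesPattern, '', trim($matches['ext'] ?? '')),
    ];
}

$result = parseBelgianStreet('street 10 bus 5');
var_export($result);
// Yields street 'street', houseNumber '10', houseNumberExtension 'bus 5'.
```

An input without any digit does not match at all, so all three fields come back as empty strings; callers can use an empty houseNumber as a signal that parsing failed.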
Hi @ericclaeren ,
Thanks for checking in and contributing! Getting addresses in the correct format from a single line has proven to be a very big challenge that will always lead to edge cases that can't be parsed into the correct fields.
What I am particularly interested in is why the first 7 examples of your list went wrong. They seem to have been placed into the correct fields. Could you please clarify that?
If it's easier to discuss this by email via our Support desk, feel free to send us an email as well.
Hi @paazl-jaime
Yeah, there's no one regex to rule them all. I have overridden Paazl's parsing for now and use the Paazl regex as a fallback when the regex I shared fails. It covers quite a few scenarios, but far from all of them, and it's not ideal.
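The override-with-fallback setup described above could look roughly like the sketch below. Everything here is an assumption for illustration: the function name parseWithFallback, passing both parsers as callables, and treating an empty houseNumber as "failed" are not taken from the Paazl module or from the actual override.

```php
<?php
// Hypothetical helper: tries a primary parser first and falls back to a second
// one when the primary result has no house number (assumed failure criterion).
function parseWithFallback(string $address, callable $primary, callable $fallback): array
{
    $parsed = $primary($address);
    if (($parsed['houseNumber'] ?? '') !== '') {
        return $parsed;
    }

    return $fallback($address);
}

// Stand-in parsers for demonstration only; real code would plug in the custom
// regex parser and the module's own parser here.
$custom = function (string $address): array {
    if (preg_match('~^(?<street>\D+) (?<number>\d+) ?(?<ext>.*)$~u', $address, $m)) {
        return ['street' => trim($m['street']), 'houseNumber' => $m['number'], 'houseNumberExtension' => trim($m['ext'])];
    }
    return ['street' => '', 'houseNumber' => '', 'houseNumberExtension' => ''];
};
$moduleDefault = function (string $address): array {
    // Pretend default: returns the whole line as the street.
    return ['street' => $address, 'houseNumber' => '', 'houseNumberExtension' => ''];
};

$hit = parseWithFallback('street 10 bus 5', $custom, $moduleDefault);
$miss = parseWithFallback('street without number', $custom, $moduleDefault);
```

Keeping both parsers behind the same array shape means the calling code does not need to know which one produced the result.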
Well, the why is pretty obvious if you add the provided addresses to your own unit test suite 😄
13) WeProvide\Paazl\Test\Unit\Model\Api\Builder\PaazlOrderTest::testParseAddress with data set #39 (array('street', '10 bus 5'), array('street', '10', 'bus 5'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
- 'street' => 'street'
- 'houseNumber' => '10'
- 'houseNumberExtension' => 'bus 5'
+ 'street' => 'street 10 bus'
+ 'houseNumber' => '5'
+ 'houseNumberExtension' => ''
)
I'd rather use GitHub, as other customers may also benefit or could provide additional information when running into similar issues.
Cheers, Eric
Hi @ericclaeren,
Thanks! That's a clear one.
Two points we'd like to make on this matter:
Let us know if you have any questions or remarks!
Hi @paazl-jaime
The first is not an option for us at this time: labels don't enforce correct input, and HTML autocomplete, which lots of people use, ignores them, leaving us with incorrect addresses.
How do you see this working in the future if this is removed from your code?
Cheers
Hi,
On a large production site we are encountering many scenarios for Belgian customers where submitting data to Paazl goes wrong and customers aren't getting their packages.
Some are because people tend to enter their full address, including postal code and city, in the street address lines. We are trying to prevent this through better labeling and an address preview, but you can't win them all.
But quite a few common Belgian scenarios fail, based on OrderTest::testParseAddress
All are real examples (anonymized street names and numbers) where packages were returned by the carrier due to an incorrect address.
I have tried to create a one-regex-to-rule-them-all to match this and failed miserably 😬.
Is there a way you could support more types of address notation for Belgian addresses?
Cheers, Eric