geocoders / geocoder-tester

Run search queries against a geocoder that supports the geocodejson spec.

Detail and readability of failure reporting #13

Open jfgigand opened 8 years ago

jfgigand commented 8 years ago

Hi,

It looks like geocoder-tester does not say which property of a result is incorrect. It prints all expected/result properties as pipe-separated values.

Thus, question 1 is: how can we improve report readability?
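One possible direction, sketched in Python because geocoder-tester runs under py.test: compare each expected property on its own and print only the ones that differ. The helpers `diff_properties` and `format_mismatches` are hypothetical and are not part of geocoder-tester; the sample values come from the failure below.

```python
def diff_properties(expected, result):
    """Return (key, expected, actual) triples for each expected key that differs.

    Hypothetical helper: geocoder-tester's real comparison code may differ.
    Values are compared as strings; a key missing from the result counts
    as a mismatch with actual value None.
    """
    mismatches = []
    for key, want in expected.items():
        got = result.get(key)
        if got is None or str(got) != str(want):
            mismatches.append((key, want, got))
    return mismatches


def format_mismatches(mismatches):
    """One line per wrong property instead of the full pipe-separated table."""
    return "\n".join(
        "{}: expected {!r}, got {!r}".format(key, want, got)
        for key, want, got in mismatches
    )


if __name__ == "__main__":
    expected = {"postcode": "75004", "name": "19B Rue des Deux Ponts"}
    result = {"name": "19 BIS Rue du Pont", "housenumber": "19 BIS",
              "street": "Rue du Pont", "postcode": "85000",
              "city": "La Roche-sur-Yon"}
    print(format_mismatches(diff_properties(expected, result)))
```

The full results table could still be printed underneath. The per-property lines just make the offending field visible at a glance.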

From the example below, there is a conflict between 19B and 19 BIS. The latter is the preferred way of presenting the address to the end user, while the former is standard-compliant. The same difference occurs for R/RUE, AV/AVENUE, etc.

Question 2 is: how can we enable the geocoder to tolerate these different spellings?

Thank you!

$ py.test geocoder_tester/world/france/iledefrance/ --tb long --api-url http://api-adresse.data.gouv.fr/search/ --max-run 50 --save-report /tmp/report.log -x
========================================= test session starts =========================================
platform linux -- Python 3.4.3, pytest-3.0.1, py-1.4.31, pluggy-0.3.1 -- /tmp/python3-env/bin/python3
cachedir: .cache
rootdir: /tmp/geocoder-tester, inifile: pytest.ini
collected 2093 items 

geocoder_tester/world/france/iledefrance/test_addresses.csv::34 Avenue de l'Op\xe9ra Paris PASSED
geocoder_tester/world/france/iledefrance/test_addresses.csv::34 Avenue de l'Op\xe9ra 75002 PASSED
geocoder_tester/world/france/iledefrance/test_addresses.csv::34 Avenue de l'Op\xe9ra PASSED
geocoder_tester/world/france/iledefrance/test_addresses.csv::19B Rue des Deux Ponts Paris FAILED

============================================== FAILURES ===============================================
________________________________ Search: 19B Rue des Deux Ponts Paris _________________________________
\nSearch failed\n# Search was: 19B Rue des Deux Ponts Paris\n# Params was: limit: 1\n#\xa0Expected was: postcode: 75004 | name: 19B Rue des Deux Ponts\n#\xa0Results were:\nname               | osm_key | osm_value | osm_id | housenumber | street      | postcode | city             | country | lat       | lon       | distance\n------------------ | ------- | --------- | ------ | ----------- | ----------- | -------- | ---------------- | ------- | --------- | --------- | --------\n19 BIS Rue du Pont | \u2014       | \u2014         | \u2014      | 19 BIS      | Rue du Pont | 85000    | La Roche-sur-Yon | \u2014       | 46.667819 | -1.434497 | \u2014       \n
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================= 1 failed, 3 passed in 0.80 seconds ==================================
yohanboniface commented 8 years ago

Hmm, here is what it looks like when I run a test and get a failure:

_________________________________________________________________________ Search: 91 Rue du Moulin __________________________________________________________________________

Search failed
# Search was: 91 Rue du Moulin
# Params was: lon: 2.688706 - limit: 1 - lat: 50.800570
# Expected was: name: 91 Rue du Moulin | postcode: 59299
# Results were:
name             | osm_key | osm_value | osm_id | housenumber | street        | postcode | city    | country | lat       | lon      | distance
---------------- | ------- | --------- | ------ | ----------- | ------------- | -------- | ------- | ------- | --------- | -------- | --------
91 Rue du Moulin | —       | —         | —      | 91          | Rue du Moulin | 59190    | Caëstre | —       | 50.759475 | 2.606369 | —       

Not sure why you don't get the carriage returns. Maybe it's a locale issue again.

Question 2 is: how can we enable the geocoder to tolerate these different spellings?

"av/avenue" could be handled with a synonym dictionary, perhaps used only when running with --loose-compare. About "19B" vs "19 BIS", I'm not sure this can be fixed: they are not the same thing. The same street may have both 19 B and 19 BIS, referring to two different addresses. Another option may be a post-processing normalization step for some standards we want to be able to "support", such as the AFNOR address standard. So we could have something like --normalize=afnor.
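A minimal sketch of the synonym-dictionary idea, assuming it would only be applied in a loose comparison mode. The table and function names are hypothetical and hold only the abbreviations mentioned in this thread. It deliberately leaves "19B" and "19 BIS" unequal, since they may be different addresses.

```python
import re

# Hypothetical synonym table: abbreviation -> full street-type word.
# Only the examples from this thread; a real table would be much longer.
STREET_TYPE_SYNONYMS = {
    "r": "rue",
    "av": "avenue",
}


def expand_synonyms(text, synonyms=STREET_TYPE_SYNONYMS):
    """Lowercase, split into words and replace whole-word abbreviations."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(synonyms.get(word, word) for word in words)


def loose_equal(expected, actual):
    """Compare two labels after synonym expansion (sketch of a loose mode)."""
    return expand_synonyms(expected) == expand_synonyms(actual)


if __name__ == "__main__":
    print(loose_equal("34 AV de l'Opéra", "34 Avenue de l'Opéra"))
    print(loose_equal("19B Rue des Deux Ponts", "19 BIS Rue des Deux Ponts"))
```

Because the match is on whole words, "R" is only expanded when it stands alone and not inside a word like "Rue".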

jfgigand commented 8 years ago

About the '\n' problem: funny that Python doesn't trust STDOUT to support "\n" but still prints ANSI color codes...

This problem doesn't occur within a ssh(1) session. It does within a lxc-attach(1) session.

$TERM is not relevant here, as both sessions declare xterm-256color. Terminal capabilities are present, since Python is able to retrieve the terminal width. mc(1) works well, except for the terminal 'resize' event, which it does not receive.

The literal '\n' makes no sense. I have never had output problems with any program behind lxc-attach. Python or a Python library is failing somewhere.

Strangely, geocoder-tester [...] | cat behaves the same way (broken '\n' behind lxc-attach, working well behind ssh), even though STDOUT is not a terminal in this case.
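If the locale is the culprit, as suggested above, one way to check is to dump the settings that decide how Python 3 encodes text on stdout and compare the output between the ssh and lxc-attach sessions. This is a diagnostic sketch only; it assumes (not confirmed in this thread) that an ASCII or "C" locale in one session is what makes the report come out escaped.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that influence Python 3's stdout encoding."""
    stdout = sys.stdout
    return {
        "stdout_isatty": getattr(stdout, "isatty", lambda: False)(),
        "stdout_encoding": getattr(stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    info = describe_output_encoding()
    for key in sorted(info):
        print(key, info[key])
```

If `preferred_encoding` or `stdout_encoding` differs between the two sessions (for example ASCII in one, UTF-8 in the other), that would point at the environment rather than at geocoder-tester itself.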

jfgigand commented 8 years ago

About 19B/19 BIS, I didn't know they could be different and represent two different addresses... why?

In the first case (19B Rue des Deux Ponts Paris), the returned postal code was incorrect anyway. But continuing the tests, I have failures on 7T Rue Servandoni (returned as 7 TER Rue Servandoni). It is the same address, isn't it?

I also have a case on 3BIS Rue Chopin, returned as 3 BIS Rue Chopin.

Is it possible to only check geo coordinates?
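A coordinates-only check could look like the sketch below: compute the great-circle (haversine) distance between the expected and returned points and accept the result if it falls within a tolerance. The `lat`/`lon` keys mirror the result columns above. The 100 m tolerance and the function names are assumptions, not existing geocoder-tester options.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def coordinates_match(expected, result, tolerance_m=100.0):
    """Hypothetical check: pass if the result lies within tolerance_m of expected."""
    distance = haversine_m(expected["lat"], expected["lon"],
                           result["lat"], result["lon"])
    return distance <= tolerance_m
```

The tolerance is the real design question here: too small and rooftop vs. entrance positions fail, too large and a neighbouring house number passes.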

yohanboniface commented 8 years ago

About 19B/19 BIS, I didn't know they could be different and represent two different addresses... why?

B and BIS are two possible ordinals: sometimes it's bis, ter, quater, etc.; sometimes it's A, B, C; sometimes it's something else; and sometimes all of them in the same street. ;)
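To keep B and BIS distinct while ignoring spacing, a house number can be split into its numeric part and its index and compared as a pair. A minimal sketch; the regex and `split_housenumber` are hypothetical and only handle ASCII-letter indices.

```python
import re

# Hypothetical pattern: digits, optional spaces, optional alphabetic index.
HOUSENUMBER_RE = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def split_housenumber(value):
    """Split '19B' -> (19, 'B') and '19 BIS' -> (19, 'BIS'); None if unparseable."""
    match = HOUSENUMBER_RE.match(value)
    if match is None:
        return None
    number, index = match.groups()
    return int(number), index.upper()
```

With this, "3BIS" and "3 BIS" compare equal, while "19B" and "19 BIS" stay different, which matches the point that they can be two separate addresses.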

I have failures on 7T Rue Servandoni (returned as 7 TER Rue Servandoni). It is the same address, isn't it?

It may or may not be. Is "7T" from the test case or from the result?

I also have a case on 3BIS Rue Chopin, returned as 3 BIS Rue Chopin.

I guess the test case should be fixed?

Is it possible to only check geo coordinates?

Isn't this the discussion in #14 ? :)

jfgigand commented 8 years ago

B and BIS are two possible ordinals: sometimes it's bis, ter, quater, etc.; sometimes it's A, B, C; sometimes it's something else; and sometimes all of them in the same street. ;) It may or may not be. Is "7T" from the test case or from the result?

From the test case. Should we transform all bis/ter/... to a single letter before comparing? Even if "B" and "BIS" may be different, it is probably very rare to have both on the same street (with the same number!), and the test needs to work...
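The single-letter proposal could be sketched as a lossy normalization used only for loose comparison. The mapping table is an illustrative assumption (BIS to B and TER to T follow the examples in this thread; QUATER to Q is assumed), and as pointed out earlier in this thread, it would wrongly merge a genuine "19 B" and "19 BIS" on the same street.

```python
import re

# Hypothetical, deliberately lossy mapping from long repetition indices to
# single letters, so that "7 TER" and "7T" compare equal in a loose mode.
LONG_TO_SHORT_INDEX = {"BIS": "B", "TER": "T", "QUATER": "Q"}

HOUSENUMBER_RE = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def loose_housenumber(value):
    """Normalize '7 TER' -> '7T', '19 BIS' -> '19B'; other values are upper-cased."""
    match = HOUSENUMBER_RE.match(value)
    if match is None:
        return value.strip().upper()
    number, index = match.groups()
    index = index.upper()
    return number + LONG_TO_SHORT_INDEX.get(index, index)
```

Keeping this behind a loose-comparison flag would leave the strict mode able to catch real B/BIS confusions.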

I guess the test case should be fixed?

Only if we establish that using a space is the norm. If not, we could apply s/([0-9]+) BIS/\1BIS/i. More generally, I suggest removing all spaces before comparing, at least with --loose-compare.
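In Python, the two suggestions might look like this. It is a sketch only; the function names are hypothetical and not existing geocoder-tester code. `glue_bis` mirrors the sed expression (case-insensitive, literal "BIS" in the replacement), and `without_spaces` drops all whitespace and folds case before comparing.

```python
import re


def glue_bis(text):
    # Equivalent of the sed expression s/([0-9]+) BIS/\1BIS/i:
    # remove the space between a number and a following "BIS".
    return re.sub(r"([0-9]+) BIS", r"\1BIS", text, flags=re.IGNORECASE)


def without_spaces(text):
    """Drop every whitespace character and fold case (loose comparison only)."""
    return "".join(text.split()).casefold()
```

Removing all spaces is the broader of the two: it also makes "3BIS Rue Chopin" and "3 BIS Rue Chopin" equal without a per-index rule, but it still keeps "19B" and "19 BIS" apart. The trade-off is that unrelated strings such as "1 23" and "12 3" would also compare equal.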

Isn't this the discussion in #14 ? :)

Yes it is :)