datagouv / csv-detective

CSV inspection

refactor: use frformat package #87

Closed by pierrecamilleri 2 months ago

pierrecamilleri commented 3 months ago

Copy of #83 made without a fork, in an attempt to trigger the CI tests.

Context

The fr-format library was developed to share validation functions between validata and csv-detective, and to provide a standard library for validating common French formats.

The aim of this PR is to replace csv-detective's custom validation functions with their fr-format implementations.
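To illustrate the idea of replacing per-project validation with a shared implementation, here is a minimal sketch of the delegation pattern. All names in it (`shared_is_code_postal`, `make_detector`, `is_code_postal`) are hypothetical and are not fr-format's or csv-detective's actual API; the rule itself is a simplified, format-only check.

```python
import re
from typing import Callable


# Hypothetical shared validator standing in for a function a library such as
# fr-format could expose; fr-format's real API is not shown in this PR.
# Simplified on purpose: it only checks the "five digits" shape and does not
# look the value up in an official list of postal codes.
def shared_is_code_postal(value: str) -> bool:
    return re.fullmatch(r"[0-9]{5}", value) is not None


def make_detector(validator: Callable[[str], bool]) -> Callable[[object], bool]:
    """Build a detector that delegates the format rule to a shared validator.

    The detector keeps only the input handling (converting the cell to a
    string and stripping whitespace); the rule itself lives in the shared
    validator, so every tool using it applies the same definition.
    """
    def _is(val: object) -> bool:
        return validator(str(val).strip())
    return _is


# Hypothetical detector, used for illustration only.
is_code_postal = make_detector(shared_is_code_postal)

print(is_code_postal("12345"))  # True
print(is_code_postal("8730"))   # False
print(is_code_postal("ABCDE"))  # False
```

The design point is that the detector no longer owns the rule: fixing or tightening the shared validator changes the behaviour of every tool that delegates to it.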

Refactorings

Behavior changes

Performance

Performance report ─=≡Σ((( つ•̀ω•́)つ

Test table with 100,000,000 rows

Test value   Format         Without fr-format   With fr-format

"8730"       code_postal    10.27 s             10.62 s
             code_fantoir   10.22 s             10.66 s
             code_commune   10.27 s             10.47 s

"ABCDE"      code_postal    12.32 s             11.40 s
             code_fantoir   12.24 s             11.37 s
             code_commune   11.73 s             11.33 s

"12345"      code_postal    11.63 s             11.38 s
             code_fantoir   11.23 s             11.05 s
             code_commune   11.31 s             10.97 s
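To put the timings in proportion, the relative change for each row can be computed directly from the table. This is plain arithmetic on the reported numbers, not a significance test.

```python
# Timings copied from the table above, in seconds:
# (test value, format) -> (without fr-format, with fr-format)
timings = {
    ("8730", "code_postal"): (10.27, 10.62),
    ("8730", "code_fantoir"): (10.22, 10.66),
    ("8730", "code_commune"): (10.27, 10.47),
    ("ABCDE", "code_postal"): (12.32, 11.40),
    ("ABCDE", "code_fantoir"): (12.24, 11.37),
    ("ABCDE", "code_commune"): (11.73, 11.33),
    ("12345", "code_postal"): (11.63, 11.38),
    ("12345", "code_fantoir"): (11.23, 11.05),
    ("12345", "code_commune"): (11.31, 10.97),
}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means faster)."""
    return (after - before) / before * 100


for (value, fmt), (without, with_) in timings.items():
    print(f"{value!r:8} {fmt:13} {relative_change(without, with_):+.1f} %")
```

The printed changes range from -7.5 % ("ABCDE", code_postal) to +4.3 % ("8730", code_fantoir), with no consistent sign across the test values.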

The differences do not appear to be statistically significant, given the run-to-run variability observed between two executions.
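The conclusion above rests on comparing the differences with run-to-run variability. A minimal sketch of how that variability could be quantified with repeated timings, assuming a simplified, hypothetical validator (`is_code_postal`) and a small in-memory column rather than the 100,000,000-row table used in the report:

```python
import re
import statistics
import timeit

# Hypothetical validator used only to illustrate the measurement; it is
# neither csv-detective's nor fr-format's implementation.
_CODE_POSTAL = re.compile(r"[0-9]{5}")


def is_code_postal(value: str) -> bool:
    return _CODE_POSTAL.fullmatch(value) is not None


def benchmark(func, value, n_rows=100_000, repeats=5):
    """Time `func` over a column of `n_rows` copies of `value`.

    The full pass is repeated `repeats` times so that the spread between
    runs (standard deviation) can be compared with any observed difference.
    """
    column = [value] * n_rows
    times = timeit.repeat(
        lambda: [func(v) for v in column], number=1, repeat=repeats
    )
    return statistics.mean(times), statistics.stdev(times)


if __name__ == "__main__":
    for value in ("8730", "ABCDE", "12345"):
        mean, stdev = benchmark(is_code_postal, value)
        print(f"{value!r:8} mean={mean:.4f} s  stdev={stdev:.4f} s")
```

Running the same benchmark for both implementations and checking whether the gap between their means exceeds a few standard deviations gives a rough, informal sense of whether a difference is real.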

Edited on 29 May 2024