datagouv / csv-detective

CSV inspection

refactor: use frformat package #83

Closed by Sarrabah 3 weeks ago

Sarrabah commented 4 months ago

Context

The fr-format library was developed to share validation functions between validata and csv-detective, and to provide a standard library for validating typical French formats.

This PR replaces the custom validation in csv-detective with the fr-format implementation.

Refactorings

Behavior changes

Performance

Performance report ─=≡Σ((( つ•̀ω•́)つ

Test table with 100,000,000 rows

Test value   Format          Without fr-format   With fr-format

"8730"       code_postal     10.27 s             10.62 s
"8730"       code_fantoir    10.22 s             10.66 s
"8730"       code_commune    10.27 s             10.47 s

"ABCDE"      code_postal     12.32 s             11.40 s
"ABCDE"      code_fantoir    12.24 s             11.37 s
"ABCDE"      code_commune    11.73 s             11.33 s

"12345"      code_postal     11.63 s             11.38 s
"12345"      code_fantoir    11.23 s             11.05 s
"12345"      code_commune    11.31 s             10.97 s

The differences do not appear to be statistically significant: they are of the same order as the variability observed between two executions of the same code.
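For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness. It is not the script used for the report above. Both validators are hypothetical stand-ins (neither is the actual csv-detective nor fr-format code), and it times repeated calls on a single value rather than a full 100,000,000-row table. Reporting the standard deviation across runs makes it easier to judge whether a gap between the two columns exceeds run-to-run noise.

```python
import re
import statistics
import timeit

# Hypothetical stand-ins: neither function is csv-detective's or fr-format's real code.
_POSTAL_RE = re.compile(r"[0-9]{5}")


def custom_is_code_postal(value: str) -> bool:
    """Regex check standing in for a hand-written validator."""
    return _POSTAL_RE.fullmatch(value) is not None


def library_is_code_postal(value: str) -> bool:
    """Plain string check standing in for a call into a shared library."""
    return len(value) == 5 and value.isascii() and value.isdigit()


def time_validator(func, value, n_calls=100_000, n_runs=5):
    """Return (mean, stdev) in seconds over n_runs timings of n_calls calls each."""
    runs = timeit.repeat(lambda: func(value), number=n_calls, repeat=n_runs)
    return statistics.mean(runs), statistics.stdev(runs)


def compare(values=("8730", "ABCDE", "12345"), **kwargs):
    """Time both validators on each sample value and return one row per value."""
    rows = []
    for value in values:
        without = time_validator(custom_is_code_postal, value, **kwargs)
        with_lib = time_validator(library_is_code_postal, value, **kwargs)
        rows.append((value, without, with_lib))
    return rows


if __name__ == "__main__":
    for value, (m0, s0), (m1, s1) in compare():
        print(f"{value!r:>8}  without: {m0:.4f} s ± {s0:.4f}   with: {m1:.4f} s ± {s1:.4f}")
```

If the two means differ by less than the printed standard deviations, the difference is most likely noise, which matches the reading of the table above.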

Edited on 29 May 2024

Sarrabah commented 3 months ago

Thank you for your review, it was interesting! I have made all the changes you suggested, and the PR is now ready to be merged if you think it's okay.

Pierlou commented 3 months ago

Seems like the checks are having a hard time, could you add an empty commit to trigger them again please?

Sarrabah commented 3 months ago

Yes, of course!

Pierlou commented 3 weeks ago

Finally merged through a PR opened from a branch: https://github.com/datagouv/csv-detective/pull/87