ITxPT / DATA4PTTools

Shared space for the development of the DATA4PT Greenlight NeTEx validation tool(s)
MIT License
10 stars 2 forks source link

Words of feedback from the French NAP #24

Closed thbar closed 1 year ago

thbar commented 2 years ago

Hello there! I work on https://transport.data.gouv.fr/ and I did some testing on the NeTEx validator, I wanted to share it with the authors & other users here. Thanks for your work on this!

I have not yet been able to review the output of the reports themselves, so this first round is more about onboarding / "surprise report" and potential production issue (RAM use etc), than the quality of the reports.

Hope this helps!

Installing from source is useful

Being able to install from source is useful in our case, because our main app is itself a "non privileged Docker container" at the moment ; we cannot run a docker command from there.

We cannot either run go cmd/* in production, because the go binary won't be there in the final Docker image.

I explored the Dockerfile and found how you actually build the binary:

https://github.com/ITxPT/DATA4PTTools/blob/c9c257389aec270c73b23977e9966c5f671e2d0f/Dockerfile#L7

Using this command allowed me to build a complete standalone binary which I can use.

Initially, the cmd/* pattern was a bit confusing to see, since it was a bit unclear if these were different programs, or parts of the same program (the latter was the answer), but that is not a big problem.

The doc could be improved to document how to prepare a full binary.

There is telemetry by default

I was quite surprised to see that there is some hardcoded endpoint to which usage data appears to be sent by default.

I have created a specific issue for that part:

https://github.com/ITxPT/DATA4PTTools/issues/23

Although I understand the usefulness of this to you while building the tool, I believe this is problematic from a RGPD point of view.

To the very least I feel this should be advertised in the readme, or even an opt-in behaviour (at the cost of reducing your tracing rate).

I don’t believe most users will realize this is happening!

Time taken & memory used

Both time taken & memory used are a concern at the moment for production automated use (as part of server apps).

I have commented on existing issues:

Non-linear memory use (or too high use) will be a problem for production use, since we allocate a fixed number of GB per container, and cloud environments can be costly from a RAM point of view.

Documentation improvement

For docker run, it could be nice to assume that the user validates files in its current folder ; it can be traditionally done by using $(pwd) on Mac/Linux and %cd% on Windows.

docker run -it -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip

The help at GitHub - ITxPT/DATA4PTTools: Shared space for the development of the DATA4PT validation tool(s) lacks the -i myfile flag at time of writing.

Some dependencies should be upgraded

Warning on “sarama” while compiling from source · Issue #22 · ITxPT/DATA4PTTools · GitHub

That's it for today ; thanks for open-sourcing the tool & the discussion!

pkvarnfors commented 2 years ago

Thank you for testing out the tool, and the detailed feed back. It was very helpful, with comments and examples on how to reproduce the errors. We have answered in each separate issue.

pkvarnfors commented 2 years ago

We are working on an updated documentation, all sections will be updated during the summer, including your suggestions.

thbar commented 2 years ago

Many thanks @pkvarnfors for your feedback!

pkvarnfors commented 1 year ago

A new version of the tool is available with performance enhancements, improvements in memory usage and updated documentation.