Create a set of seed data

mmurto commented 3 weeks ago

We should have an easily bootstrappable set of seed data that can be used for development and manual testing, and later for automated testing. This would provide a good starting point for both backend and UI development, ensuring there is a real (though quite small) database to test the UI, the API and migrations against.

The data should include organizations, projects and repositories, and some runs for repositories that include issues, rule violations and vulnerabilities. The data should also include correct permissions in Keycloak for a test user to be able to use the data.
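One way to keep the requirements above checkable is to describe the dataset declaratively and verify that it covers everything listed. The following Python sketch is purely illustrative. The class names, fields and the `missing_coverage` helper are hypothetical, and Keycloak permissions are simplified to a set of organization names the test user should be able to access rather than the actual role model.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    # Number of findings of each kind the run is expected to produce.
    issues: int = 0
    rule_violations: int = 0
    vulnerabilities: int = 0


@dataclass
class Repository:
    url: str
    runs: list[Run] = field(default_factory=list)


@dataclass
class Project:
    name: str
    repositories: list[Repository] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    projects: list[Project] = field(default_factory=list)


@dataclass
class SeedData:
    organizations: list[Organization]
    test_user: str
    # Simplification (hypothetical): organizations the test user can access.
    user_access: set[str] = field(default_factory=set)


def missing_coverage(data: SeedData) -> list[str]:
    """Return the requirements from the issue that `data` does not meet."""
    runs = [
        run
        for org in data.organizations
        for project in org.projects
        for repo in project.repositories
        for run in repo.runs
    ]
    problems = []
    if not any(run.issues for run in runs):
        problems.append("no run with issues")
    if not any(run.rule_violations for run in runs):
        problems.append("no run with rule violations")
    if not any(run.vulnerabilities for run in runs):
        problems.append("no run with vulnerabilities")
    for org in data.organizations:
        if org.name not in data.user_access:
            problems.append(f"{data.test_user} cannot access {org.name}")
    return problems
```

An empty result means the dataset covers every kind of data the issue asks for; each entry otherwise names one gap.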

Some possible ways to create and maintain the data:

1. Create a script that executes API calls against the Docker Compose environment to first get a Keycloak token and then create the hierarchy items and runs for some known repositories. Adjust the rules to produce the needed rule violations against the known repositories.
2. Create a script that does the above but with ORT Server library functions or SQL.
3. Add a database dump of a good dataset to the repository that will automatically be included when starting the Docker Compose environment.
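Option 1 could look roughly like the following Python sketch, which uses only the standard library. The token request uses Keycloak's standard OpenID Connect token endpoint with the password grant, which requires Direct Access Grants to be enabled for the client (older Keycloak versions prefix the path with `/auth`). Everything else is an assumption to check against the actual API: the ORT Server REST paths, the request bodies, the hypothetical `SEED` hierarchy, and the idea that the API calls the middle hierarchy level a "product". The HTTP call is injected as a `post` function so the creation order can be tested without a running server.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable

# Hypothetical seed hierarchy; names and URLs are placeholders.
SEED: dict[str, Any] = {
    "name": "seed-org",
    "products": [
        {
            "name": "seed-product",
            "repositories": [
                {"type": "GIT", "url": "https://example.org/known-repo.git"},
            ],
        },
    ],
}

# POSTs a JSON body to an API path and returns the parsed JSON response.
Post = Callable[[str, dict], dict]


def get_token(keycloak_url: str, realm: str, client_id: str,
              username: str, password: str) -> str:
    """Fetch an access token via Keycloak's OpenID Connect password grant."""
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode()
    url = f"{keycloak_url}/realms/{realm}/protocol/openid-connect/token"
    # Passing data makes this a form-encoded POST request.
    with urllib.request.urlopen(urllib.request.Request(url, data=form)) as resp:
        return json.load(resp)["access_token"]


def make_post(api_url: str, token: str) -> Post:
    """Build a Post function that sends JSON with a bearer token."""
    def post(path: str, body: dict) -> dict:
        request = urllib.request.Request(
            api_url + path,
            data=json.dumps(body).encode(),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    return post


def seed(post: Post, org: dict[str, Any]) -> list[int]:
    """Create the hierarchy top-down and trigger one run per repository.

    Each level needs the ID returned when its parent was created.
    Returns the IDs of the created repositories.
    """
    # ASSUMPTION: these paths and bodies mirror the ORT Server REST API.
    org_id = post("/api/v1/organizations", {"name": org["name"]})["id"]
    repo_ids = []
    for product in org["products"]:
        product_id = post(f"/api/v1/organizations/{org_id}/products",
                          {"name": product["name"]})["id"]
        for repo in product["repositories"]:
            repo_id = post(f"/api/v1/products/{product_id}/repositories",
                           {"type": repo["type"], "url": repo["url"]})["id"]
            # ASSUMPTION: a minimal run request; a real one likely also needs
            # job configurations, e.g. rules that produce rule violations.
            post(f"/api/v1/repositories/{repo_id}/runs", {"revision": "main"})
            repo_ids.append(repo_id)
    return repo_ids
```

Against a local Docker Compose setup, this would be called as `seed(make_post(api_url, get_token(keycloak_url, realm, client_id, user, password)), SEED)`, with the URLs, realm, client ID and credentials taken from that setup (not reproduced here).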

sschuberth commented 3 weeks ago

Maybe this could later also be made a part of https://github.com/eclipse-apoapsis/ort-server/pull/1319.

mmurto commented 3 weeks ago

> Maybe this could later also be made a part of #1319.

What would be the use-case for that?

sschuberth commented 3 weeks ago

> What would be the use-case for that?

I don't really get the question. What I was thinking aloud about was adding e.g. a create-test-data sub-command to the planned CLI that does pretty much what your point 2 describes above.

mmurto commented 3 weeks ago

>> What would be the use-case for that?
>
> I don't really get the question. What I was thinking about aloud was to add e.g. a create-test-data sub-command to the planned CLI that pretty much does what your point 2. describes above.

I meant the use-case for having it in the CLI, which I guess basically asks: when will an end-user want to seed an instance? IMO the main (maybe only) users of seed data are developers, and depending on the format, seed data can be relatively large in size, so I'm not sure it makes sense to ship it to the CI runners.

sschuberth commented 3 weeks ago

> I meant the use-case for having it in the CLI

It's not really about a "use-case", but about our convenience: The CLI already has the build infrastructure set up to consume ORT Server artifacts for programmatic use. That is the same infrastructure we'd need for a tool (or multiple tools) to create / seed test data.

> will an end-user want to seed an instance.

Probably not, but I don't think it matters much to "hide" such capabilities in an end-user CLI. But maybe it does. Like I said, I was just thinking aloud.

> seed data can be relatively large in size

But wouldn't our tool just implicitly create the (large parts of) seed data by creating runs, and not really ship with the data?

mmurto commented 3 weeks ago

>> seed data can be relatively large in size
>
> But wouldn't our tool just implicitly create the (large parts of) seed data by creating runs, and not really ship with the data?

Depends on the approach, but agreed: if it's done through API calls rather than stored data like in approach 3, then the amount of data is not a lot. I'm not very familiar with Kotlin projects, but I'd guess that even if it's wrapped in the CLI, it would be easy to call, like git clone && docker compose up && ./gradlew cli seed or something like that?

sschuberth commented 3 weeks ago

> ./gradlew cli seed or something like that?

Something like that. Instead of involving Gradle, the CLI would be called like ort-server seed.

mmurto commented 3 weeks ago

>> ./gradlew cli seed or something like that?
>
> Something like that. Instead of involving Gradle, the CLI would be called like ort-server seed.

IMO it would be great for the seed command to work with whatever the currently checked-out revision is, without installing or adding anything to the PATH, so I think involving Gradle would be good here? As said, I'm not too familiar with Kotlin projects.

sschuberth commented 3 weeks ago

> so I think involving Gradle would be good here?

Yes, implementing this via Gradle tasks is also possible, and probably preferable to putting it into a stand-alone CLI.

eclipse-apoapsis / ort-server

Create a set of seed data #1318