feature(seed) : #16 Seed data using a Seeder Utils - Githubissues

api-collab / api-collab-server

0 stars 2 forks source link

feature(seed) : #16 Seed data using a Seeder Utils #21

Closed kand617 closed 6 years ago

kand617 commented 6 years ago

This PR is related to inserting seed data. Typically in projects seed data is done through using files such as data.sql. There are many advantages to this method, it's footprint on the code base is limited to just one file. However there are some limitations

Inserting files (large amounts of textual data) in one row is not pretty/feasible
Strictly couples with the relational database.
User has to create Ids, and specify metadata and other generated fields

The current PR proposes that data and all relations are captured using a JSON document (data.json). The JSON is parsed and relevant entities and relationships are created using the service classes in the code base. The seeddata will only be loaded when the profile seed is active.

The pros

Very easy to understand the seed data and modify.
Can add filenames for binary files, which can be loaded at run time
Not database dependent.
Ids can be mostly optional

Cons:

There is a bigger foot print on the code base. Mainly the package seed

This solution might need to be tweaked depending on the relationships we need to capture, but I think its fairly flexible.

sudhirtumati commented 6 years ago

Agree that proposed implementation is more beneficial compared to standard database initialization approach (using schema.sql and data.sql).

But as seed code is part of main code base, I think it should be covered by unit tests as well. At the moment if seed profile is enabled, unit tests are failing as there is more data available than the test case expects

sudhirtumati commented 6 years ago

@koushiknama as Kartik and I made changes, you have to review this PR :blush:

kand617 commented 6 years ago

@sudhirtumati should the tests work on their own accord.

Seed data should only be active when the profile seed is active. I figured Ideally for users of the product, they would delete the seed package entirely. Its primarily just there for demo purpose. If the unit tests rely on seed data wont that cause issues when the seed package is removed?

For me, unit test should be completely work in isolation. It would be good if the tests create the data, then remove the data and should not make assumptions about the state of the database. - just my two cents.

sudhirtumati commented 6 years ago

@kand617 I do not see this any different from populating in-memory database with test data using data.sql. The only difference is data.sql is part of test resources whereas seeder code is part of production code.
Having said that, if we expect users to delete seed package, it is an issue. In my view, it shouldn't be deleted. Not enabling seed profile is good enough.
In worst case, I would move seeder code to test package to make sure enough test data is populated before tests were run. The main benefit I see with this approach is that test cases are not too verbose. They just perform the action under and assert the result, which is short and concise.
One more thing that I am not comfortable with is that seeder logic, being part of production code, is not covered by CI tests.

sudhirtumati commented 6 years ago

May be separating seeder out as a small utility or script helps? What do you say @kand617 ?

koushiknama commented 6 years ago

@kand617 @sudhirtumati
My proposal would be

To have a dev profile instead of seed profile. (seed is very specific to initial data, but with dev profile, we can use it for various purposes specific to development).
To use data.sql, add the initial seed data (Except for swagger files or CLOB) in it and use the properties spring.datasource.schema & spring.datasource.data in application-dev.yml to load the seed data.
Exclude the sql files while packaging the war ( to avoid data in prod or uat)
Consider Postman scripts, curl commands , or standalone seeder module (executable jar) to insert the swagger files or CLOB data in development phase.

sudhirtumati commented 6 years ago

@koushiknama with option 4 (scripts or standalone seeder module) option 1 (dev profile) becomes redundant. Let us keep production code free of development needs or dependencies for now. dev profile option can be considered when a specific need for that arise in future.

sudhirtumati commented 6 years ago

In summary, move seeder logic out of main code base (into an external module or postman scripts) to populate data for a running server instance and unit tests can continue using seeder to keep test cases short and concise.

@kand617 @koushiknama Do we have an agreement?

kand617 commented 6 years ago

Yeap I think an external script/solution would help maintain separation of concerns.