SAP-samples / cloud-cap-samples

This project contains sample applications for SAP Cloud Application Programming Model.
https://cap.cloud.sap/
Apache License 2.0

cds.deploy keeps trying to seed data even if csv-file does not exist anymore #47

Closed wurst0815 closed 2 years ago

wurst0815 commented 4 years ago

I created a Books.csv file following the tutorial https://cap.cloud.sap/docs/get-started/in-a-nutshell. After deploying to hana, the table gets created, the data gets imported, everything works fine. Then I wanted to see what happens if I alter the schema afterwards by doing a change on the entity, so I added a column 'country' to the 'Books' entity in schema.cds, added a value for 'country' in the Books.csv and deployed again. Still worked fine.

Then I wanted to see what happens if I have existing data and make a schema change that causes an error. So I changed the 'country' property of the 'Books' entity from String(10) to String(5), with existing values longer than 5 characters, to provoke an error. The deployment failed, as expected, because the csv file contains values larger than the new column allows. So I deleted the Books.csv file and deployed again, but even though Books.csv no longer exists, the deployment task still has a reference to it:

[Screenshot: HDI deployer output still referencing the deleted Books.csv]

So I tried running cds build to clean up, but that didn't change anything. How can this be fixed?

chgeo commented 4 years ago

Short answer: you can make it work by adding the --auto-undeploy option to the cds deploy command, which is passed on to the HDI deployer's --auto-undeploy option.

Keeping previously deployed artifacts is a safety net to prevent accidental data deletion in case an HDI design-time file was deleted/moved/renamed.
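If dropping everything that disappeared is too coarse, a more targeted alternative is an undeploy.json file in the db folder listing just the artifacts to remove. A sketch (the exact path of the generated .hdbtabledata artifact is an assumption; check it against your build output):

```json
[
  "src/gen/data/my.bookshop-Books.hdbtabledata"
]
```

On the next cds deploy, the HDI deployer then undeploys only the listed files instead of everything whose design-time source is gone.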

wurst0815 commented 4 years ago

But --auto-undeploy deleted the content of my table (I guess it dropped and re-created the table). I want to initially feed the table with data from a csv. But only ONCE. Afterwards I want to be able to e.g. add properties to my entity and deploy again WITHOUT importing the csv file again, which would overwrite any data changes that may have occurred in the meantime.

In other words how can I alter the design of my tables without losing the existing table content? If I deploy the application to productive and after a week or so I have to do some modifications in the table design (e.g. add columns) afterwards, how can I make sure that when I re-deploy that the existing productive data is not lost/ overwritten?

chgeo commented 4 years ago

I want to initially feed the table with data from a csv. But only ONCE.

From what I know, the hdbtabledata deployment does not offer this. 'Only-once deployment' is also often a questionable strategy because with this you would have data records w/o a clear 'lifecycle owner'. What if in the future you want to update the data again? Shall the customer be allowed to update the data?

In other words how can I alter the design of my tables without losing the existing table content?

For compatible model changes (column added, type prolonged), data is kept untouched.
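For illustration, a sketch against the bookshop schema (field names and lengths as used earlier in this thread; not the exact sample source):

```cds
// db/schema.cds
entity Books {
  key ID  : Integer;
  title   : String(111);
  country : String(10); // newly added column -> compatible, existing data is kept
  // prolonging country to String(20) would also be compatible;
  // shortening it to String(5) with longer existing values fails the deployment
}
```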

wurst0815 commented 4 years ago

From what I know, the hdbtabledata deployment does not offer this. 'Only-once deployment' is also often a questionable strategy because with this you would have data records w/o a clear 'lifecycle owner'. What if in the future you want to update the data again? Shall the customer be allowed to update the data?

For me as an application developer, this is how I am used to developing tools: I seed my db with initial data (in the bookstore example, I add my inventory of books), and then I go live with my app. My customers place orders, and the stock amounts of my books change. Then, maybe a week later, I decide to add the ISBN no to the book entity, so I add this property in the schema.cds file.

For compatible model changes (column added, type prolonged), data is kept untouched.

Now, when I re-deploy, the ISBN no column is added, but the csv file is imported again, overwriting the data changes (e.g. the updated stock amounts) that happened in the meantime.

So, how do I add the ISBN no column (via schema.cds?) WITHOUT changing my table content or is this impossible using cds? If this approach is wrong, how can this kind of modification be achieved using the cap model?

chgeo commented 4 years ago

I see. So the point here is that you want to import data that falls into one of two cases: application content (like books and authors), which is loaded once and then evolves inside the running application, and configuration data, which stays owned by the application developer.

The csv import is best suited for the second case of config data, where every new version of the application may update the data as defined by the application developer.

The first case requires some other import channel, like a REST/OData endpoint, to be used by a content admin. In the end it's the content admin, not the application developer, who should be in charge of inserting the data. From what I know, there is no out-of-the-box solution here. Recently I hacked around with a generic endpoint that takes the same csv files and just inserts the records into the DB. Could be of interest here.
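Such a generic endpoint is not part of CAP; the core of it could look like this minimal sketch (the parseCSV helper, the semicolon separator, and the 'uploadCSV' action name are assumptions, not CAP APIs):

```javascript
// Minimal sketch: turn CSV text (as in the db/data files) into row objects.
// Assumes semicolon-separated values and no quoted/escaped fields.
function parseCSV (text) {
  const [header, ...lines] = text.trim().split(/\r?\n/)
  const cols = header.split(';')
  return lines.map(line => {
    const vals = line.split(';')
    return Object.fromEntries(cols.map((col, i) => [col, vals[i]]))
  })
}

// In a CAP custom handler for a hypothetical 'uploadCSV' action, the rows
// could then be persisted with:
//   await INSERT.into(Books).entries(parseCSV(req.data.csv))
```

The parsing is deliberately naive; a real endpoint would want a proper CSV parser and authorization checks for the content-admin persona.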

@wurst0815 do you share this understanding? @gregorwolf this is a similar discussion that we had in the openSAP forum.

wurst0815 commented 4 years ago

Okay... this is drawing a completely different picture. Now that you've explained the intention of the csv import, I can understand the approach and the idea behind it.

You just wanted to showcase in the bookshop example, that it is possible to import data using the csv-files located in a predefined folder structure.

BUT using books and authors in the sample code (which is application content, not configuration data) made me assume that this is the method to initially feed the app with content. Very misleading, so thanks for the clarification. I would highly recommend adding this to the docs.

Yes, the option to import content from csv files would be very helpful, either by having an out of the box solution or by having some sample code to achieve that.

chgeo commented 4 years ago

@danjoa FYI, discussion on how to seed non-configuration data into applications. The way we treat books and authors the same as e.g. languages is indeed misleading, because they are master data that should be imported through a different channel (a dedicated import API) and by a different persona (the content admin, not the deployer). What do you think about a generic REST/OData service that takes the same csv files and deploys them through CQL?

danjoa commented 4 years ago

What do you think about a generic REST/OData service that takes the same csv files and deploys them through CQL?

Yes, we need something like this anyway to feed in customer data. It was also requested by stakeholder projects → CSV upload endpoints with generic or custom handlers to store the data.

Regarding the feature of cds.deploy importing content from ./data: this is indeed meant to fill a newly created database with initial data (it has also always been documented like that), and we intentionally do not make assumptions about what 'initial data' actually means, as that can differ between projects and project phases.

In all cases this is data entered by the SaaS provider, of course, as you pointed out, not by SaaS customers. You are right: this likely needs mentioning in our docs.
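For reference, the ./data convention follows the file-name pattern <namespace>-<Entity>.csv, e.g. db/data/my.bookshop-Books.csv in the bookshop sample. A sketch (the exact columns depend on your schema):

```csv
ID;title;author_ID;stock
201;Wuthering Heights;101;12
207;Jane Eyre;107;11
```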

BTW: we do have a gap in the 'initial data' space: Test data should of course not end up in production.

chgeo commented 3 years ago

Also see the discussion in the community

pdominique commented 3 years ago

Yes, we need something like this anyway to feed in customer data. It was also requested by stakeholder projects → CSV upload endpoints with generic or custom handlers to store the data.

This would indeed be better than importing data through the db explorer.

chgeo commented 2 years ago

We are about to close this issue channel in favor of the SAP Community. @iwonahahn FYI

For this topic about initial data: we have added it to our backlog and will work on it soon. We will announce it in the CAP release notes.

Thanks.