aileftech / snap-admin

A plug-and-play, auto-generated CRUD database admin panel for Spring Boot apps
MIT License
260 stars 20 forks source link

Data Import #25

Open aileftech opened 1 year ago

aileftech commented 1 year ago

The feedback issues are ideas for possible features that I would like to have a feedback on before starting to implement. If you are interested in this features, please leave a reaction or a comment so I understand what features to prioritize.

Data Import

Allow users to import data from various formats like CSV, XLSX, and JSON.

a13a commented 1 year ago

It would be a very useful feature, but it would ideally use an external interface (separate project) that actually does the import/export.

This is often required in projects (setup initial data / test data) and if it would be a separate project, it could be used independently of snap-admin.

aileftech commented 1 year ago

Hi! You mean for example as a CLI tool or another web interface?

EDIT: Or just another Java library which would then be included in SnapAdmin as well I guess.

a13a commented 1 year ago

Hi! I was thinking about a java library.

I often need to import sample data into my spring boot projects, and it always is not as simple as I would like it to be.

Ideally the lib would have the interface

public interface SpringDataCsvImporter {
    void importEntitiesFromCsvFile(File csvFile);
}

And the csv files would look like this

strict-mode=true;

UPSERT Category ; code[unique=true] ; title           ; parentCategory(code)
                ; root              ;                 ;
                ; gardening         ; Gardening       ; root
                ; gardeningTools    ; Gardening Tools ; gardening

UPSERT Warehouse ; type[unique=true] ; name[unique=true]
                 ; store             ; warehouse001
                 ; depot             ; warehouse001

UPSERT Product ; code[unique=true] ; title ; category(code)
               ; product001        ; Rake  ; gardeningTools

UPSERT Price ; product(code)[unique=true] ; warehouse(type,name)[unique=true] ; amount
             ; product001                 ; store,warehouse001                ; 100

UPSERT tries to first find the entities by unique key and update them, however if not found, it creates them. Other verbs are

The first row describes the action (INSERT/UPSERT/UPDATE/DELETE), then the entity name, and then the columns If some columns have [unique=true] then they will be used in UPSERT/UPDATE/DELETE as the criteria for finding an existing entity If a column type is another entity, then in parenthesis should be specified the attributes used to search that entity.

There is a more complex detail I omitted, regarding how to populate the value of a column, if that value is an entity that has to be searched using another entity.

aileftech commented 1 year ago

I was thinking something simpler, along the lines of importing a CSV file you have already dumped from another database/source. I could anticipate a flag to set the behaviour of duplicate key exceptions, for example:

(This flag would be applied to every row in the import)

I understand your use case, and if I'm not wrong it's more like "I have to import some data that I am generating via my Java code" rather than "I have to import some data I already have in another database/CSV file". This happened to me as well in some of my projects, but in my opinion it's a niche use case compared to the general "import from CSV" so it would come later.

I also don't like that it would use a "custom format", so it would be hard to get people to adopt it because they also have to get the data in the right shape (and they need to write code to do that).

a13a commented 1 year ago

This custom CSV data format can be generated from existent data, but often is hand written. I use it for setting up the initial data/system data, and it can be used with database migrations for populating new data. I always prefer to use the ORM to setup data vs. SQL, because the ORM executes logic that is hard to replicate in SQL correctly (ex. it uses sequence for Id / it automatically increments version for updates / it can automatically set createdDate, lastModifiedDate etc) and is also database independent.

I assume the use cases for table import / export with classic CSV would be restricted to the same DB instance, since across different instances, the generated id's would be different and would not match. Is that right?

aileftech commented 1 year ago

I prefer using the ORM as well. Actually many currently available features of SnapAdmin wouldn't be possible (or extremely hard) without relying on it.

I don't think I understand the last point, though. What I'm thinking is a simple CSV import where people from e.g., non-technical teams have a easy way to "bulk create" objects. For example, a list of products in CSV file.

If a row in the CSV file contains the primary key, the previous described options could be applied (SKIP, FAIL, UPSERT). If the primary key is not provided, it will be autogenerated by the ORM if possible, and the import will fail if not. I imagine you are thinking about a more complex use case, since you're mentioning different DB instances but correct me if I'm wrong.

a13a commented 1 year ago

I mentioned different database instances because usually partial export from one database will not import successfully on a different database, because entity id's are generated differently and do not map correctly when referenced between tables. But I might miss some use cases.

For example, a list of products in CSV file.

When product references other entities, like category/price/stock, how will they be referenced?

If a row in the CSV file contains the primary key

How to know which row in the CSV is unique? By scanning the entities, or by placing a mark on the csv row?

aileftech commented 1 year ago

I understand, and yes I think we are considering two different use cases.

I am considering a very basic import feature where you will have to fill every field correctly (including the referenced entities). It is not meant for bulk dump/restore workflow, but rather for small batches where this is feasible. For example, I have to insert 50 products and they all are into the same category so I can easily fill the category column with the correct ID (I can retrieve it from SnapAdmin itself if necessary).

If you have to insert 5k products with different categories and the IDs do not match, it would not be covered by this feature.

Regarding the second question, SnapAdmin is already aware of the the primary key is for each column in the table (it does so by scanning the entities at startup). The CSV would have to contain a header row with the column names (matching the @Entity class/table, with potentially missing NULLABLE fields) and SnapAdmin will recognize which one is the primary key and go from there.

Potentially, a starting CSV template containing the header row could be made available for download by the user, in order to reduce the amount of trial and error needed to get the correct CSV format the first time.