agmen-hu / node-datapumps

Node.js ETL (Extract, Transform, Load) toolkit for easy data import, export or transfer between systems.
MIT License

MySQL to Postgres - complex transformations #48

Closed by gary-menzel 6 years ago

gary-menzel commented 6 years ago

Hello....

I am reviewing packages for an upcoming ETL process to migrate from a MySQL db to Postgres.

As with most systems, there are Accounts and Users as top-level entities, but primary key IDs will change. We would also need to be able to run the process for a single Account (or multiple Accounts), as well as potentially redo individual Users. The catch is that the schemas are going to vary significantly in some cases (which is why generic "convert this DB" scripts and libraries are not suitable).

While some information will simply be discarded, tables may need to be split or merged. All existing data will have to be ported (as cleanly as possible) with existing relationships intact, under new primary keys.

I did look through the issues (both Open and Closed) and didn't see anything specific that would lead me to using node-datapumps, but it "feels" like I should be able to accomplish this.

I am happy to write my own bespoke system (as this will be a "one off" once everyone is migrated) but, if I can leverage this library then I'd like to.

I am not looking for specific answers right now - just a: "Yes - this library can work for the above use-case" type of response and a hope that I can ask for support as required.

Regards and thanks for a response in advance.

novaki commented 6 years ago

Hello Gary! I think datapumps is suitable for your use case. We have active applications where datapumps moves data from ERP systems, MongoDB or MySQL. We usually do transformations along the way, e.g. read data from the source, insert it with a new id into the target system and maintain relations. There is no magic here and no built-in logic for schema transformation; you have to implement it manually. You also need to implement it efficiently, without reading all the data into memory, otherwise you'll run out of memory very quickly. We are really busy with our projects right now, but I'll try to help if you need a solution for a specific problem.
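
For illustration only, the "insert with a new id and maintain relations" idea might be sketched in plain JavaScript roughly as below. This does not use the datapumps API. `sourceRows`, `Target`, `migrateAccounts` and `migrateUsers` are hypothetical names, the in-memory arrays stand in for the MySQL source and Postgres target, and the column names (`account_id`, `email`) are assumptions. Rows are consumed one at a time through a generator; only the old-to-new id pairs are kept in memory.

```javascript
// Hypothetical stand-in for a streaming read from the source database.
// Yields one row at a time so the whole table is never held in memory.
function* sourceRows(rows) {
  for (const row of rows) yield row;
}

// Hypothetical stand-in for the target database, which assigns new keys.
class Target {
  constructor() {
    this.tables = { accounts: [], users: [] };
    this.nextId = { accounts: 1000, users: 5000 };
  }
  // Insert a row and return the newly assigned primary key.
  insert(table, row) {
    const id = this.nextId[table]++;
    this.tables[table].push({ id, ...row });
    return id;
  }
}

// Migrate parent rows first, recording old id -> new id.
function migrateAccounts(rows, target) {
  const idMap = new Map();
  for (const { id: oldId, name } of rows) {
    idMap.set(oldId, target.insert('accounts', { name }));
  }
  return idMap;
}

// Migrate child rows, rewriting the foreign key through the id map.
function migrateUsers(rows, target, accountIds) {
  const skipped = [];
  for (const { id: oldId, account_id, email } of rows) {
    const newAccountId = accountIds.get(account_id);
    if (newAccountId === undefined) {
      skipped.push(oldId); // parent account was not migrated in this run
      continue;
    }
    target.insert('users', { account_id: newAccountId, email });
  }
  return skipped;
}

const target = new Target();
const accountIds = migrateAccounts(
  sourceRows([{ id: 1, name: 'Acme' }, { id: 2, name: 'Globex' }]),
  target
);
const skipped = migrateUsers(
  sourceRows([
    { id: 10, account_id: 2, email: 'a@example.com' },
    { id: 11, account_id: 9, email: 'b@example.com' },
  ]),
  target,
  accountIds
);
console.log(target.tables.users, skipped);
```

In a real migration the generator would be replaced by an asynchronous streaming query and `insert` by a statement that returns the generated key. If the id map itself grows too large for memory, it could be persisted in a mapping table in the target database instead.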

gary-menzel commented 6 years ago

I'll keep that in mind. I am very much in a discovery phase (but that will end soon). I just wanted to know if I should include the library and do some tests. This helps a lot for now.