airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

populate DB instance with dummy data for benchmarks #8048

Closed: alexandr-shegeda closed this issue 2 years ago

alexandr-shegeda commented 2 years ago

Tell us about the problem you're trying to solve

We need to implement an SQL script that generates dummy data for benchmark purposes. The desired row, table, and database sizes are described in this doc:

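| Row type | Size (B) | Size (KB) |
|----------|----------|-----------|
| Regular  | 10,000   | 10        |
| Small    | 500      | 0.50      |

| Table type | Row count | Regular rows (%) | Small rows (%) | Large rows (%) | Table size (B) | Table size (GB) |
|------------|-----------|------------------|----------------|----------------|----------------|-----------------|
| Regular    | 50,000    | 99%              | 0%             | 1%             | 995,000,000    | 1               |
| Small      | 10,000    | 0%               | 100%           | 0%             | 5,000,000      | 0.01            |

| Database type     | Table count | Regular tables | Small tables | Large tables | Database size (B) | Database size (GB) |
|-------------------|-------------|----------------|--------------|--------------|-------------------|--------------------|
| Regular           | 25          | 25             | 0            | 0            | 24,875,000,000    | 25                 |
| Half-regular      | 10          | 10             | 0            | 0            | 10,737,000,000    | 10                 |
| Many small tables | 1,000       | 0              | 1,000        | 0            | 5,000,000,000     | 5                  |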
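As a sanity check on the sizes above, each table size is the row count times the percentage-weighted row size, and each database size is the table count times the table size. The row-size table gives no size for large rows, so the Python sketch below backs out the value implied by the Regular table's total (1,000,000 B). That figure is an inference from the numbers, not something stated in this issue, and the function names are illustrative only.

```python
# Row sizes in bytes, from the row-size table in this issue.
REGULAR_ROW_B = 10_000
SMALL_ROW_B = 500


def table_size_b(row_count, regular_pct, small_pct, large_pct, large_row_b):
    """Expected table size in bytes: row count times the percentage-weighted row size."""
    weighted = regular_pct * REGULAR_ROW_B + small_pct * SMALL_ROW_B + large_pct * large_row_b
    # Percentages are integers, so divide by 100 last to stay in exact integer arithmetic.
    return row_count * weighted // 100


def implied_large_row_b(row_count, regular_pct, large_pct, table_b):
    """Large-row size implied by a table total (assumes the table has no small rows)."""
    regular_b = row_count * regular_pct * REGULAR_ROW_B // 100
    large_rows = row_count * large_pct // 100
    return (table_b - regular_b) // large_rows


# Inferred, not stated in the issue: the large-row size implied by the Regular table.
LARGE_ROW_B = implied_large_row_b(50_000, 99, 1, 995_000_000)
print(LARGE_ROW_B)  # 1000000

regular_table_b = table_size_b(50_000, 99, 0, 1, LARGE_ROW_B)
small_table_b = table_size_b(10_000, 0, 100, 0, LARGE_ROW_B)
print(regular_table_b)        # 995000000
print(small_table_b)          # 5000000
print(25 * regular_table_b)   # 24875000000 (Regular database)
print(1_000 * small_table_b)  # 5000000000 (Many small tables)
```

The same arithmetic gives 10 × 995,000,000 = 9,950,000,000 B for the Half-regular database, while the table lists 10,737,000,000 B (close to 10 GiB, i.e. 10,737,418,240 B), so that row's byte count may be worth re-checking.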

Describe the solution you’d like

Additional context

- [ ] Modify the script and create databases for MSSQL
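For the MSSQL task, one possible shape for the data-generation script is a small Python helper that emits T-SQL. This is a sketch under assumptions, not the script referenced in this issue: the helper name, the `dbo` schema, the `id`/`payload` column layout, and the 1,000,000 B large-row size (inferred from the Regular table's total) are all hypothetical. It relies on standard T-SQL features (`REPLICATE`, `ROW_NUMBER`, `TOP`) and assumes the cross join of `sys.all_objects` with itself yields at least `row_count` rows.

```python
import re

# Plain SQL identifiers only, so the table name can be safely formatted into the script.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_base_table_sql(table, row_count, row_bytes, large_rows=0, large_bytes=0):
    """Return a T-SQL script that creates dbo.<table> and fills it with dummy rows.

    The first `large_rows` rows get a payload of `large_bytes` characters and the
    rest get `row_bytes`; the INT id and row overhead make real rows slightly larger.
    """
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return f"""CREATE TABLE dbo.{table} (
    id INT NOT NULL PRIMARY KEY,
    payload VARCHAR(MAX) NOT NULL
);

-- Number rows via a cross join of a catalog view (assumed to yield enough rows).
-- REPLICATE only returns more than 8,000 bytes when its input is VARCHAR(MAX).
INSERT INTO dbo.{table} (id, payload)
SELECT n,
       REPLICATE(CAST('x' AS VARCHAR(MAX)),
                 CASE WHEN n <= {large_rows} THEN {large_bytes} ELSE {row_bytes} END)
FROM (
    SELECT TOP ({row_count}) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS nums;"""


# Regular table: 50,000 rows, 1% (500) of them large; the large-row size is assumed.
sql = create_base_table_sql("regular_table_0", 50_000, 10_000, large_rows=500, large_bytes=1_000_000)
print(sql)
```

For a Small table, the same helper would be called with `row_bytes=500` and no large rows.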

To reduce computing time, we can generate data for one table and then clone that table as many times as needed.
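The cloning idea can be sketched with SQL Server's `SELECT ... INTO`, which creates a new table from a query result. A minimal sketch with hypothetical table names: `SELECT ... INTO` copies the columns and data but not the primary key or other indexes, so clones would need those re-created if the benchmark depends on them.

```python
import re

# Plain SQL identifiers only, so names can be safely formatted into the statements.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def clone_table_sql(source, prefix, copies):
    """Return one T-SQL statement per clone: dbo.<prefix>_1 ... dbo.<prefix>_<copies>."""
    for name in (source, prefix):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"SELECT * INTO dbo.{prefix}_{i} FROM dbo.{source};"
        for i in range(1, copies + 1)
    ]


# "Many small tables": generate small_table_0 once, then clone it 999 times (1,000 tables).
statements = clone_table_sql("small_table_0", "small_table", 999)
print(len(statements))  # 999
print(statements[0])    # SELECT * INTO dbo.small_table_1 FROM dbo.small_table_0;
```

For the Regular database, 24 copies of one generated regular table would give the 25 tables listed above; the copies run server-side, so the per-row payload generation happens only once.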

andriikorotkov commented 2 years ago

Database source benchmark design - https://docs.google.com/spreadsheets/d/1GFOuvT1US0Vs8wZTGfy5XhaymM7oOwEe9GOxoFTi5Hs/edit#gid=388679515