-
First of all, thanks for the great work! I am new to Spark, and this repo has really helped me get started.
I am trying to get my ETL job running on AWS EMR in cluster mode, but got hit with an …
-
- [x] Remove config file handling from pipeline code
- [x] Move jobs to separate repo
- [ ] Move end-to-end tests to jobs repo
-
This may be outside the scope of the booklet, but there is a lack of information on how to obtain data from APIs, format the resulting JSON into dataframes, and then send that data off somewhere.
Fo…
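
For what it's worth, here is a minimal sketch of the API-JSON-to-table flow described above, using only the Python standard library. Everything in it is an assumption made for illustration: the `SAMPLE_RESPONSE` payload stands in for a real API response body, the `results` key and field names are hypothetical, and the CSV buffer stands in for whatever destination the data is sent to.

```python
import csv
import io
import json

# Hypothetical payload standing in for the body of an API response.
SAMPLE_RESPONSE = """
{"results": [
  {"id": 1, "user": {"name": "Ada", "city": "London"}, "score": 9.5},
  {"id": 2, "user": {"name": "Lin", "city": "Taipei"}, "score": 7.0}
]}
"""


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts, e.g. {"user": {"name": "Ada"}} -> {"user_name": "Ada"}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def extract(raw_json):
    """Parse the raw JSON text and return the list of records (assumed under 'results')."""
    return json.loads(raw_json)["results"]


def transform(records):
    """Turn each nested record into one flat row (a dict of column -> value)."""
    return [flatten(record) for record in records]


def load(rows, out):
    """Write the rows as CSV to a file-like object; a stand-in for the real destination."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


rows = transform(extract(SAMPLE_RESPONSE))
buffer = io.StringIO()
load(rows, buffer)
csv_text = buffer.getvalue()
```

A list of flat dicts like `rows` is also a shape that dataframe constructors such as `pandas.DataFrame(rows)` accept, so the same `extract` and `transform` steps could feed a dataframe instead of a CSV writer.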
-
### Run Information
Name | Value
-- | --
Architecture | x86
OS | Windows 10.0.22621
Queue | TigerWindows
Baseline | [076ccdf108bae0d0cab36f3d6c896e9ba60076bc](https://github.com/dotnet/sdk…
-
# ETL and ELT and the differences between these concepts
***
## Extract - Transform - Load
The **ETL** process is one in which the data flow follows an order that includes the data-processing step…
-
### Run Information
Name | Value
-- | --
Architecture | x64
OS | ubuntu 22.04
Queue | TigerUbuntu
Baseline | [30b34e6d7b475c1de91623d3589723181a260523](https://github.com/dotnet/runtime/commit/30b…
-
Might speed up things more.
Since we no longer find results incrementally (the library returns all similar items at once), we don't need the heap.
-
When Rabbit-in-a-Hat reads a scan report to create the `testingFramework.R` file, it uses the first line present in the scan for each table to set the default values when creating test cases. The probl…
-
Currently the ETL process relies on already geocoded data (blood lead tests). That data was geocoded by CDPH. There are two issues:
1) for future refreshes of the data we should not rely on CDPH to…
-
https://brightliao.com/2022/06/08/efficient-etl-testing/
Previous posts about Easy SQL:
- A new ETL language – Easy SQL
- A guide to write elegant ETL
- Neat syntax design of an ETL language (part 1)
- Nea…