CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0

Make SQL Server extract from historic tables using a single query #101

Closed jamesfielder closed 4 years ago

jamesfielder commented 4 years ago

Description

When extracting from historic tables in SQL Server, we previously ran two select statements. The first pulled the rows from the main table as they stood at the current time, and the second pulled all changes from the history table. The two results were then unioned together in the Spark code.

We (at Cox Automotive) have seen issues where this historic extraction did not work correctly: rows would be in the wrong state when queried this way. We believe this is because the two queries are non-transactional: since they are executed outside of a transaction, the underlying rows might change between the first query and the query for the history. This change makes the extraction happen inside a single SQL query, which should provide the transactional isolation we need.
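
To illustrate the idea, here is a minimal sketch of building one statement that covers both tables. It is not the code in this PR: `TemporalQuerySketch`, `quote`, `singleQuery`, the `combined` alias and the table and column names used with it are all hypothetical, and the sketch assumes the main and history tables share the selected columns.

```scala
// Hypothetical helper for illustration only; not part of Waimak's API.
object TemporalQuerySketch {

  /** Quote a SQL Server identifier, escaping any closing bracket. */
  def quote(identifier: String): String =
    "[" + identifier.replace("]", "]]") + "]"

  /**
    * Build a single subquery that reads the current rows and the historic
    * rows in one statement, instead of issuing two separate queries and
    * unioning the results in Spark.
    *
    * Assumption: both tables expose the same columns.
    */
  def singleQuery(schema: String,
                  mainTable: String,
                  historyTable: String,
                  columns: Seq[String]): String = {
    val cols = columns.map(quote).mkString(", ")
    val main = s"SELECT $cols FROM ${quote(schema)}.${quote(mainTable)}"
    val history = s"SELECT $cols FROM ${quote(schema)}.${quote(historyTable)}"
    // Parenthesised and aliased so it can stand in wherever a table name is expected.
    s"($main UNION ALL $history) AS combined"
  }
}
```

A string of this shape could be passed as the `dbtable` option of Spark's JDBC reader, which accepts a parenthesised subquery with an alias in place of a table name. Whether a single statement gives the isolation we need still depends on the isolation level the database applies to it.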

Type of change

Bug fix (non-breaking change which fixes an issue)

codecov-commenter commented 4 years ago

Codecov Report

Merging #101 into develop will decrease coverage by 0.20%. The diff coverage is 86.66%.


@@             Coverage Diff             @@
##           develop     #101      +/-   ##
===========================================
- Coverage    88.41%   88.21%   -0.21%     
===========================================
  Files           74       75       +1     
  Lines         1787     1833      +46     
  Branches        75       72       -3     
===========================================
+ Hits          1580     1617      +37     
- Misses         207      216       +9     

| Impacted Files | Coverage Δ |
|---|---|
| ...ain/scala/com/coxautodata/waimak/log/Logging.scala | 52.38% <50.00%> (-1.47%) :arrow_down: |
| ...data/waimak/rdbm/ingestion/PostgresExtractor.scala | 88.23% <75.00%> (+0.73%) :arrow_up: |
| ...ata/waimak/rdbm/ingestion/ExtractionMetadata.scala | 77.27% <77.27%> (ø) |
| ...autodata/waimak/rdbm/ingestion/RDBMExtractor.scala | 100.00% <100.00%> (+2.17%) :arrow_up: |
| ...ata/waimak/rdbm/ingestion/SQLServerExtractor.scala | 100.00% <100.00%> (ø) |
| ...ak/rdbm/ingestion/SQLServerTemporalExtractor.scala | 90.47% <100.00%> (+3.80%) :arrow_up: |
| ...waimak/rdbm/ingestion/SQLServerViewExtractor.scala | 100.00% <100.00%> (ø) |
| ...ata/waimak/rdbm/ingestion/RDBMIngestionUtils.scala | 93.75% <0.00%> (-6.25%) :arrow_down: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a36a27c...5b0fef2.