Closed deep-diver closed 2 years ago
@deep-diver the low-res PR is a dependency of this PR, right? Let's please hold off until it gets resolved.
@sayakpaul
this is up for review as well!
@sayakpaul
addressed your comments! thanks for the great comments. I really like them 👍🏼
@sayakpaul
fixed the broken parts, and it is verified to be working properly.
This PR includes the following changes:
- `tfx-pipeline.ipynb`: modified to demonstrate how `Transform` can be integrated into the pipeline
- `preprocessing.py`: handles low-resolution data preprocessing
- `preprocessing_full_res.py`: handles full-resolution data preprocessing
- `model.py` and `model_full_res.py`: replaced the `parse_tfr` and `preprocess` steps from the previous version with the transformed dataset from `Transform`
- `local_pipeline.py`: the `Transform` component is added to the local pipeline
- `pipeline.py`: the `Transform` component is added to the Vertex pipeline. Also, there is a condition to run the `Transform` component in Dataflow
- `configs.py`: replaced the data source path with the path to the re-generated TFRecords for the low-resolution dataset

Note: In order to use `Transform`, we need `StatisticsGen` and `SchemaGen`, so they are added to the pipeline as well.

Note: These code changes are verified to work in both the local and Vertex environments.
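
For reviewers, here is a rough sketch of what the "condition to run `Transform` in Dataflow" could look like. It is not the code in `pipeline.py`: the helper name `build_transform_beam_args` and its parameters are made up for illustration. The only things taken as given are the standard Apache Beam pipeline options (`--runner`, `--project`, `--region`, `--temp_location`).

```python
from typing import List, Optional


def build_transform_beam_args(
    use_dataflow: bool,
    project: Optional[str] = None,
    region: Optional[str] = None,
    temp_location: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: choose Beam args for the Transform component."""
    if not use_dataflow:
        # Local run: Beam's in-process DirectRunner needs no cloud settings.
        return ["--runner=DirectRunner"]

    # Dataflow needs to know where to run and where to stage temp files.
    required = {
        "project": project,
        "region": region,
        "temp_location": temp_location,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Dataflow requires: {', '.join(missing)}")

    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]
```

In TFX, Beam-based components such as `Transform` should accept a list like this through `with_beam_pipeline_args(...)`, so the Vertex pipeline could apply it only when a Dataflow flag is set and leave the component untouched otherwise.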