Closed schmelczer closed 9 months ago
I'm coordinating with legal regarding the CLA
@schmelczer thank you so much for this great contribution, I'll take a closer look at it tomorrow!
We're still actively looking into the CLA
The CLA has been cleared, so I've signed it! Feel free to take a look at the PR @jonas-w, I'm happy to discuss the details, especially if there're any counter-arguments against the lazy dataset loading
Thanks for implementing the laziness in the decorators, something that was on our long list as well.
@schmelczer 🙏 thanks for the contribution.
Thanks for giving this a proper review, I've addressed your comments!
I've done some further testing with schema-less datasets for both lightweight & regular transforms and preview seems to be working as expected! As for the logging, there isn't a trivial way of implementing this (we could look at the JobSpec though) but the foundry telemetry service update is just around the corner which will work with both Spark and Lightweight transforms!
lgtm
@schmelczer would it be possible to remove this merge commit d111123 (#47) and instead do a rebase? This way the commit history would stay a bit cleaner after we merge it into main.
Otherwise it looks good to me!
Sure, I've dropped the merge commit @jonas-w!
Should be released by now https://pypi.org/project/foundry-dev-tools/
Thanks again @schmelczer!
Summary
Address https://github.com/emdgroup/foundry-dev-tools/issues/45
Add support for running local preview for @lightweight and @transform_polars Transforms, which were recently added to the Foundry Transforms library.
Add @lightweight, @transform_polars, LightweightTransformInput, and LightweightTransformOutput.
Make Input download the datasets and load the Spark DataFrames lazily. With this, we can avoid initialising Spark when using Lightweight transforms, which shaves a few seconds off every preview run.
A new method, get_local_path_to_dataset, is added to Input to lazily load the underlying files of the Dataset without having to initialise Spark when running Lightweight transforms.
Limitations
[…] @lightweight is not yet supported when running preview locally.
With fdt build, the build logs are fetched from the Spark reporter, which is not available for (lightweight) container transforms. We could use foundry-telemetry-service's readV3 endpoint to tail logs; however, that has a latency of at least 1 minute. Fortunately, a new, real-time log fetching endpoint is soon to be released, so we can transition to using that in a follow-up PR.
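For anyone skimming the summary, the lazy-loading idea can be illustrated with a minimal sketch. This is not the code from this PR: LazyInput, fake_download and fake_spark_load are hypothetical names made up for the example, and the real Input in foundry-dev-tools may be structured quite differently. The sketch only shows the pattern of deferring the download and the DataFrame load until they are first needed.

```python
from functools import cached_property
from pathlib import Path
from typing import Any, Callable


class LazyInput:
    """Hypothetical stand-in for an input that defers expensive work."""

    def __init__(
        self,
        download: Callable[[], Path],
        load_dataframe: Callable[[Path], Any],
    ) -> None:
        # Nothing is downloaded or loaded at construction time.
        self._download = download
        self._load_dataframe = load_dataframe

    @cached_property
    def local_path(self) -> Path:
        # Runs the download once, on first access, then caches the result.
        return self._download()

    @cached_property
    def dataframe(self) -> Any:
        # Only this property calls the expensive DataFrame loader, so code
        # that needs just the files never triggers it.
        return self._load_dataframe(self.local_path)


# Fake helpers (assumptions for the demo) that record when they are called.
calls = []


def fake_download() -> Path:
    calls.append("download")
    return Path("/tmp/dataset")


def fake_spark_load(path: Path) -> str:
    calls.append("spark")
    return f"DataFrame({path})"


lazy_input = LazyInput(fake_download, fake_spark_load)
```

In this sketch, a lightweight-style transform would only read lazy_input.local_path, so the stand-in for the Spark loader is never invoked; a regular transform would access lazy_input.dataframe and pay the cost at that point.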