-
Is there an idiomatic pattern for emitting multiple values from a pipeline stage? For example, in the `DataPipeline` shown in the docs, how could I emit multiple values from the second pipe?
If I use…
-
**Describe the bug**
I am evaluating DataHub as a data catalog and lineage tool for the Data Platform and using this article - https://aws.amazon.com/blogs/big-data/part-2-deploy-datahub-using-aws-ma…
-
Decide on a code formatting standard:
* Spacing: spaces vs. tabs
* Function / class / namespace naming
Any other suggestions for standardisation?
-
Following the development of https://github.com/WorldBank-Transport/ram-backend/issues/229 I noticed that the [default profile file](https://github.com/WorldBank-Transport/ram-backend/blob/c2b441dafb2…
-
Code-driven data pipeline to take data from Google Drive (mix of Google Sheets, Excel files, and folders containing those), and land it in BigQuery dataset `tag-dssg-2023-lbc-all-teams.data_raw` with …
-
If I have a standalone Spark cluster with HDFS/YARN configured, what changes are required to run this code?
-
I create a WebDataset below, where `ResampledShards` is defined to repeat tar files so that every GPU and every worker can load a different tar file. But I found that the dataset will not load the compl…
-
### Community Note
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the…
-
### 🚀 The feature, motivation, and pitch
Another large win for code quality would be to remove all the dynamic imports (e.g. the `@register_datapipeline`, `@register_orchestrator`... functionality). …
-
Hi,
I'm not sure I've seen a way to use this package for multi-class semantic segmentation. I am aware that using [DataPipeline](https://augmentor.readthedocs.io/en/master/code.html#Augmentor.Pipel…