MeltanoLabs / meltano-map-transform

A map transformer which implements the `Stream Maps` capability from Meltano's tap and target SDK: https://sdk.meltano.com/
Apache License 2.0

feat: Support generating fake data with future SDK version #215

Closed by ReubenFrankel 7 months ago

ReubenFrankel commented 7 months ago

Now that https://github.com/meltano/sdk/pull/2170 has been merged, it would be great to bump the singer-sdk version used here, with the faker extra, once the next version of the SDK is released, so this package can leverage the new feature.

edgarrmondragon commented 7 months ago

@ReubenFrankel would you expect this package to enable the use of faker by default, i.e. depend directly on singer-sdk[faker], or would you prefer it to have its own extra?

I think I prefer the former, since that makes this package more useful and it's presumably installed only once per project, unlike extractors.

ReubenFrankel commented 7 months ago

@edgarrmondragon Yep, I think by default is the most useful.

maxmarcon commented 4 days ago

Hi! I'm resurrecting this because I have a question (I'm relatively new to Meltano).

I wanted to use stream_maps with faker in tap-postgres. However, tap-postgres apparently doesn't support faker (I get an error saying that the "fake object is not defined").

As a workaround, I'm using meltano-map-transform instead. There, fake is defined. However, when I try to set the seed following this example from the Meltano SDK docs:

    - name: meltano-map-transformer
      config:
        stream_maps:
          data_platform_raw-tbii_daily_gmv_nmv:
            tb_channel_id: Faker.seed(0) or fake.pyint()

I get an error:

'Faker' is not defined for expression 'Faker.seed(0) or fake.pyint()'

Do you know why? Thanks

ReubenFrankel commented 4 days ago

@maxmarcon Looks like that got added recently and will be included in the next SDK release. This plugin (and any other built with the SDK) will need to upgrade to the new version before what you are trying will work.

https://github.com/meltano/sdk/commit/1333278007c8e4daf03f82049de102f041bda878
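
The error above comes down to which names the expression evaluator exposes. Here is a minimal sketch of that failure mode. The `evaluate` helper and `FakeStub` placeholder are hypothetical and built on Python's `eval` purely for illustration; this is not the SDK's evaluator.

```python
# Hypothetical stand-in for an expression evaluator with a fixed set of names.
# This is NOT the SDK's evaluator; it only illustrates the failure mode.


class FakeStub:
    """Placeholder for the `fake` object exposed to stream map expressions."""

    def pyint(self) -> int:
        return 42


def evaluate(expression: str, names: dict) -> object:
    """Evaluate `expression` with only `names` visible (no builtins)."""
    try:
        return eval(expression, {"__builtins__": {}}, names)
    except NameError as exc:
        raise NameError(f"{exc} for expression '{expression}'") from exc


# As described in this thread: `fake` is exposed, the `Faker` class is not.
old_names = {"fake": FakeStub()}
print(evaluate("fake.pyint()", old_names))

try:
    evaluate("Faker.seed(0) or fake.pyint()", old_names)
except NameError as exc:
    print(exc)
```

Any expression that refers to a name missing from the evaluator's table fails this way, no matter what is installed in the environment. That would explain why the plugin has to be rebuilt against an SDK version that registers `Faker`.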

In the meantime, if you want to set a static seed, you can do this through faker_config.seed, as in the docs example you linked, or in an expression like this:

    - name: meltano-map-transformer
      config:
        stream_maps:
          data_platform_raw-tbii_daily_gmv_nmv:
            tb_channel_id: fake.seed_instance(0) or fake.pyint()

maxmarcon commented 4 days ago

Thanks @ReubenFrankel for the information. I was indeed able to find a workaround similar to the one you suggested:

fake.random.seed(self) or fake.pyint()
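
The `seed(...) or value()` idiom works because a seeding call that returns `None` is falsy, so `or` moves on to the right-hand side. A minimal sketch using only the standard library, assuming `fake.random` behaves like a plain `random.Random` instance (`rng` and `seeded_int` are hypothetical names for illustration):

```python
import random

# Stand-in for the random source behind `fake` (an assumption for illustration).
rng = random.Random()


def seeded_int(seed) -> int:
    """Reseed from `seed`, then draw a value, in a single expression."""
    # Random.seed() returns None, so `or` falls through to the drawn value.
    return rng.seed(seed) or rng.randint(1, 9999)


# The same seed (for example, the original column value) gives the same output.
a = seeded_int("customer-17")
b = seeded_int("customer-17")
print(a == b)

# With `and`, the None returned by seed() short-circuits the whole expression.
c_and = rng.seed(0) and rng.randint(1, 9999)
print(c_and)
```

Seeding from the original value maps each input to a stable fake value across runs. A constant seed such as `0` makes every record produce the same value.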

edgarrmondragon commented 4 days ago

I'll try to cut a release of the SDK today, and should follow shortly with a version bump here.