elgentos / masquerade

Faker-driven, configuration-based, platform-agnostic, locale-compatible data faker tool
MIT License

Best way to share yaml files for extensions? #65

Open peterjaap opened 3 years ago

peterjaap commented 3 years ago

Now that we've added a try/catch block we can add YAML definitions for tables that don't necessarily have to be present in an install.

In our projects, we include YAML files for all possible extensions we use. If a project doesn't have that table, it'll just skip it now.

I've put a few of those YAML files in the Wiki, see:

What would be the best way to share these? I don't think adding them to Masquerade itself would be wise, since that'll clutter stuff and maybe even introduce unexpected behavior. A separate repo maybe? Keep placing them in the wiki? Any other ideas?

cc @tdgroot @johnorourke @Tjitse-E @erikhansen

uznog commented 3 years ago

The way I tackled this issue before issuing #61 was to code a simple app that would check the MySQL database for its list of tables and then search the base configuration directory for YAMLs that mentioned those tables. The needed YAMLs were then bundled into a separate 'final' directory, and Masquerade was run against that final configuration.

This resolved the errors that occurred when a table didn't exist, as I was only running Masquerade against tables I was sure existed. The downside: I had to split the custom config YAMLs into one YAML per table, because not every table in a group necessarily exists. For example, if email_table1 and email_table2 are both configured in one YAML (email.yaml) but only email_table1 exists in the database, Masquerade would still try to run anonymization for the non-existent email_table2.
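To make the selection step concrete, here is a minimal Python sketch of it (not uznog's actual app, which isn't shown in this thread). It assumes the YAML files have already been parsed into dictionaries nested as group, then table; the names `tables_in`, `select_configs` and `review_detail` are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical parsed configs: file name -> {group: {table: {...}}}.
# The group -> table nesting is an assumption, not taken from this thread.
ParsedConfig = Dict[str, Dict[str, dict]]


def tables_in(config: ParsedConfig) -> Set[str]:
    """Collect every table name mentioned in one parsed config file."""
    return {table for group in config.values() for table in group}


def select_configs(configs: Dict[str, ParsedConfig], existing: Set[str]) -> List[str]:
    """Return the files that mention at least one table present in the database."""
    return sorted(name for name, cfg in configs.items() if tables_in(cfg) & existing)


configs = {
    "email.yaml": {"email": {"email_table1": {}, "email_table2": {}}},
    "reviews.yaml": {"reviews": {"review_detail": {}}},
}
existing_tables = {"email_table1", "customer_entity"}

selected = select_configs(configs, existing_tables)
# Tables that a selected file still references although they are absent.
leftover = sorted(tables_in(configs["email.yaml"]) - existing_tables)
```

Here `email.yaml` is selected because email_table1 exists, yet it still references email_table2, which is exactly the case that forced the one-YAML-per-table split.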

As for how to share these YAMLs: they should be easily accessible and maintained. Storing them in a separate repo may introduce more management issues - how do you retrieve them easily for use? How does the app itself obtain them?

It would be nice if the app could provide default configuration YAMLs on demand so users can choose which ones they want to use, or even provide only those configs that suit the user's database. This would let the configs be stored in the app's repo for maintenance and be built into the application, while letting users either run them as-is or customize how the anonymization process runs.

Those are just my thoughts though, and I'm not a PHP dev myself - let me know if they make any sense, or if some implementation issues would be a problem.

johnorourke commented 3 years ago

I have some ideas! I have two criteria for this:

I suggest one of these:

Maybe there's even another option - the masquerade phar file could 'require' the vendor/autoload.php from the current folder, and scan it for classes - but believe me, scanning all possible classes causes various problems and requires composer dumpautoload --optimize, which isn't the default.

I like the first one - simple and can be used with "require-dev" to ensure unnecessary modules don't go into production environments.

peterjaap commented 3 years ago

@johnorourke

We could introduce a --strict-mode flag to throw an exception on missing configs / missing tables / missing columns. Seems easy enough.
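A rough illustration of how such a flag could behave, written in Python rather than Masquerade's PHP. The function `tables_to_process` and the exception `MissingTableError` are hypothetical names; only the skip-versus-throw behaviour comes from this thread.

```python
from typing import Iterable, List, Set


class MissingTableError(Exception):
    """Raised in strict mode when a configured table is absent."""


def tables_to_process(
    configured: Iterable[str], existing: Set[str], strict: bool = False
) -> List[str]:
    """Return the configured tables that can be anonymized."""
    runnable = []
    for table in configured:
        if table in existing:
            runnable.append(table)
        elif strict:
            # Strict mode: fail loudly instead of skipping.
            raise MissingTableError(f"table not found: {table}")
        # Non-strict: skip the missing table, as the try/catch does now.
    return runnable
```

With `strict=False`, `tables_to_process(["a", "b"], {"a"})` returns `["a"]`; with `strict=True` the same call raises `MissingTableError`. Missing columns or configs could be handled the same way.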

I'd be in favor of the composer repo as well. I'd suggest elgentos/masquerade-configs. Then we could add a console command to this repo that can be run with composer's post-install-cmd (when the config package is present) to ask which files should be copied from that repository. It could then create a .masquerade-installed file to make sure this isn't run automatically on each install (and assume it is when --no-interaction is passed).
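The marker-file logic might look something like this sketch, again in Python for illustration only. `should_prompt` and `mark_installed` are hypothetical helpers, and treating --no-interaction as "do not prompt" is just one reading of the suggestion above.

```python
from pathlib import Path

MARKER = ".masquerade-installed"  # file name proposed in the comment above


def should_prompt(project_dir: Path, interactive: bool) -> bool:
    """Decide whether the post-install step should ask which configs to copy."""
    if (project_dir / MARKER).exists():
        return False  # already handled on an earlier install
    return interactive  # assumption: never prompt under --no-interaction


def mark_installed(project_dir: Path) -> None:
    """Record that the selection step has run for this project."""
    (project_dir / MARKER).touch()
```

The hook would call `should_prompt` first, copy the chosen files, then call `mark_installed` so later installs stay quiet.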

johnorourke commented 3 years ago

That's a great idea @peterjaap - the single repo would keep them all tidy, easy to fork, allow management of PRs and issues etc, and the post-install-cmd hook would make it really simple to use.

It would need to know the 'platform' config folder the user wants them in - in masquerade core there are several config file locations - perhaps auto-detect to see if any are in use, and/or let the user choose that too?

We'd need to consider updates too - e.g. you run and install it, but later an update to one of the vendor-specific files is released in the composer module. Perhaps just warn the user during the post-update-cmd hook if they might be running out-of-date files?
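One possible way to detect that, sketched in Python under the assumption that the hook can read both the copied files and the ones shipped in the package. `stale_files` is a hypothetical helper, and comparing content hashes is my choice here, not something decided in this thread.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def stale_files(installed: Dict[str, bytes], packaged: Dict[str, bytes]) -> List[str]:
    """Names of installed files whose content differs from the packaged version."""
    return sorted(
        name
        for name, content in installed.items()
        if name in packaged and digest(content) != digest(packaged[name])
    )
```

Note that a file the user customized on purpose would also show up as different, so this can only support a warning, never an automatic overwrite.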

peterjaap commented 3 years ago

Trying to move this to Discussions but can't find the option? https://docs.github.com/en/discussions/managing-discussions-for-your-community/managing-discussions-in-your-repository#converting-issues-based-on-labels