Discuss changes to the models

tjbenton commented 6 years ago

I would like to open a discussion that would involve in changing the way that models are written.

I find that writing models using yaml is more cumbersome and confusing than it would be if we supported JS files only, instead of dealing with the overhead of compiling yaml files into js. Currently models.js is 501 lines long which makes it harder to maintain than it should be. This code is just to read each model file and convert it into a js object and loop through each key and convert functions that are strings into real js functions. I think that if we just supported JS files we could reduce this file to be 1/3 the size it is now.

Moving to a JS api would remove a lot of the magic that happens inside of the pre_*, post_*, build functions. Where we are passing in context, documents, globals, inputs, faker, chance, index, require to each of the functions. These things that are passed in all become global variables(aka magic) to the developer writing the function. The developer can't easily see what variables they have access to without reading the documentation very throughly which makes debugging hard. But even if they read the documentation thoroughly they are forced to use what ever name we gave each of the arguments and now have to remember those specific names we used like document_index.

Not using yaml would solve our problems of trying to use a library like worker-farm that uses child workers to spread out CPU intensive tasks to all the available CPUs on the machine. This means that we can run each model in a separate node instance which would improve performance tremendously. It would help with a common problem people are running into where their computers might lock up while they're trying to generate millions of records. The reason we can't use yaml along side of this is because of the way worker nodes work. To send in options into a worker node you have to send it in a string form which means we can't pass in our parsed model object full of functions in as an argument. If we used JS all we have to do is send in the path to the file and then inside of our worker node require it and execute it.

Another reason why it would be better to use JS only is that people can require a package like they normally would at the top of a file instead of the overhead of them requiring a package inside of build function that will get called thousands of times. It would also allow us to remove support for definitions how they exist today because they can now become a package/file that can be imported a the top of a file.

I would like to rethink how models are created, and how to use a chainable function library similar to joi to do this. I think this would make creating models easier and more intuitive.

Here's and example of my proposal for using JS to generate models

Example

```js import fakeit from 'fakeit' import globby from 'globby' export default fakeit // all the top level options for the model would be declared inside of the options function .options({ name: 'Countries', key: '_id', dependencies: [], // you could use `require` for dependencies, or pass in a string of the filepath inputs: { // so that this doesn't get off topic we would support an `objects` for inputs countries: '../input/countries.json', }, // several of our examples set the count dynamically based off of inputs or data from other dependencies // so to me it makes sense to allow `min`, `max`, and `count` to be a number or a function. // This will allow cleaner way to dynamically set the count. // Note I wouldn't make these functions async capable. count({ $inputs }) { // inputs would be `$inputs` to note that it's a dynamic return $inputs.countries.length }, // `before` replaces `pre_run` // `beforeEach` replaces `pre_build` // `afterEach` replaces `post_build` // `after` replaces `post_run` // Using this terminology makes more sense because it lines up more with how testing libraries are setup // which should make it easier for users to understand what it it async before(context) { // Anything that would be set on `context` would be local to this model, not other models in other files. // This way there's no conflicts with other models overwriting variables context.index = 0; // for those that would be curious you would still be able to reference the count inside of a `before` function context.$options.count = context.$inputs.countries.length; }, beforeEach(context) { // it would be nice to convert things like chance to become plugins and we could reference them with a `$` // just like the other instances of dynamic things. This is also how `vue` handles plugins. We could also do // `this.$plugins.chance`, but I don't think there's any benefit to that except you get to work on your typing skills more that you should. context.index = context.$chance.integer({ min: 0, max: context.$inputs.countries.length - 1 }) } }) // `type` would no longer be relevant because we can just use the `type` it's self, just like `joi`. // just like with joi you would be able to do `object([schema])`, or `object.keys([schema])` .object({ _id: fakeit // `.after` would would replace post_build. .after(() => `country_${fakeit.ref('gdp')}`) // .string() // would convert the result to a string. We would no longer require a `type` gdp: fakit .build(({ $faker }) => $faker.random.number({ min: 1000, max: 75000 })) .integer(), // would ensure it's an integer phones: fakeit .array() .items(fakeit.object({ type: fakeit .build(({ $faker }) => $faker.random.arrayElement([ 'Home', 'Work', 'Mobile', 'Main', 'Other' ])), phone_number: fakeit .build(({ $faker }) => $faker.phone.phoneNumber().replace(/\s*x[0-9]+$/, '')), })) // .length() would specify the specific length of the array .min(2) .max(20) }) ```

I think there's a lot of really neat stuff we can do using JS like this that would make the creation of models easy. If we decide that we still want to support yaml for some reason I would suggest making it a plugin for the app so the default behavior would be JS.

I would like to hear from several people in the field that use fakeit currently and see what their thoughts are on this to see if it's worth continuing support for yaml.

@bentonam @mistersender @tabrindle @brantburnett @alburdette619

brantburnett commented 6 years ago

@tjbenton

I can definitely see the value of this approach in complex scenarios. Juggling lots of code and variables for model generation in YAML can be a pain. But I do find the simple, declarative syntax simpler and easier to use for basic models.

Is there a way that both approaches could still be maintained? I realize that this misses some of the goal of simplification. But maybe just treat YAML support as an optional extension that simply parses the YAML, creates the options object above, and sends it into fakeit? Then it could be maintained separately. Also, since the YAML form would only be intended for simple objects maybe some of the more complex constructs could be dropped?

tjbenton commented 6 years ago

@brantburnett I agree that's what I was thinking to. If we do want to keep support for YAML I would like to see it as a separate package to simplify this code base. Supporting plugins is something that I think would be a good idea to do so that we can move some things out of this main repo and make maintenance easier and install times faster, and it would also open up the ability to add other custom things easier.

I will open up another issue to talk about making the base fakeit library lighter with the use of plugins.

alburdette619 commented 6 years ago

I agree with this entirely. I am biased because I am primarily a JS developer and therefore this looks much cleaner to me. Not having my build functions as strings and having the ability to lint the files, etc would be very useful to me and make me and the more junior developers around me that use this library much more comfortable using it.

brantburnett commented 6 years ago

Oh, another point I'd like to add (though the JS people here may scoff at me). If we are going to support JS as the primary approach, it would be really helpful to us if TypeScript typings were included. We're doing all of our JS work in Typescript here, so being able to do have support for writing the models in Typescript with strong typing and code completion would be very helpful to us, especially in terms of speed.

tjbenton commented 6 years ago

I don't think using TypeScript for the whole project is a good idea at this time.

My main issues with it are

It will add an extra level of complexity
It adds a lot of noise in the code that makes it hard to read and focus.
We don't have very many contributors to this project already and the number would be even smaller if we used TypeScript
It's a different language that's not standard JS which means a learning curve
If you change your js files to ts and run it, nothing will happen even if there's an obvious error. Basically you only get what you put in to TypeScript
It's a lot of code to convert to TypeScript, and if TypeScript goes away at any point you're left with a none standard language.

If we can add the TypeScript definitions files to the repo without doing everything in TypeScript I would say it might be worth doing.

After spending the last couple days researching TypeScript and Flow, I would lean more towards Flow since it goes with out current model of development since we already use Babel it would be straightforward to add babel-preset-flow. There's things that I don't like about both TypeScript and Flow, but less things I disliked about flow. I liked that with Flow you can add it and get benifits from it immediately without doing anything because it will infer your code.

This is the main thing that I liked about both TypeScript and Flow. Both of them treat this the same. x has to be a number, y has to be a string, z has to be a boolean and the method function will return a boolean.

function method(x: number, y: string, z: boolean): boolean {
  // ...
}

I think having stuff like this wouldn't hurt the readability and is straight forward enough for developers to understand easily.

I don't like generics I think it make stuff pretty hard to read and more confusing than beneficial.

function identity<One, Two, Three>(one: One, two: Two, three: Three) {
  // ...
}

I'm not totally sold on having a static type checker yet. But trying out the simple things in flow isn't a bad idea.

https://medium.com/javascript-scene/you-might-not-need-typescript-or-static-types-aa7cb670a77b https://medium.com/javascript-scene/the-shocking-secret-about-static-types-514d39bf30a3

brantburnett commented 6 years ago

@tjbenton

Oh, I agree we don't need to do the whole project in Typescript. Sorry, I wasn't clear. I just mean that we should include a typings file so that the project can be consumed by a TypeScript user. Without tyings, TypeScript users are basically up the creek. But by defining in a separate typings file what the structure of your types are like it allows plain Javascript libraries to be used.

Brant

tjbenton commented 6 years ago

I would be okay with doing that since it wouldn't require switching to TypeScript, just adding support for the typings.

mistersender commented 6 years ago

yeah, i would prefer not-yaml as well, it is pretty cumbersome, i have some awkward workarounds for it in most of my code already

bentonam / fakeit

Discuss changes to the models #163