bchavez / Bogus

:card_index: A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
Other
8.81k stars 499 forks source link

FinishWith for the whole generate set #55

Open levipage opened 7 years ago

levipage commented 7 years ago

I am happy to post this somewhere else if you have a uservoice or something. It would be nice if there was a way to do finishwith on a whole set. For instance, I want to generate a collection of objects and always have the first one set to primary.

Currently I have to do the following:

 public static Faker<x> GetBusinessEntityRules()
    {
      var rules = new Faker<x>()
          .RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5))
          .RuleFor(o => o.Dba, f => BogusDataSets.Company.CompanyName())        

      return rules;
    }

     // Get x
      var z = BogusRules.GetBusinessEntityRules().Generate(5).ToList();

      // Set the first to primary
      z[0].Primary = true;

Instead it would nice to make the rule responsible for this:

 public static Faker<x> GetBusinessEntityRules()
    {
      var rules = new Faker<x>()
          .RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5))
          .RuleFor(o => o.Dba, f => BogusDataSets.Company.CompanyName())        
         .FinishCollectionWith(list) {
.... list[0].Primary = true... etc.
}

      return rules;
    }
bchavez commented 7 years ago

Hi @levipage

Thanks for the suggestion. Your suggestion makes sense, but I'll need some time to think this one through. I'm slightly concerned with the API code smell/inconsistency and semantics.

The API code smell relates to Faker<T> and all API calls on Faker<> having a contextual relation to a single instance of <T> itself (ie: RuleFor<T>, FinishWith<T>, etc). Your proposed FinishCollectionWith would operate on IEnumerable<T> and apply contextually when Generate(n) is called; however, it's not semantically clear if FinishCollectionWith would apply called when Generate() is called (for single instance <T>). So, now we have rules that would and would not apply to Faker<T> depending on the context in which it was called upon to generate.

Need to think about this one a bit more. If you have any other ideas on better API sematnics, let me know.

:dash: :walking: "Bubbles of gas in my brain... Send me off balance, it's not enough"

levipage commented 7 years ago

I see your dilemma. I spent awhile thinking about this. Ultimately I think FinishWithWhenIEnumerable would make the most sense. It is convenient that the API can return either a single instance directly or a list, but it does in fact deal with both a single object and IEnumerable, so I don't think it's a stretch that you are able to apply a rule to both cases via Generate() and Generate(N). The other thought was that you have to create a whole separate collection rule and pass that as a second parameter via Generate(N, X). But, this still leaves the caller responsible for it every time and feels a bit convoluted for the sake of (possible, but debatable) purity. It does however allow for the use of multiple collection rule sets. This sounds great by itself, but I would simply create multiple rules to work around this (almost definitely edge case) and circumvent the above issue. That lead me back to the original thought which I think is just a matter of finding a method name that is concise.

I actually like coupling the collection behavior to the rule because you in a way it's already coupled. I am already using this single rule to generate a list so it would only make sense that for it to be a complete rule I need to be able to determine the behavior of how that list gets generated.

Another alternative would be to apply the collection finish rule to both the list and the single item. It would be like applying the rule to a single item list (you already alluded to this in your response). I don't really like this idea though, because it feels ambiguous and if I was using a a properly named method that denoted a collection specific rule, I would expect it doesn't run. I think as a dev if I am trying to enforce a group rule on both single items and multiple items, I would simply call the overload using Generate(1)... when I know I only want one item, to allow the collection rule to be run, and then Generate(X), etc., when I have X multiple items. This would tie back perfectly with a method name like FinishWithWhenIEnumerable.

bchavez commented 7 years ago

I absolutely agree, having to specify the IEnumerable rule with Generate(N, X) would be too clunky. The thought passed through my mind too but wasn't feeling it.

My thinking on T : Generate() and IEnumerable<T> : Generate(N) is that both, the invocation and return value types are naturally balanced (at least in my mind). I think this balance pivots conceptually on a natural boundary we are familiar with, call and return. There is some value, I think, in knowing (or expecting) that all Faker<T> rules, as defined, light up regardless of whichever Generate overload being called. FinishWithWhenIEnumerable sort of breaks way here which depends on the context of the Generate API usage.

However, I do see your motivation for including a "first item rule" in Faker<T>. Keeping rules in one place makes a lot of sense.

What if we did something like this:

var rules = new Faker<Issue55Object>()
    .RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5))
    .RuleFor(o => o.Dba, f => f.Company.CompanyName())
    .FinishWith((f, t) =>
        {
            // f.IndexBogus - Total generated by all Faker<T>, monotonically increasing
            // f.IndexFaker - Total generated by a Faker<T>, monotonically increasing

            // Finally, below, the contextual index when Generate(N) is being called.
            // Range: 0 to N-1
            if( f.IndexGenerate == 0 ) 
                t.Primary = true;
        });

var fakes = rules.Generate(5);

Would this work for you? I have a local branch where I've been working on some refactorings to allow better index/count usages ever since issue #48 came up. Perhaps this might be a viable solution and best of both worlds. IndexGenerate is a contextual counter that advances on each incremental Generate and resets back to zero when a new Generate is called.

Let me know, Brian

:thought_balloon: :mag: "I want to know... what you're thinking..."

levipage commented 7 years ago

This would work for my one use case. But, I think it would be very limiting otherwise. You may want to post sort, etc., etc. I don't think my suggestion breaks the pattern at all. It's part of the rule... When you return more than one item (or a list)... then do this... You just need a way to define that. You don't have a concept of a list rule so I don't see any other place to put it other than in some other FinishWith. I think it's implied than when you aren't returning a list, the list rule wouldn't run. Not sure the best solution beyond what I proposed.

Maybe FinishAll()... expose a list here. Let me do what I want with the items. Maybe there will be only one (as when using regular Generate()). Maybe there will be more than one. If I only return one item via Generate()... apply the rule anyway and just return the one item after applying the rule as if it were a list. Hopefully that makes sense.

bchavez commented 7 years ago

Hi @levipage,

Thinking about this a bit more. Would you consider this a viable solution?

public class DerivedFaker : Faker<Issue55Object>
{
    public DerivedFaker()
    {
        this.RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5));
        this.RuleFor(o => o.Dba, f => f.Company.CompanyName());
    }
    public override IEnumerable<Issue55Object> Generate(int count, string ruleSets = null)
    {
        var list = base.Generate(count, ruleSets)
            .OrderBy(o => o.Code)
            .ToList();
        list[0].Primary = true;
        return list;
    }
}
var derivedFaker = new DerivedFaker();
var fakes = derivedFaker.Generate(5);

This is where my gut feeling is on this. I really want to avoid adding an extra API call just for post generation contexts.

Please let me know, Brian

SimonCropp commented 6 years ago

@levipage still need this one?

bchavez commented 5 years ago

The issue came up again on Twitter.

https://twitter.com/hackmum/status/1080206253073354752

Still need to mentally chew on this problem a little bit more.

bchavez commented 5 years ago

With the introduction of C# 8 features; specifically, Indicies and Ranges, there might finally be an elegant way to "set the first element" after generating a collection with some API fluency.

Off the top of my mind, something like this would be cool:

var rules = new Faker<Issue55Object>()
    .RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5))
    .RuleFor(o => o.Dba, f => f.Company.CompanyName())
    .Set(^1, (f, t) =>
        {
            // do something with the last item of the generation
        })
    .Set(0..2, (f, t[]) =>
        {
            // do something with the first 2 items of the generation
        });

var fakes = rules.Generate(5);

Kind of obvious there are problems with the code above and probably won't work because specifying Indexes and Ranges are more related to the generation context than Faker<T>'s property rules. IE: What do you do when an Index or Range is invalid when the .Generate(n) doesn't apply. Piecemealing Index and Range rules depending on n would not be a good thing.

So maybe:

var fakerObject = new Faker<Issue55Object>()
    .RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5))
    .RuleFor(o => o.Dba, f => f.Company.CompanyName());

var objects = fakerObject
     .Set(^1, (f, t) =>
        {
            // do something with the last item of the generation
        })
    .Set(0..2, (f, t[]) =>
        {
            // do something with the first 2 items of the generation
        })
    .Generate(5);

This way, you keep Faker<T> domain rules and "collection rule operations" separate. With this .Set()pattern, it's like setting up "rules for collections" before .Generate(n) is called. The implementation detail would be that once .Set is called, an underlying object Faker<T> object is forked/diverges into an intermediate FakerCollection<T> object that will keep track of .Set Index and Range rules. Additionally, FakerCollection<T> would keep track of the deterministic FakerHub object so you still get the goodness of determinism related to Faker<T>.

Need to explore more if it is better to call .Set before or after .Generate(n). Maybe you want to set before generate or generate before setting. I need to play with the semantics and see which seems more natural and ergonomic API wise. Also, doing .Set after .Generate(n) would make it harder to capture FakerHub unless you have some kind of surrogate method like .ThenGenerate(n). Also, maybe pick a different name for .Set like .ThenSet(index/range, f->...). Then kinda has a nice ring that naturally flows after something has happened.

This way of thinking about the problem kinda makes sense. Need to think about it a bit more. Mostly these are mental notes of the idea I had when reading C# 8 and wanted to write this random idea down before I forgot.

Another idea is, .GenerateLazy returns IEnumerable<T>; which can be implemented to capture the FakerHub context.

Alternatives:

var fakerObject = new Faker<Issue55Object>()
    .RuleFor(o => o.Code, f => f.Random.AlphaNumeric(5))
    .RuleFor(o => o.Dba, f => f.Company.CompanyName());

var objects = fakerObject.Generate(5)
    .Modify(^1, (f, t) =>
        {
            // do something with the last item of the generation
        })
    .Modify(0..2, (f, t[]) =>
        {
            // do something with the first 2 items of the generation
        });

:dash: :walking: "Bubbles of gas in my brain... Send me off balance, it's not enough"