DiUS / java-faker

Brings the popular ruby faker gem to Java
http://dius.github.io/java-faker
Other
4.69k stars 842 forks source link

Faker should allow loading external yml files #489

Open daniel-frak opened 4 years ago

daniel-frak commented 4 years ago

Is your feature request related to a problem? Please describe. I'm trying to extend Faker to generate some domain-specific data using yml files. The only solution to this is proposed in #255. Unfortunately, that solution is only a hack and I consider it bad code. I should not have to hijack locale to inject my own definitions and I should be able to load more than one yml file.

The main problem seems to lie in FakeValuesService which, for English, completely skips loading external files (relying on a predefined, non-changeable list) and for other languages only loads a single file via FakeValues.

Describe the solution you'd like Faker should always dynamically find all yml files or, at the very least, there should be an option for passing (multiple) additional files through the constructor.

Describe alternatives you've considered The only alternative seems to be the lackluster solution proposed in #255.

Additional context

Not all data is going to be useful to the general public and not all data can be shared with the general public due to company policies. While the library is immensely useful, a single yml file (and broken locale) is not enough to ensure clean and maintainable code in projects using it.

schallerala commented 4 years ago

I partly agree with you. That the control around additional files is a bit lacunar, as it is hard-coded, and also only for the root English Locale.

However, you aren't constrained to use the Faker class as is. You could extend it, and extend its provider to resolve a new key that you would declare in your Yaml file.

Faker extension:

public class Faker extends com.github.javafaker.Faker {
    /** Extended {@link com.github.javafaker.Name} class. */
    private final Name name;
    private final ProviderX providerX;

    // Initialize those in the constructor(s) 
    // where you call the super class constructor

    @Override
    public Name name() {
        return this.name;
    }
    // ...
}

Name provider extension:

public class Name extends com.github.javafaker.Name {

    private final Faker faker;
    protected Name(Faker faker) {
        super(faker);
        this.faker = faker;
    }

    @Override
    public String firstName() {
        return this.faker.bool().bool() ? super.firstName() :
                this.faker.bool().bool() ? this.accentuatedFirstName() :
                        this.accentuatedMultipleFirstName();
    }

    public String accentuatedFirstName () {
        // custom key
        return this.faker.resolve("name.accentuated_first_name");
    }
    // ...
}

My local Yaml file:

fr-CH:
    # if a field isn't found in this file, it falls back on the fr.yml, and 
    # then english (en.yml and other direct files)
    faker:
        name:
            accentuated_first_name: [Fédéric, Mélissa, Sébastien, Désiré, Hervé, Raphël, Noël, Jérôme, Joël, 'Jean-Louis', 'Jean-Pierre', François, Léa, Cédric, Céline]

        provider_x:
            a: [1, 2, 3, 4, 5, 6]
            # you get it
schallerala commented 4 years ago

To get back on the management of additional files like in EnFile, how would you, OP, @daniel-frak, implement a different approach?

Annotation on provider class? An interface to extend for each provider?

Then when loading the values, look for those with reflection?

daniel-frak commented 4 years ago

Regarding the proposed code, as far as I remember this will not work if the locale is set to default and so does not solve the problem. I know that I can extend Faker but I cannot make it resolve a definition which it did not load.

My main issues are that the locale needs to be "hijacked" in order for the custom yml to be resolveable (as per #255) and that it seems impossible to spread custom definitions over multiple files.

Additionally, the way it's done now violates the Principle of Least Astonishment - it's surprising that the default English locale is loaded entirely differently to the other locales and that there exists a single locale which doesn't load custom definitions.

Right now I think that the best way would be to define a directory in src/java/resources, for example src/java/resources/faker, where custom yml files would be stored. FakeValuesService would then load all yml files within both the the application's directory and its own (with user files having priority). For added customizability, the location of the user definitions could be overriden via code.

This would somewhat mimic the current structure of the English locale, enabling the user to neatly divide all custom definitions into separate files (a feature which, in my opinion, is crucial). An example file structure for custom user definitions for the English locale would be:

If the user wanted separation of locales, this setup would also allow the samurai_jack.yml, game.yml and medical_term.yml to be redefined for every locale in src/java/resources/faker/${LANG_CODE}. Otherwise, all supported locales could be defined in a single file.

Going further, I would make the yml file resolver an interface, so that users could potentially write their own (e.g. to load their definitions from the cloud or something).

The main thing is to make extendability a first-class citizen in Faker in a way that's maintainable and easy to understand for the user. Furthermore, it should not require the user to write any more Java code. While I agree that the best way is to write faker.name().accentuatedFirstName(), I technically can just use this.faker.resolve("name.accentuated_first_name"). Therefore, the user should not be forced to write any additional Java code to get his custom definitions to work (though more advanced configuration could be done via code, ofc).

Of course, all of this (and best practices) should be documented so that the extendability is obvious to everyone.

What do you think?

daniel-frak commented 4 years ago

To somewhat simplify the above (and hopefully make it a bit clearer), I'm imagining a DefinitionResolver interface implemented by a DefinitionFileResolver class, which would load all .yml files in a given folder (say, src/resources/faker by default) and resolve locales only via the root key in the file (e.g. en: in the default EnFile files).

This would enable a user to have a single folder for all definitions but would give enough flexibility for them to structure the files within it however they want and place the definitions into however many files they need to. Additionally, this would allow more advanced users to write their own definition resolvers to extend Faker's usefulness even further.

schallerala commented 4 years ago

Would you then declare DefinitionFileResolver as a singleton?

What about its methods? Add a loadValues like in FakeValues which you pass the class of the provider so you know where to look for the Yaml file?

Don’t believe you/we could change the location of the Yaml files, as there are so many ongoing PR or it would also break so many users’ source code if we move them. Therefore, if you are motivated to reorganize them, we would still need some fallback on the current logic.

Even then, how would you allow the user to override the loading strategy? So users could load across the network or what I know. As there isn’t any dependency of an injection library.

Not a singleton but a property of the Faker class? That you can pass as a parameter in its constructor.

Or an annotation on the provider classes? But then, if you want to change the loading strategy you have to override all the classes?

daniel-frak commented 4 years ago

Would you then declare DefinitionFileResolver as a singleton?

I would declare DefinitionFileResolver as a POJO injected via Faker's constructor. Additionally, the no-args constructor would new it (as it is done now with other dependencies) for ease of use and backwards compatibility.

What about its methods?

The interface would provide a complete collection of fake values, which I believe to be FakeValuesInterface implementations (though I'm not sure, it's been a while since I dug so deep through Faker's code). The idea is that Faker's core should not care about where the definitions come from (whether from a file, from memory or from some REST endpoint) and this interface is the boundary to that knowledge. A simple method like List<FakeValuesInterface> getDefinitions() should suffice.

Don’t believe you/we could change the location of the Yaml files, as there are so many ongoing PR or it would also break so many users’ source code if we move them. Therefore, if you are motivated to reorganize them, we would still need some fallback on the current logic.

I think backwards compatibility isn't an issue, as the file search would be dynamic. The easiest solution is to make Faker look for yaml files in the entire resources directory. Another one is to make the currently used directory (resources/language, I believe?) default but, as mentioned before, still allow for changing the directory via Faker's constructor parameter (or builder). Both of these solutions require no further fallbacks.

Even then, how would you allow the user to override the loading strategy? So users could load across the network or what I know. As there isn’t any dependency of an injection library.

Simple:

var faker = new Faker(); // Default faker
var faker = new Faker(new MyOwnLoadingStrategy(), "resources/myCustomPath"); // Custom faker

//Alternatively:
var faker = new FakerBuilder().build(); // Default faker
var faker = new FakerBuilder()
  .path("resources/myCustomPath")
  .fileResolver(new MyOwnLoadingStrategy())
  .build(); // Custom faker

But then, if you want to change the loading strategy you have to override all the classes?

I'm not sure where you got this idea of overriding classes. That's exactly why I introduced the DefinitionResolver interface. As a user, you can either use the provided DefinitionFileResolver or roll your own, independent of it.

schallerala commented 4 years ago

I would declare DefinitionFileResolver as a POJO injected via Faker's constructor. Additionally, the no-args constructor would new it (as it is done now with other dependencies) for ease of use and backwards compatibility.

var faker = new Faker(new MyOwnLoadingStrategy(), "resources/myCustomPath"); // Custom faker

I wouldn't recommend separating the path from the strategy construction, as the path would be specific to that strategy. Like you said, other loading strategy, getting values over a REST API for example, would need different configuration parameters as the one looking for files.

But then, if you want to change the loading strategy you have to override all the classes?

I'm not sure where you got this idea of overriding classes. That's exactly why I introduced the DefinitionResolver interface. As a user, you can either use the provided DefinitionFileResolver or roll your own, independent of it.

The question was about the following problem: Your suggestion wouldn’t be flexible enough to have different loading strategies between providers.

For example, you are happy about the lib’s provider as is, but you want to add or change only one resource, let’s say the Address that you want to load from a database.

You would need to extend the loading strategy passed in the Faker constructor and then greatly complexify the code, to do the same kind of distinction as the English locale and other locales (what brought us here with this issue).

Just a quick idea of mine would be, having a root abstract class that all the resources/providers extend like:

@LoadingStrategy(strategy=FakerFileLoader.class, parameters={"resources/faker/en"})
public abstract class Provider {
    protected final Faker faker;

    protected Provider (Faker faker) {
        this.faker = faker;
    }
}

// ... 

public class Animal extends Provider {
    protected Animal (Faker faker) {
        super(faker);
    }
}

// own code

@LoadingStrategy(strategy=MyOwnLoadingStrategy.class)
public class Address extends Provider {
    protected Address (Faker faker) {
        super(faker);
    }
}
randysecrist commented 3 years ago

I just tried adding a customer file today; ran into the same issues. I would really like to see this done soon. Do you all need some help?

daniel-frak commented 3 years ago

I needed a solution at the time and unfortunately couldn't spend more time on discussing a Faker-specific solution, so I created my own dummy data generator, which eventually evolved into Dummy4j: https://github.com/daniel-frak/dummy4j

Sorry if this seems a bit selfish, but I didn't have much choice at the time and I do believe that Dummy4j's focus on extensibility makes it differ from Faker in a major way. @randysecrist I think it might solve your problems but bear in mind that it is not a fork of Java Faker.

As for this issue, I'm satisfied with the solution I achieved in my project so, unfortunately, I stopped actively participating in the discussion.

randysecrist commented 3 years ago

@daniel-frak Thanks for sharing. Faker isn't a requirement for me so ... I will certainly check dummy4j out. I find myself writing a lot of things that look like this:

listOf(1, 2, 3, 4, 5).shuffled().take(1)[0]

And would like to categorize and classify things.

jvmlet commented 3 years ago

Ugly hack via reflection till the issue is resolved (to be inserted in the main class):

static{
        final Field enFilesField = ReflectionUtils.findField(EnFile.class, "FILES");
        ReflectionUtils.makeAccessible(enFilesField);
        final List<String> files = new ArrayList<>((List<String>) ReflectionUtils.getField(enFilesField, null));
        files.add("your_file.yml");
        ReflectionUtils.setField(enFilesField,null,files);
}

The ReflectionUtils class is from org.springframework.util package

bodiam commented 1 year ago

At the beginning of this year we forked this library and we now maintain an active fork at Datafaker.net

One of the new features provided is the ability to write your own Faker, using either hardcoded values or yml files.

You can find the documentation here : https://www.datafaker.net/documentation/custom-providers/

Hope this can help someone out.