datafaker-net / datafaker

Generating fake data for the JVM (Java, Kotlin, Groovy) has never been easier!
https://www.datafaker.net
Apache License 2.0

File Content Provider #995

Closed: aadrian closed this issue 11 months ago

aadrian commented 11 months ago

Please also add file content generation, e.g. for the File provider, so that not just a file name is generated but a real file whose content matches its MIME type/extension (at least for the most common file types: PDFs, images, Office files, etc.).

Thank you.

bodiam commented 11 months ago

I'm not sure if generating files is a use case for Datafaker. We generate data, not files, images, sound, etc.

We do provide functionality to easily generate your own, and Datafaker provides most if not all the functionality (file names, content types, text) you might need.

Please have a look at the documentation on creating custom providers: https://www.datafaker.net/documentation/custom-providers/
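To make the custom-provider suggestion concrete, here is a minimal, dependency-free sketch of what a file-content provider could look like. `FileContentProvider`, `FakeFile` and their methods are hypothetical names, not part of Datafaker; a real version would be wired in as a custom provider following the documentation linked above, and would probably draw its words from Datafaker's existing text providers instead of the hard-coded placeholder list used here. It deliberately covers only plain-text formats (TXT and CSV), whose content can be produced without extra libraries.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: FileContentProvider and FakeFile are illustrative names,
// not part of the Datafaker API.
public class FileContentSketch {

    // A generated file: name, MIME type, and content matching the extension.
    record FakeFile(String name, String mimeType, byte[] content) {}

    static final class FileContentProvider {
        // Placeholder word list; a real provider could use Datafaker's text providers.
        private static final List<String> WORDS =
                List.of("alpha", "bravo", "charlie", "delta", "echo", "foxtrot");
        private final Random random;

        FileContentProvider(long seed) {
            this.random = new Random(seed); // seeded so runs are reproducible
        }

        private String word() {
            return WORDS.get(random.nextInt(WORDS.size()));
        }

        // Plain text: one line of random words.
        FakeFile txt() {
            String body = word() + " " + word() + " " + word() + "\n";
            return new FakeFile(word() + ".txt", "text/plain",
                    body.getBytes(StandardCharsets.UTF_8));
        }

        // CSV: a header plus rowCount data rows with CRLF line endings.
        // The words contain no commas or quotes, so no CSV quoting is needed here.
        FakeFile csv(int rowCount) {
            StringBuilder sb = new StringBuilder("id,name\r\n");
            for (int i = 1; i <= rowCount; i++) {
                sb.append(i).append(',').append(word()).append("\r\n");
            }
            return new FakeFile(word() + ".csv", "text/csv",
                    sb.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        FileContentProvider provider = new FileContentProvider(42L);
        FakeFile file = provider.csv(3);
        System.out.println(file.name() + " (" + file.mimeType() + ")");
        System.out.print(new String(file.content(), StandardCharsets.UTF_8));
    }
}
```

The seeded `Random` keeps generated files reproducible across test runs. Binary formats such as PDF or images would need a dedicated library, which is where the bloat concern comes in.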

kingthorin commented 11 months ago

Yeah, that could get out of hand and bloat the library quickly.

aadrian commented 11 months ago

@bodiam

I'm not sure if generating files is a use case for Datafaker. We generate data, not files

Well, file content is also just data (data that needs to comply with certain rules, e.g. respect the file format, in order to be useful).

Datafaker provides most if not all the functionality (file names, content types, text) you might need

Exactly, except for correct content that respects the generated file extension/contentType.

snuyanzin commented 11 months ago

@aadrian There is such a thing as transformation schemas and transformers (https://www.datafaker.net/documentation/schemas/). Currently CSV, JSON, YAML, SQL and other formats are available. Furthermore, there are extension points, so it's possible to add similar transformers, e.g. for PDF or whatever you need. So there is no need to create separate providers; instead, all existing providers could be reused to generate, e.g., PDF. Feel free to contribute here and submit a PR.
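The schema/transformer split can be illustrated without Datafaker itself. In this sketch, which is an assumption and does not reproduce Datafaker's actual schema or transformer classes, a schema is an ordered list of named value suppliers (in practice these would be provider calls), and each output format implements one hypothetical `Transformer` interface. A Markdown-table transformer stands in for a new format such as PDF: adding it requires no change to the value generators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the schema/transformer idea; Field and Transformer are
// illustrative names and do not reproduce Datafaker's own classes.
public class TransformerSketch {

    // A schema is an ordered list of named value generators (e.g. provider calls).
    record Field(String name, Supplier<String> value) {}

    // Extension point: one implementation per output format.
    interface Transformer {
        String generate(List<Field> schema, int rowCount);
    }

    // Evaluate every field once per row, in schema order.
    static List<List<String>> evaluate(List<Field> schema, int rowCount) {
        List<List<String>> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            List<String> row = new ArrayList<>();
            for (Field f : schema) {
                row.add(f.value().get());
            }
            rows.add(row);
        }
        return rows;
    }

    // A new output format, added without touching any of the value generators.
    static final class MarkdownTableTransformer implements Transformer {
        // Escape pipes so that values cannot break the table layout.
        private static String cell(String v) {
            return " " + v.replace("|", "\\|") + " |";
        }

        @Override
        public String generate(List<Field> schema, int rowCount) {
            StringBuilder sb = new StringBuilder("|");
            for (Field f : schema) sb.append(cell(f.name()));
            sb.append("\n|");
            for (int i = 0; i < schema.size(); i++) sb.append(" --- |");
            sb.append('\n');
            for (List<String> row : evaluate(schema, rowCount)) {
                sb.append('|');
                for (String v : row) sb.append(cell(v));
                sb.append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        List<Field> schema = List.of(
                new Field("id", () -> String.valueOf(++counter[0])),
                new Field("city", () -> "Springfield"));
        System.out.print(new MarkdownTableTransformer().generate(schema, 2));
    }
}
```

Running `main` prints a header row (`id`, `city`), a separator row, and two data rows with ids 1 and 2. A PDF transformer would implement the same interface, just with a PDF library behind it.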

bodiam commented 11 months ago

@aadrian Happy if you want to send a PR, but I don't think this will be added by us right now. I'd still have a look at creating a custom provider or at schemas, and see if there's anything you can use.

aadrian commented 11 months ago

there is such a thing as transformation schemas and transformers ...

@snuyanzin Sorry, but this is not about transformations; it's just about making the "file" provider complete with a few common kinds of content too, since much of the test data generated with Datafaker also involves some file content, not just the file's name (and attributes).

I'd still have a look at creating a custom provider or at schemas ..... but I don't think this will be added by us right now.

@bodiam Custom providers just look too complicated. The main advantage of Datafaker is convenience: quickly getting very realistic test data.

Thank you all for your support!

P.S. As an alternative, Pandoc (https://pandoc.org/diagram.svgz?v=20230831075849) can convert some random text into many output formats.

bodiam commented 11 months ago

@aadrian have you tried creating a custom provider? Which part was complex? In my experience, it will take around 10 minutes, and while I'm biased, I can say that compared to schemas, custom providers are quite trivial. I would not say that about schemas.

Please give it a try, happy to help out if you get stuck!

bodiam commented 11 months ago

P.S.: Pandoc doesn't convert random text; it converts structured text from one format to another. Unfortunately, and arguably understandably, it does quite a poor job at it, at least for some formats. But it's not a text generator of any kind; it's a converter of document formats.

snuyanzin commented 11 months ago

but this is not about transformations; it's just about making the "file" provider complete with a few common kinds of content too, since much of the test data generated with Datafaker also involves some file content, not just the file's name (and attributes).

@aadrian Have you read the description and looked at the docs?

This is what transformation does in Datafaker. Datafaker generates data with providers, and transformers can then transform it into any valid format. Users don't need to care about the rules of the format, just about the content, and the transformer will handle the formatting accordingly.
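As a small illustration of the point that the transformer, not the user, deals with the format rules, the sketch below (hypothetical, pure JDK, not Datafaker's own CSV transformer) applies RFC 4180 quoting: values containing commas, quotes or line breaks are wrapped in quotes and embedded quotes are doubled, so the code producing the values never has to know about CSV.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch (not Datafaker's CSV transformer): the transformer, not the
// data generator, applies the CSV quoting rules of RFC 4180.
public class CsvRulesSketch {

    // Quote a field if it contains a comma, quote, CR or LF; double embedded quotes.
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Render rows as CSV records terminated by CRLF.
    static String toCsv(List<List<String>> rows) {
        return rows.stream()
                .map(row -> row.stream()
                        .map(CsvRulesSketch::escape)
                        .collect(Collectors.joining(",")))
                .collect(Collectors.joining("\r\n", "", "\r\n"));
    }

    public static void main(String[] args) {
        // Generated values may contain commas or quotes; the caller need not care.
        List<List<String>> rows = List.of(
                List.of("name", "quote"),
                List.of("Doe, Jane", "She said \"hi\""));
        System.out.print(toCsv(rows));
    }
}
```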

bodiam commented 11 months ago

@snuyanzin the way I understand it, it's to do something like "faker.text.markdown()", which will return a File pointing to a file named "todo-list.md" with actual Markdown content.

If my understanding is correct, a schema would probably be less appropriate, and a custom provider would be quite trivial to make.
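A minimal sketch of the behaviour described above, assuming a hypothetical `markdown(Path dir, Random random, int items)` method (no such method exists in Datafaker): it writes a GitHub-flavored Markdown task list to `todo-list.md` in the given directory and returns the file's `Path`, from which `toFile()` gives a `java.io.File` if one is needed. The task names are a hard-coded placeholder list; in a custom provider they could come from Datafaker's existing providers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a "markdown()" method that returns a real file with
// Markdown content. Not an existing Datafaker method; names are assumptions.
public class MarkdownFileSketch {

    // Placeholder task names.
    private static final List<String> TASKS =
            List.of("Buy milk", "Write tests", "Review PR", "Call Alice", "Update docs");

    // Write a Markdown to-do list to <dir>/todo-list.md and return its path.
    static Path markdown(Path dir, Random random, int items) throws IOException {
        StringBuilder sb = new StringBuilder("# To-do list\n\n");
        for (int i = 0; i < items; i++) {
            // GitHub-flavored task-list items: "- [x]" done, "- [ ]" open.
            String box = random.nextBoolean() ? "[x]" : "[ ]";
            sb.append("- ").append(box).append(' ')
              .append(TASKS.get(random.nextInt(TASKS.size()))).append('\n');
        }
        return Files.writeString(dir.resolve("todo-list.md"), sb.toString(),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fake-files");
        Path file = markdown(dir, new Random(7L), 3);
        System.out.println(file);
        System.out.print(Files.readString(file));
    }
}
```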

snuyanzin commented 11 months ago

That sounds strange to me. On the one hand, the provider approach is the simplest option; on the other hand, every new format will require changes, with no way to reuse existing providers...