mallardduck closed this 9 months ago
Initial Benchmark:
HumanoID> .\vendor\bin\phpbench report --ref=space --report=aggregate
+------------------+----------------------+-----+-------+-----+----------+-----------------+--------+
| benchmark | subject | set | revs | its | mem_peak | mode | rstdev |
+------------------+----------------------+-----+-------+-----+----------+-----------------+--------+
| SpaceStaticBench | benchCreate | 0 | 10000 | 5 | 1.294mb | 46,535.270ops/s | ±2.35% |
| SpaceStaticBench | benchCreate | 1 | 10000 | 5 | 1.294mb | 46,238.756ops/s | ±1.36% |
| SpaceStaticBench | benchCreate | 2 | 10000 | 5 | 1.294mb | 46,772.082ops/s | ±1.31% |
| SpaceStaticBench | benchCreate | 3 | 10000 | 5 | 1.294mb | 31,422.851ops/s | ±1.22% |
| SpaceStaticBench | benchCreate | 4 | 10000 | 5 | 1.294mb | 31,463.686ops/s | ±0.48% |
| SpaceStaticBench | benchCreate | 5 | 10000 | 5 | 1.294mb | 15,986.676ops/s | ±0.30% |
| SpaceStaticBench | benchCreate | 6 | 10000 | 5 | 1.294mb | 16,026.148ops/s | ±1.54% |
| SpaceStaticBench | benchCreate | 7 | 10000 | 5 | 1.294mb | 12,199.870ops/s | ±0.52% |
| SpaceStaticBench | benchCreateRand | 0 | 10000 | 5 | 1.294mb | 88,206.962ops/s | ±0.81% |
| SpaceStaticBench | benchCreateRand | 1 | 10000 | 5 | 1.294mb | 46,260.467ops/s | ±0.69% |
| SpaceStaticBench | benchCreateRand | 2 | 10000 | 5 | 1.294mb | 31,405.076ops/s | ±0.57% |
| SpaceStaticBench | benchCreateRand | 3 | 10000 | 5 | 1.294mb | 16,001.468ops/s | ±0.81% |
| SpaceStaticBench | benchCreateRand | 4 | 10000 | 5 | 1.294mb | 12,383.631ops/s | ±1.38% |
| SpaceStaticBench | benchCreateAndDecode | 0 | 10000 | 5 | 1.294mb | 34,399.652ops/s | ±0.41% |
| SpaceStaticBench | benchCreateAndDecode | 1 | 10000 | 5 | 1.294mb | 34,391.193ops/s | ±0.22% |
| SpaceStaticBench | benchCreateAndDecode | 2 | 10000 | 5 | 1.294mb | 34,769.386ops/s | ±0.83% |
| SpaceStaticBench | benchCreateAndDecode | 3 | 10000 | 5 | 1.294mb | 23,091.127ops/s | ±0.42% |
| SpaceStaticBench | benchCreateAndDecode | 4 | 10000 | 5 | 1.294mb | 23,567.007ops/s | ±1.29% |
| SpaceStaticBench | benchCreateAndDecode | 5 | 10000 | 5 | 1.294mb | 12,269.614ops/s | ±1.12% |
| SpaceStaticBench | benchCreateAndDecode | 6 | 10000 | 5 | 1.294mb | 12,292.121ops/s | ±0.65% |
| SpaceStaticBench | benchCreateAndDecode | 7 | 10000 | 5 | 1.294mb | 9,118.242ops/s | ±1.22% |
| SpaceBench | benchCreate | 0 | 10000 | 5 | 1.294mb | 46,666.742ops/s | ±0.52% |
| SpaceBench | benchCreate | 1 | 10000 | 5 | 1.294mb | 46,355.444ops/s | ±0.40% |
| SpaceBench | benchCreate | 2 | 10000 | 5 | 1.294mb | 46,242.217ops/s | ±0.85% |
| SpaceBench | benchCreate | 3 | 10000 | 5 | 1.294mb | 31,726.416ops/s | ±0.86% |
| SpaceBench | benchCreate | 4 | 10000 | 5 | 1.294mb | 31,587.455ops/s | ±1.81% |
| SpaceBench | benchCreate | 5 | 10000 | 5 | 1.294mb | 16,414.532ops/s | ±1.12% |
| SpaceBench | benchCreate | 6 | 10000 | 5 | 1.294mb | 16,046.979ops/s | ±1.07% |
| SpaceBench | benchCreate | 7 | 10000 | 5 | 1.294mb | 12,150.541ops/s | ±1.62% |
| SpaceBench | benchCreateRand | 0 | 10000 | 5 | 1.294mb | 89,387.784ops/s | ±0.31% |
| SpaceBench | benchCreateRand | 1 | 10000 | 5 | 1.294mb | 46,335.805ops/s | ±0.36% |
| SpaceBench | benchCreateRand | 2 | 10000 | 5 | 1.294mb | 31,422.446ops/s | ±0.71% |
| SpaceBench | benchCreateRand | 3 | 10000 | 5 | 1.294mb | 16,032.582ops/s | ±1.24% |
| SpaceBench | benchCreateRand | 4 | 10000 | 5 | 1.294mb | 12,115.604ops/s | ±1.57% |
| SpaceBench | benchCreateAndDecode | 0 | 10000 | 5 | 1.294mb | 34,747.920ops/s | ±1.49% |
| SpaceBench | benchCreateAndDecode | 1 | 10000 | 5 | 1.294mb | 35,450.333ops/s | ±2.05% |
| SpaceBench | benchCreateAndDecode | 2 | 10000 | 5 | 1.294mb | 35,005.640ops/s | ±1.07% |
| SpaceBench | benchCreateAndDecode | 3 | 10000 | 5 | 1.294mb | 23,418.149ops/s | ±1.38% |
| SpaceBench | benchCreateAndDecode | 4 | 10000 | 5 | 1.294mb | 23,167.263ops/s | ±0.28% |
| SpaceBench | benchCreateAndDecode | 5 | 10000 | 5 | 1.294mb | 12,328.810ops/s | ±2.06% |
| SpaceBench | benchCreateAndDecode | 6 | 10000 | 5 | 1.294mb | 12,412.472ops/s | ±1.14% |
| SpaceBench | benchCreateAndDecode | 7 | 10000 | 5 | 1.294mb | 9,263.597ops/s | ±0.41% |
+------------------+----------------------+-----+-------+-----+----------+-----------------+--------+
As mentioned, the results are only fractionally different - but that's good, since performance isn't the primary goal of these changes.
@caendesilva - You still using HumanoID in some of your projects? Not really ready for a full review of the PR, but open to RFC-style feedback for sure. I think Rob said he'd look soon when he has time too (so I'm not re-pinging him now, to not be a bother).
I think my main thought on the PR as currently pushed is that it really lacks flexibility, since everything is just a `DictionaryInterface`. So maybe I define a new contract/interface like:
```php
interface DictionarySection
{
    public static function hasChildren(): bool;

    /**
     * @return class-string[]
     */
    public static function children(): array;

    /**
     * Return the names of all word-category methods callable on this
     * class, i.e. every public method other than the two above.
     *
     * @return string[]
     */
    public static function categories(): array;
}
```
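For illustration, a leaf section might look like the following (the class, category names, and words are all hypothetical, just to show how the contract is meant to be used):

```php
// Hypothetical implementation of the proposed DictionarySection contract.
// Class, category, and word choices below are illustrative only.
final class SpaceSection implements DictionarySection
{
    public static function hasChildren(): bool
    {
        // This section is a leaf: it has no nested sections.
        return false;
    }

    /**
     * @return class-string[]
     */
    public static function children(): array
    {
        return [];
    }

    /**
     * The word-category methods callable on this class.
     *
     * @return string[]
     */
    public static function categories(): array
    {
        return ['planets', 'moons'];
    }

    /** @return string[] */
    public static function planets(): array
    {
        return ['mercury', 'venus', 'mars'];
    }

    /** @return string[] */
    public static function moons(): array
    {
        return ['luna', 'io', 'europa'];
    }
}
```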
My biggest 'doubt' (carefully avoiding 'problem', since it's not) with this is whether I like 'hardcoded' dictionaries or prefer keeping the wordlists separated in JSON files. It may not matter much in terms of performance, but I am on the fence about whether I like the 'freedom' of JSON files or the 'neatness' of PHP 'dictionary files'.
@caendesilva - You still using HumanoID in some of your projects?
Not using it in an active project ATM, but I'm more than happy to do a code review when you're ready. In the meantime, here are some general high-level notes as requested:
More specific notes on the API:
My biggest 'doubt' (carefully avoiding 'problem', since it's not) with this is whether I like 'hardcoded' dictionaries or prefer keeping the wordlists separated in JSON files. It may not matter much in terms of performance, but I am on the fence about whether I like the 'freedom' of JSON files or the 'neatness' of PHP 'dictionary files'.
What I began to realize is that these JSON files (in this repo) are already "hardcoded". Meaning if we changed the content then that changes the function and output too - i.e. breaking changes.
So from that perspective the "sense of freedom" the JSON provides is a false one that's no more mutable than it would be as PHP code. So if we do move from JSON to PHP dictionary builders we can reproduce the existing JSON in a non-breaking manner, while making the existing liability for breaking changes explicit instead of implicit.
Granted, the risk this adds for us is that we can only do this once, since we are expanding the API exposed via PHP. So I absolutely understand how these trade-offs can feel like losing some 'freedom' compared to using JSON files.
What I began to realize is that these JSON files (in this repo) are already "hardcoded".
Initially they were intended as examples.
So from that perspective the "sense of freedom" the JSON provides is a false one that's no more mutable than it would be as PHP code.
Yes, and that's well documented in the README:
Don't change your wordlist once you go into production. Imagine reassigning or reordering the values A..F in the hexadecimal system. It will be very hard, if not impossible, to make this work correctly without resulting in incorrectly converted HumanoIDs to IDs or causing ambiguous results etc.
Once you pick any wordlist it doesn't matter if it's a JSON file or a PHP dictionary. But in terms of 'ease of use' to play around with these wordlists (during testing/development) I'm leaning towards JSON. I think it also conveys better that you can have whatever list you want, whereas PHP files feel more like "this is what I got, this is what I gotta use".
Still, I'm not saying I insist on JSON files, not at all, I'm just not sure which (of either) is the best way forward.
My biggest 'doubt' (carefully avoiding 'problem', since it's not) with this is whether I like 'hardcoded' dictionaries or prefer keeping the wordlists separated in JSON files. It may not matter much in terms of performance, but I am on the fence about whether I like the 'freedom' of JSON files or the 'neatness' of PHP 'dictionary files'.
What I began to realize is that these JSON files (in this repo) are already "hardcoded". Meaning if we changed the content then that changes the function and output too - i.e. breaking changes.
So from that perspective the "sense of freedom" the JSON provides is a false one that's no more mutable than it would be as PHP code. So if we do move from JSON to PHP dictionary builders we can reproduce the existing JSON in a non-breaking manner, while making the existing liability for breaking changes explicit instead of implicit.
Granted, the risk this adds for us is that we can only do this once, since we are expanding the API exposed via PHP. So I absolutely understand how these trade-offs can feel like losing some 'freedom' compared to using JSON files.
Very good point. With my JSON comment I was just thinking that users should be able to load words from a custom JSON file. Question: What would happen if the wordlist changes? It's gonna break existing database entries, right? With this change we would have to include a backwards compatibility mode if that's the case.
Question: What would happen if the wordlist changes? It's gonna break existing database entries, right?
It depends on how you use it. If you save the generated HumanoID into the DB on your models as a slug, then it all kinda keeps working until eventually you generate an ID that overlaps.
If you generate and "decode" HumanoIDs on the fly then it will break everything very fast at worst. Or at best it will cause SEO headaches when your HumanoID-based URLs all change, turning old ones into 404s as they come up in search/etc. (As Rob pointed out, we did make sure to include massive warnings about that issue/risk though.)
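A tiny illustration of why decoding breaks (plain PHP, no library code): the word list works like the digits of a number base, so reordering the "digits" silently changes what every stored ID decodes to.

```php
// Illustrative only: a word list acts like the digits of a number base.
$before = ['mercury', 'venus', 'earth'];
$after  = ['venus', 'mercury', 'earth']; // same words, reordered

// The same word now decodes to a different value:
var_dump(array_search('mercury', $before)); // int(0)
var_dump(array_search('mercury', $after));  // int(1)
```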
With this change we would have to include a backwards compatibility mode if that's the case.
Nope - this change is intended to reproduce the existing wordlist behavior exactly.
So it's adding a lot of API surface to the library, but not actually changing any results for existing users already working from our Space/Zoo word sets.
What I began to realize is that these JSON files (in this repo) are already "hardcoded". Initially they were intended as examples.
🤔 That's a fair point I forgot over time.
I guess I could just make a spinoff `humanoid-dictionary-builder` package to handle this idea. ~But then that package would include `humanoid` as a dependency with the existing word lists as "preconfigured examples", essentially just making it a HumanoID package wrapper.~ Ope, just thought of a neater spin on that idea... instead of a wrapper with the existing API I've got cooked up, I just make it a static content generation tool that will use the API I've created to generate and save the JSON dictionaries.
This feels a lot nicer because it can further decouple this library from the "state" that ultimately dictates the behavior, since the end-users should be taking as much ownership of that as possible, especially if they want to deviate from our 2 example word sets.
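To sketch that generator idea (every name here is hypothetical; it's just the shape I'm picturing):

```php
// Hypothetical static-generation flow; DictionaryCompiler and
// SpaceSection are illustrative names, not existing HumanoID APIs.
$words = DictionaryCompiler::fromSections([SpaceSection::class])
    ->compile(); // flatten sections/categories into one ordered word list

file_put_contents(
    __DIR__ . '/space-words.json',
    json_encode($words, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR)
);
```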
Question: What would happen if the wordlist changes? It's gonna break existing database entries, right?
It depends on how you use it. If you save the generated HumanoID into the DB on your models as a slug, then it all kinda keeps working until eventually you generate an ID that overlaps.
If you generate and "decode" HumanoIDs on the fly then it will break everything very fast at worst. Or at best it will cause SEO headaches when your HumanoID-based URLs all change, turning old ones into 404s as they come up in search/etc. (As Rob pointed out, we did make sure to include massive warnings about that issue/risk though.)
With this change we would have to include a backwards compatibility mode if that's the case.
Nope - this change is intended to reproduce the existing wordlist behavior exactly. So it's adding a lot of API surface to the library, but not actually changing any results for existing users already working from our Space/Zoo word sets.
What I began to realize is that these JSON files (in this repo) are already "hardcoded". Initially they were intended as examples.
🤔 That's a fair point I forgot over time.
I guess I could just make a spinoff `humanoid-dictionary-builder` package to handle this idea. ~But then that package would include `humanoid` as a dependency with the existing word lists as "preconfigured examples", essentially just making it a HumanoID package wrapper.~ Ope, just thought of a neater spin on that idea... instead of a wrapper with the existing API I've got cooked up, I just make it a static content generation tool that will use the API I've created to generate and save the JSON dictionaries.
This feels a lot nicer because it can further decouple this library from the "state" that ultimately dictates the behavior, since the end-users should be taking as much ownership of that as possible, especially if they want to deviate from our 2 example word sets.
I think a content generation tool would be great. The proposed code API looks fun to play with, but it implies that the user can change the builder settings at any time. Having something generate a static file seems like a great idea because that doesn't imply that it's changeable.
Thanks for the feedback gents! I'm going to take a break for a bit and pivot to the "static JSON generator" tool idea. That way I can still keep playing with this fun API idea and we don't blur the lines of how this project needs to use the dictionary concept.
Probably will even bake that same idea into the generator tool so that it throws a warning and doesn't generate if it identifies differences. That way it will still be designed with HumanoID in mind, but users wishing to reuse the dictionary files/concept for other things can generate multiple files under varying names.
Thanks for the feedback gents! I'm going to take a break for a bit and pivot to the "static JSON generator" tool idea. That way I can still keep playing with this fun API idea and we don't blur the lines of how this project needs to use the dictionary concept.
Probably will even bake that same idea into the generator tool so that it throws a warning and doesn't generate if it identifies differences. That way it will still be designed with HumanoID in mind, but users wishing to reuse the dictionary files/concept for other things can generate multiple files under varying names.
Sounds like a great plan! Keep us updated :)
Feature Concept
Provide a simple API to allow end-users to define custom dictionary configs based on existing project words.
Use existing Space/Zoo dictionary:
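Something like the current factory usage (assuming the existing `HumanoIDs` factories from the README):

```php
use RobThree\HumanoID\HumanoIDs;

// Existing factory, unchanged by this proposal:
$spaceIdGenerator = HumanoIDs::spaceIdGenerator();

$humanoId = $spaceIdGenerator->create(42);         // int -> space-word ID
$originalId = $spaceIdGenerator->parse($humanoId); // and back to 42
```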
Use a custom re-composed dictionary based on space and animal words:
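And a sketch of the composition side (the builder and section names are hypothetical; none of this API exists yet):

```php
// Hypothetical composition API: cherry-pick categories from the
// existing word sets to build a project-specific generator.
$generator = DictionaryBuilder::create()
    ->withCategories(SpaceSection::class, ['planets', 'moons'])
    ->withCategories(ZooSection::class, ['mammals'])
    ->build();

$humanoId = $generator->create(42); // e.g. a planet/moon/mammal combo
```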
Why
From a high level the basic idea is simple: we already have 2 curated word sets which include a lot of varying types of words. By classifying those words/terms into smaller subsets, an end-user can "re-compose" the dictionary lists for their project to their own choosing.
This flexibility is not the only benefit for end-users of our library. By exposing a simple (mostly new) composition API, an end-user's custom dictionary can exist directly in their code base with no need for a local JSON file. Similarly, this change would remove our vendored JSON files too; together these make the execution path simpler by fully removing the file-loading IO and `json_decode` calls.
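For context, the IO being removed is roughly this (an illustrative sketch, not the library's exact loader; the file path is hypothetical):

```php
// Roughly the work the current JSON path has to do on every boot
// (illustrative sketch, not the library's exact loader code):
$words = json_decode(
    file_get_contents(__DIR__ . '/data/space-words.json'),
    true, // decode to associative arrays
    512,
    JSON_THROW_ON_ERROR
);
```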
Potential Concerns
- How the `json` word files are already handled.
- Changes to the `static::all()` method results; as such these are breaking changes too.
- The current `DictionaryInterface` contract details; I'm rethinking that a lot, ideally making that part shine more as a feature of `DictionaryInterface`, not a bug.
Performance Notes
So far it doesn't slow anything down or add RAM usage, nor does it speed anything up or reduce usage. That's kinda perfect, since we just want this to improve the DX, not to improve/change performance. The API I was inspired by is here.