Open TylerL-uxai opened 7 years ago
Hi @TylerL-uxai
I'm not familiar with Clash Royale so apologies for this.
The number on the left is the most important piece of data in your dataset. It is called "the label". We are doing "supervised machine learning" here: the label guides us toward the most significant factors (in your case, cards) that cause an increase (or decrease) in points. I'm assuming that the second column (4600, 4628, 4598 in your example) is the accumulated number of points for the player.
username.train is an intermediate file, generated from your original data, that is used as input to the machine learning process. It is not a final result.
I'm assuming each line is one game and the games are ordered by time. They must be ordered by time played, earliest first.
I'm also assuming that the number following the date in your original data is the number of points this user has accumulated over time.
The meaning of the cards to the right of the | char is what you defined it to be. If your original data has the played cards, then it's the played cards.
And from your description, I believe the highly negative number means that whoever held this particular set of cards lost that number of points in that game (i.e. it was a relatively weak card combo compared to the opponent's).
The final result file name is scores.txt. It should have every card separately on its own line. It is sorted by absolute value, so the most important cards (strongest and weakest) should be at the top.
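To illustrate what "sorted by absolute value" means, here is a minimal Python sketch. The card names and weights are made up for illustration and are not taken from any real scores.txt; the function name is hypothetical too.

```python
# Hypothetical card weights; names and numbers are invented for illustration.
weights = {"giant": 12.5, "skeletons": -30.1, "archers": 0.4, "golem": 21.0}


def rank_by_importance(weights):
    """Return (feature, weight) pairs, largest absolute weight first."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_by_importance(weights)
for name, weight in ranked:
    # Strongly negative and strongly positive cards both land near the top.
    print(f"{name:<12} {weight:+.2f}")
```

With this ordering, a card with a large negative weight ranks above a card with a small positive one, which is why both the strongest and the weakest cards show up first.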
Hope this answers your question and that it makes sense...
Edit: there may be a little issue in the original data. The 'trophies' number must be the trophies after playing the cards on the same line, not before. I think if it is the trophies before the game, the results will actually be junk/random and not what you're looking for...
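The alignment requirement can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the per-game label is the difference between consecutive trophy totals, and it is not necessarily how lifestyle-csv2vw computes it. The function name and the card names are hypothetical; the trophy totals are the ones quoted earlier in this reply.

```python
def trophy_deltas(rows):
    """Pair each game's cards with the trophy change that game produced.

    rows: list of (trophies_after_game, cards) tuples, earliest game first.
    The first row has no earlier total to compare against, so it only
    serves as the baseline and yields no example.
    """
    examples = []
    for (prev_total, _), (total, cards) in zip(rows, rows[1:]):
        examples.append((total - prev_total, cards))
    return examples


# Trophy totals recorded AFTER each game; card names are placeholders.
games = [
    (4600, ["giant", "archers"]),
    (4628, ["giant", "archers"]),
    (4598, ["golem", "skeletons"]),
]
examples = trophy_deltas(games)
print(examples)
```

If the totals were recorded before each game instead, every delta would be paired with the cards of the following game, which is the misalignment the edit above warns about.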
Ah! So losing that many points just means it was a relatively popular combination of characters! That makes sense.
Also, I just used the format in ariel.csv (trophies in the middle). Kept the date constant, which seems to still work.
2012-06-10,185.9,
2012-06-11,182.6,salad sleep:0.15 cheese egg halfnhalf:1.5
2012-06-12,181.0,sleep:0.5 tea grape
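A minimal sketch of how one row in this format could be read in Python. It assumes the layout is date, value, then space-separated factors, and that a factor without an explicit `:weight` counts as 1.0 (the usual Vowpal Wabbit convention for features without a value). The function name `parse_row` is hypothetical and is not part of the project.

```python
def parse_row(line):
    """Split 'date,value,factors' into (date, value, {factor: weight})."""
    date, value, factors = line.strip().split(",", 2)
    weights = {}
    for token in factors.split():
        # 'sleep:0.15' carries an explicit weight; 'salad' defaults to 1.0.
        name, sep, weight = token.partition(":")
        weights[name] = float(weight) if sep else 1.0
    return date, float(value), weights


row = parse_row("2012-06-11,182.6,salad sleep:0.15 cheese egg halfnhalf:1.5")
empty = parse_row("2012-06-12,181.0,")
print(row)
```

A row with nothing after the second comma simply yields an empty factor dictionary.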
Thank you so much for taking the time to write this! You're significantly (stats joke... significant) helping me learn artificial intelligence!
Would I use --interaction to change it from the strongest/weakest card to the strongest/weakest combination of cards? Do I need to add a b c d e f g h to the characters in the train inputs file or somehow modify the code that generates the file?
Thank you!
To add interactions you may add some vw flags to VW_ARGS in the Makefile.
For example, to add a single-hidden-layer neural net with 3 nodes and direct bypass, you could add:
--nn 3 --inpass
As the code is written right now there's no support for namespaces, so -q :: will not work.
Adding such interactions will require a change to the script lifestyle-csv2vw (not hard, feel free to add it).
Once you add a namespace (say f) to the intermediate file output (username.train) just before the names of the cards, you could add stuff like -q ff to VW_ARGS.
Note that the final output (scores.txt) is still limited to individual feature contributions.
HTH.
For anyone else wondering, to make namespaces, modify lifestyle-csv2vw with the following code
printf "%.2f |f @sum_factors\n", $sum_gain;
(f is now the namespace)
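For readers who generate the training file from Python rather than Perl, the same line layout can be sketched like this. The helper name `vw_line` and the card names are hypothetical; the only format detail relied on is that the namespace name follows the `|` with no space in between.

```python
def vw_line(label, features, namespace="f"):
    """Format one example as 'label |namespace feat1 feat2 ...'."""
    return "%.2f |%s %s" % (label, namespace, " ".join(features))


line = vw_line(-30, ["golem", "skeletons"])
print(line)  # -30.00 |f golem skeletons
```

Every card on the line then belongs to namespace f, which is what lets a flag such as -q ff refer to pairs of cards.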
I'm going to do more research on how to use a neural net to get the best combination of 8 characters. Thanks again Ariel!
Any idea why this error fires when I add --interactions or -q? I got it working yesterday somehow (with interactions). Anyway, here's the error:
== FATAL: vw subprocess failed (status=1)
Happens when I clone the repo, add the |f namespace in username.train, and add --interactions ff or -q ff to VW_ARGS.
Edit: If I remove this...
if not vw_line:
    # End of input
    vw_proc.stdout.close()
    vw_proc.wait()
    if vw_proc.returncode:
        # non-zero exit code, print the full command that
        # failed to help user reproduce/understand it
        fatal("vw subprocess failed (status=%s): '%s'" %
              (vw_proc.returncode, vw_cmd))
    else:
        # everything looks cool, support debugging anyway
        d("%s: %s examples, exit status: %s" %
          (vw_cmd, example_no, vw_proc.returncode))
    return
it says this
  File "./vw-varinfo2", line 240, in vw_audit
    if vw_line[0] == '\t':
IndexError: string index out of range
Update edit: It seems to sort of work with interactions if I add a return arbitrarily.
if not vw_line:
    # End of input
    vw_proc.stdout.close()
    vw_proc.wait()
    if vw_proc.returncode:
        # non-zero exit code, print the full command that
        # failed to help user reproduce/understand it
        return  # added return before fatal error
        fatal("vw subprocess failed (status=%s): '%s'" %
              (vw_proc.returncode, vw_cmd))
Please note that the suggested 'fix' above is not a fix. It only masks the real problem.
vw crashes (meaning nothing actually works) and the return makes the caller continue as if vw didn't crash. The end result is that the output is invalid.
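Rather than suppressing the error, one way to find the real problem is to look at what the failing command wrote to stderr. The following is a minimal Python sketch of that idea, not the code in vw-varinfo2: `run_and_report` is a hypothetical helper, and a small Python one-liner stands in for the failing vw call so the example runs without vw installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on a non-zero exit, raise an error that includes its stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "command failed (status=%s): %s\n%s"
            % (proc.returncode, " ".join(cmd), proc.stderr.strip())
        )
    return proc.stdout


# Stand-in for a failing vw invocation (assumption: vw reports errors on stderr).
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('bad flag'); sys.exit(1)",
]
try:
    run_and_report(failing)
except RuntimeError as err:
    message = str(err)
    print(message)
```

Running the exact vw command from the FATAL message by hand in a shell serves the same purpose: it shows vw's own error text instead of only the exit status.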
Hey Ariel :) I saw that you're answering questions through issues. I hope this is the best way to reach you.
I love this project! I'm using it on Clash Royale like this
Everything's working great. I can generate a score chart and see the best cards overall. But what I noticed in username.train is something like this
I'm assuming those are the most commonly played cards? Or is that the best possible deck combination, because it has the greatest negative weight (-1202) meaning it wins against the player more frequently than any other combination?
I guess I'm asking the significance of the number on the left and how the features are chosen on the right side, and from there I can work toward finding the best 8 card combination. Thanks for any help you can provide!