arielf / weight-loss

Machine Learning meets ketosis: how to effectively lose weight

How to do combinations? #30

Open TylerL-uxai opened 7 years ago

TylerL-uxai commented 7 years ago

Hey Ariel :) I saw that you're answering questions through issues. I hope this is the best way to reach you.

I love this project! I'm using it on Clash Royale like this

    2017-07-26,4600,goblin_archer13 fire_fireball9 goblins13 minion13 knight13 royal_giant13 hog_rider10 princess3
    2017-07-26,4628,giant10 firespirit_hut10 minion12 chr_witch7 barbarians12 miner3 ice_wizard3 zap12
    2017-07-26,4598,barbarians13 miner4 wizard10 chr_witch7 princess3 royal_giant13 the_log3 zap13

Everything's working great. I can generate a score chart and see the best cards overall. But what I noticed in username.train is something like this

    -1202.00 | giant10 valkyrie9 minion_horde12 rage_barbarian3 chr_witch7 lightning6 bomber12 zap12

I'm assuming those are the most commonly played cards? Or is that the best possible deck combination, because it has the greatest negative weight (-1202) meaning it wins against the player more frequently than any other combination?

I guess I'm asking the significance of the number on the left and how the features are chosen on the right side, and from there I can work toward finding the best 8 card combination. Thanks for any help you can provide!

arielf commented 7 years ago

Hi @TylerL-uxai

I'm not familiar with Clash Royale so apologies for this.

The number on the left is the most important piece of data in your dataset. It is called "the label". We are doing "supervised machine learning" here: the label guides us toward the most significant factors (in your case, cards) that cause an increase (or decrease) in points. I'm assuming that the second column (4600, 4628, 4598 in your example) is the player's accumulated number of points.

username.train is an intermediate file, generated from your original data, that is used as input to the machine learning process. It is not a final result.

I'm assuming each line is one game and the games are ordered by time. They must be ordered by time played, earliest first.

I'm also assuming that the number following the date in your original data is the number of points this user has accumulated over time.

The meaning of the cards to the right of the | char is what you defined it to be. If your original data has the played cards, then it's the played cards.

And from your description, I believe the highly negative number means that whoever held this particular set of cards lost that number of points in that game (i.e. it was a relatively weak card combo compared to the opponent's).
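To make the label concrete, here is a minimal sketch of how a per-game label could be derived from consecutive point totals. It is an illustration only: the real conversion is done by the repo's lifestyle-csv2vw script, which may differ in its details. The function name `rows_to_train_lines` and the shortened card lists are assumptions made for this example.

```python
# Sketch only: NOT the repo's lifestyle-csv2vw logic, just the idea of
# "label = change in points", assuming points are recorded AFTER each game.

def rows_to_train_lines(rows):
    """rows: list of (date, points_after_game, cards), earliest game first."""
    lines = []
    # Pair every row with the one before it; the first row has no
    # previous total, so it yields no training line.
    for (_, prev_points, _), (_, points, cards) in zip(rows, rows[1:]):
        delta = points - prev_points  # the label for this game
        lines.append("%.2f | %s" % (delta, cards))
    return lines


# Point totals from the thread; card lists shortened for readability.
rows = [
    ("2017-07-26", 4600, "knight13 princess3"),
    ("2017-07-26", 4628, "giant10 zap12"),
    ("2017-07-26", 4598, "barbarians13 miner4"),
]

train_lines = rows_to_train_lines(rows)
for line in train_lines:
    print(line)
```

Under these assumptions the second game gets a positive label (points went up) and the third a negative one (points went down).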

The final result file name is scores.txt. It should have every card separately on its own line. It is sorted by absolute value, so the most important cards (strongest and weakest) should be at the top.
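Sorting by absolute value can be sketched as follows. The card names and weights here are hypothetical numbers invented for illustration, not real results, and the actual column layout of scores.txt is not shown in this thread.

```python
# Hypothetical per-card weights, for illustration only.
weights = {"zap": 0.4, "giant": -2.5, "knight": 1.1, "miner": -0.2}

# Largest magnitude first, regardless of sign, so the strongest and the
# weakest cards both rise to the top.
ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, weight in ranked:
    print("%-8s %+.2f" % (name, weight))
```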

Hope this answers your question and that it makes sense...

Edit: there may be a little issue in the original data. The 'trophies' number must be the trophies after playing the cards on the same line, not before. I think that if it is the trophies before the game, the results will actually be junk/random and not what you're looking for...

TylerL-uxai commented 7 years ago

Ah! So losing that many points just means it was a relatively popular combination of characters! That makes sense.

Also, I just used the format in ariel.csv (trophies in the middle). Kept the date constant, which seems to still work.

    2012-06-10,185.9,
    2012-06-11,182.6,salad sleep:0.15 cheese egg halfnhalf:1.5
    2012-06-12,181.0,sleep:0.5 tea grape

Thank you so much for taking the time to write this! You're significantly (stats joke... significant) helping me learn artificial intelligence!

Would I use --interaction to change it from the strongest/weakest card to the strongest/weakest combination of cards? Do I need to add a b c d e f g h to the characters in the train inputs file or somehow modify the code that generates the file?

Thank you!

arielf commented 7 years ago

To add interactions you may add some vw flags to VW_ARGS in the Makefile.

For example, to add a single hidden-layer neural net with 3 nodes and direct bypass, you could add:

    --nn 3 --inpass

As the code is written right now there's no support for name-spaces so -q :: will not work.

Adding such interactions will require a change to the script lifestyle-csv2vw (not hard, feel free to add it)

Once you add a namespace (say f) to the intermediate file output (username.train) just before the names of the cards, you could add stuff like -q ff to VW_ARGS.

Note that the final output (scores.txt) is still limited to individual feature contributions.
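As a rough picture of what a pairwise interaction over one namespace means, the sketch below lists every unordered pair of cards from a single deck. This is a conceptual illustration only: how vw enumerates same-namespace pairs internally may differ, and none of this code is part of the repo.

```python
from itertools import combinations

# The deck from the username.train line quoted earlier in the thread.
cards = ("giant10 valkyrie9 minion_horde12 rage_barbarian3 "
         "chr_witch7 lightning6 bomber12 zap12").split()

# Every unordered pair of distinct cards in the deck.
pairs = list(combinations(cards, 2))

print(len(pairs))   # number of card pairs in one 8-card deck
print(pairs[0])     # first pair, in input order
```

An 8-card deck yields 28 such pairs, which is why interaction terms grow the feature space quickly.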

HTH.

TylerL-uxai commented 7 years ago

For anyone else wondering, to make namespaces, modify lifestyle-csv2vw with the following code

    printf "%.2f |f @sum_factors\n", $sum_gain;

(f is now the namespace)
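For an already generated username.train, the same change can be pictured as a small text transformation: in vw's input format the namespace name follows the | with no space in between. The helper name `add_namespace` is hypothetical; this is a sketch of the line format, not a replacement for the change in lifestyle-csv2vw.

```python
def add_namespace(line, ns="f"):
    """Turn 'label | feat1 feat2' into 'label |ns feat1 feat2'."""
    label, bar, features = line.partition("|")
    if not bar:
        return line  # no feature section at all: leave untouched
    if features[:1] not in ("", " ", "\t"):
        return line  # text directly after '|' means a namespace already exists
    return "%s|%s %s" % (label, ns, features.strip())


before = "-1202.00 | giant10 valkyrie9 zap12"
after = add_namespace(before)
print(after)
```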

I'm going to do more research on how to use a neural net to get the best combination of 8 characters. Thanks again Ariel!

TylerL-uxai commented 7 years ago

Any idea why this error fires when I add --interactions or -q? I got it working yesterday somehow (with interactions). Anyway, here's the error:

== FATAL: vw subprocess failed (status=1)

This happens when I clone the repo, add the |f namespace in username.train, and add --interactions ff or -q ff to VW_ARGS.

Edit: if I remove this...

        if not vw_line:
            # End of input
            vw_proc.stdout.close()
            vw_proc.wait()
            if vw_proc.returncode:
                # non-zero exit code, print the full command that
                # failed to help user reproduce/understand it
                fatal("vw subprocess failed (status=%s): '%s'" %
                      (vw_proc.returncode, vw_cmd))
            else:
                # everything looks cool, support debugging anyway
                d("%s: %s examples, exit status: %s" %
                  (vw_cmd, example_no, vw_proc.returncode))

            return

it says this

  File "./vw-varinfo2", line 240, in vw_audit
    if vw_line[0] == '\t':
IndexError: string index out of range
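The IndexError is what Python does at end of input once the `if not vw_line` guard is gone: `readline()` returns an empty string at EOF, and indexing an empty string fails. A minimal sketch, using an in-memory stream as a stand-in for vw's output (the stream contents and the helper `starts_with_tab` are made up for illustration):

```python
import io

# Stand-in for the vw subprocess output: one line, then end of input.
stream = io.StringIO("\tfeature line\n")

first = stream.readline()    # the only line
second = stream.readline()   # '' signals end of input

try:
    second[0]                # what vw_line[0] does at EOF without the guard
    raised = False
except IndexError:
    raised = True


def starts_with_tab(line):
    # str.startswith is safe on an empty string, unlike line[0]
    return line.startswith("\t")


print(repr(second), raised, starts_with_tab(first), starts_with_tab(second))
```

So removing the guard does not address the vw failure; it only moves the error to the next line that touches the empty string.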

Update edit: It seems to sort of work with interactions if I add a return arbitrarily.

        if not vw_line:
            # End of input
            vw_proc.stdout.close()
            vw_proc.wait()
            if vw_proc.returncode:
                # non-zero exit code, print the full command that
                # failed to help user reproduce/understand it
                return # added return before fatal error
                fatal("vw subprocess failed (status=%s): '%s'" %
                      (vw_proc.returncode, vw_cmd))

arielf commented 7 years ago

Please note that the suggested 'fix' above is not a fix. It only masks the real problem. vw crashes (meaning nothing actually works) and the return makes the caller continue as if vw didn't crash. The end result is that the output is invalid.
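Rather than hiding the non-zero exit status, one way to get at the real problem is to surface what the failing process wrote to stderr. The sketch below is generic and is not the code in vw-varinfo2: the helper `run_checked` is hypothetical, and a small Python one-liner stands in for a failing vw command so the example runs without vw installed.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd; on failure, raise with the exit status and captured stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError("command failed (status=%s): %s\n%s"
                           % (proc.returncode, " ".join(cmd),
                              proc.stderr.strip()))
    return proc.stdout


# Stand-in for a crashing vw: writes a message to stderr and exits with 1.
failing_cmd = [sys.executable, "-c",
               "import sys; sys.stderr.write('bad option'); sys.exit(1)"]

message = ""
try:
    run_checked(failing_cmd)
except RuntimeError as err:
    message = str(err)

print(message)
```

Running the printed vw command by hand in a shell is another way to see the same diagnostic.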