Ledenel / auto-white-reimu

A mahjong library aimed at implementing mahjong AIs by imitating white reimu, an excellent mahjong player.
GNU General Public License v3.0

paifu analysis plan #12

Open Ledenel opened 5 years ago

Ledenel commented 5 years ago

This issue collects everything related to paifu analysis, including:

Ledenel commented 5 years ago

paifu representation

I'm considering using a dask (pandas) DataFrame to represent a paifu as a series of mahjong events, with rich metadata labels for filtering, and Apache Parquet for serialization (supported directly by both dask and pandas).
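A minimal, dependency-free sketch of the idea, assuming hypothetical column names (the real column list is still open in this issue): events are stored column-wise, in the same dict-of-lists shape that `pandas.DataFrame` accepts, and metadata labels select rows the way a pandas boolean mask would.

```python
# Hypothetical columns: metadata labels (platform, game_id, round, step)
# plus the event itself (event, player, tiles).
EVENT_COLUMNS = ["platform", "game_id", "round", "step", "event", "player", "tiles"]


def make_table(rows):
    """Turn row tuples into a column-oriented dict (column -> list of values)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(EVENT_COLUMNS)}


def select(table, **labels):
    """Keep rows whose columns equal every given label (like df[mask] in pandas)."""
    n = len(table[EVENT_COLUMNS[0]])
    keep = [i for i in range(n)
            if all(table[col][i] == value for col, value in labels.items())]
    return {name: [col[i] for i in keep] for name, col in table.items()}


rows = [
    ("tenhou", "g1", 0, 0, "draw", "A", "3p"),
    ("tenhou", "g1", 0, 1, "discard", "A", "4s"),
    ("majsoul", "g2", 0, 0, "discard", "B", "0s"),
]
table = make_table(rows)
discards = select(table, event="discard")
```

With pandas installed, `pandas.DataFrame(table)` builds the same table and `DataFrame.to_parquet(path)` serializes it (this needs `pyarrow` or `fastparquet`); dask offers the same pair for larger-than-memory collections.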

Here are some possible columns:

"opened hand" is one of "chi", "pon", "kan" (both open and sealed), "ron", "tsumo", "kita".

Columns marked in bold are basic columns; the others can be calculated from the basic columns, or ignored without information loss.

Some trivial examples of the primary player and second player properties of an event:

Player A draws tile '3p': A is the primary player, and there is no second player. A shows nothing, sees 3p; hand adds 3p; opened hand unchanged.

Player A discards tile '4s': A is the primary player. A shows 4s, sees nothing; hand removes 4s; opened hand unchanged.

Player A claims a 'chi' on Player B's just-discarded 3m using 24m: A is the primary player, B is the second player. A shows 24m, sees nothing; hand removes 24m; opened hand adds (chi, 234m); no effect on B.

Player A declares a 'sealed kan' using 7777s: A is the primary player. A shows 7777s, sees nothing; hand removes 7777s; opened hand adds (sealed kan, 7777s). Under MCR rules, A shows nothing.

Then Player A reveals a new dora indicator 6s: A is the primary player. A shows 6s, sees nothing; hand unchanged; opened hand unchanged.
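The five examples above can be replayed as a small state machine. This is a sketch only: `PlayerState`, `tiles` and `apply` are hypothetical names, not part of the library, and the second player's state is deliberately left untouched, following the examples.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PlayerState:
    hand: Counter = field(default_factory=Counter)  # concealed tiles
    opened: list = field(default_factory=list)      # (meld kind, tiles) pairs
    seen: list = field(default_factory=list)        # tiles only this player saw
    shown: list = field(default_factory=list)       # tiles revealed to everyone


def tiles(text):
    """Split one-suit notation such as '24m' into ['2m', '4m']."""
    return [digit + text[-1] for digit in text[:-1]]


def apply(states, kind, primary, tile_text, second=None, claimed=None, mcr=False):
    """Apply one event to the primary player's state and return its row."""
    p = states[primary]
    ts = tiles(tile_text)
    if kind == "draw":
        p.seen += ts
        p.hand.update(ts)
    elif kind == "discard":
        p.shown += ts
        p.hand.subtract(ts)
    elif kind == "chi":
        p.shown += ts
        p.hand.subtract(ts)
        p.opened.append(("chi", sorted(ts + [claimed])))
    elif kind == "sealed kan":
        if not mcr:  # under MCR rules the sealed kan stays hidden
            p.shown += ts
        p.hand.subtract(ts)
        p.opened.append(("sealed kan", ts))
    elif kind == "dora":
        p.shown += ts
    else:
        raise ValueError(f"unknown event {kind!r}")
    p.hand += Counter()  # drop tiles whose count fell to zero
    return {"event": kind, "primary": primary, "second": second}


states = {"A": PlayerState(), "B": PlayerState()}
states["A"].hand.update(tiles("24m") + tiles("7777s") + tiles("4s"))
rows = [
    apply(states, "draw", "A", "3p"),
    apply(states, "discard", "A", "4s"),
    apply(states, "chi", "A", "24m", second="B", claimed="3m"),
    apply(states, "sealed kan", "A", "7777s"),
    apply(states, "dora", "A", "6s"),
]
```

After the replay, A's concealed hand holds only 3p, the opened hand holds the chi 234m and the sealed kan, and B's state is still empty.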

Consider adding an extra-tile field per player for kita (by analysis convention, kita is not treated as part of the opened hand).

tile representation

Use 1 bit per physical tile to record whether it is present, so a tile collection fits in 18 bytes (there are only 136 or 144 tiles, and each physical tile can appear at most once). Tile-collection manipulation then becomes bitwise operations: XOR to add or subtract (adding a tile that is absent, removing one that is present), OR for union, AND for intersection. Because the red five is a distinct physical tile, this also distinguishes 0s from 5s.
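A sketch of the bitset using a plain Python `int` as the bit container. The layout (bit index = tile kind * 4 + copy, with copy 0 of each five reserved for the red five) is an assumption chosen for illustration, not something fixed by the library.

```python
SUIT_OFFSET = {"m": 0, "p": 9, "s": 18, "z": 27}  # first tile kind of each suit
MASK_BYTES = 18  # 144 bits; a 136-tile set only uses bits 0..135


def bit(tile, copy=0):
    """Bit index of one physical tile: kind * 4 + copy (hypothetical layout).
    Copy 0 of each five is the red five, so '0s' maps to ('5s', 0)."""
    rank, suit = int(tile[0]), tile[1]
    if rank == 0:
        rank, copy = 5, 0
    if not 0 <= copy < 4:
        raise ValueError("copy must be in 0..3")
    return (SUIT_OFFSET[suit] + rank - 1) * 4 + copy


def tile_set(*pairs):
    """Build a collection from (tile, copy) pairs; each bit is one physical tile."""
    mask = 0
    for tile, copy in pairs:
        mask |= 1 << bit(tile, copy)
    return mask


def to_bytes(mask):
    """Fixed-width 18-byte serialization of a collection."""
    return mask.to_bytes(MASK_BYTES, "little")


hand = tile_set(("1m", 0), ("0s", 0), ("5s", 1))
drawn = tile_set(("3p", 2))
after_draw = hand ^ drawn                        # XOR adds a tile not yet present
common = hand & tile_set(("5s", 1), ("9m", 0))   # AND: intersection
merged = hand | tile_set(("9m", 0))              # OR: union
```

XOR only behaves as add/subtract when the tile's presence is known beforehand; XOR-ing a tile that is already present removes it, so a checked `add` would test `mask & tile` first.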

We need to test Apache Parquet's compression ratio; a family of boolean columns (one per tile), or a set, may be a more readable and obvious solution.
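For the compression test, a tile bitmask can be expanded into a boolean column family and back. This sketch assumes a hypothetical kind * 4 + copy bit layout (copy 0 of each five standing for the red five) and made-up column names such as `5s_0`; which encoding compresses better in Parquet is exactly what the test would have to measure.

```python
SUITS = [("m", 9), ("p", 9), ("s", 9), ("z", 7)]  # suit letter, number of ranks

# One boolean column per physical tile, in bit order: '1m_0', '1m_1', ..., '7z_3'.
COLUMNS = [f"{rank}{suit}_{copy}"
           for suit, ranks in SUITS
           for rank in range(1, ranks + 1)
           for copy in range(4)]


def to_bool_row(mask):
    """Expand a tile bitmask into a {column: bool} row."""
    return {name: bool(mask >> i & 1) for i, name in enumerate(COLUMNS)}


def from_bool_row(row):
    """Collapse a {column: bool} row back into a bitmask."""
    return sum(1 << i for i, name in enumerate(COLUMNS) if row[name])


row = to_bool_row((1 << 88) | (1 << 0))  # bit 88 = '5s_0' (red five), bit 0 = '1m_0'
```

The boolean family is self-describing when browsed in a DataFrame, at the cost of 136 columns per collection instead of one 18-byte binary value.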

Considering compatibility with majsoul, str(TileSet) may be the most readable and convenient solution, but Tile may not treat 0s, 0m, 0p correctly yet (#14). Update: 0s is now represented properly (#14, fixed by #21), so it is safe to use str(TileSet) as a basic representation. The reverse conversion is easy with mahjong.container.utils.tile_set_from_string.
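To show why a compact string round-trips cleanly, here is a standalone parser and formatter for the usual "digits followed by suit letter" notation, with 0 as the red five. `parse_tiles` and `format_tiles` are hypothetical helpers written for illustration; they are not the library's `tile_set_from_string` or `TileSet.__str__`, whose exact output format may differ.

```python
import re

GROUP = re.compile(r"(\d+)([mpsz])")  # e.g. '123m', '055s'


def parse_tiles(text):
    """'123m055s' -> ['1m', '2m', '3m', '0s', '5s', '5s'] (0 = red five)."""
    if GROUP.sub("", text):
        raise ValueError(f"bad tile string: {text!r}")
    return [digit + suit for digits, suit in GROUP.findall(text) for digit in digits]


def _rank_key(digit):
    """Sort the red five just before the ordinary fives."""
    return (5, 0) if digit == "0" else (int(digit), 1)


def format_tiles(tiles):
    """Inverse of parse_tiles: group by suit in m, p, s, z order, sorted by rank."""
    by_suit = {}
    for tile in tiles:
        by_suit.setdefault(tile[1], []).append(tile[0])
    return "".join("".join(sorted(by_suit[s], key=_rank_key)) + s
                   for s in "mpsz" if s in by_suit)


canonical = format_tiles(parse_tiles("55s0s123m"))
```

Normalizing to one canonical order means two equal collections always produce the same string, which keeps the column usable for grouping and equality filters.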

player representation

A player could be represented simply by name (platform-independent); since the platform is recorded, players with the same name on different game platforms can still be distinguished. Player info (gender, level) could be saved in a separate DataFrame (per game) for further analysis.
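A sketch of the keying scheme: a name is only unique within one platform, so the key is the (platform, name) pair, and per-game player info lives in its own table. `PlayerKey`, `player_table` and the placeholder rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerKey:
    """A name alone is only unique within one platform, so key on both."""
    platform: str
    name: str


def player_table(rows):
    """Index per-game player info by (game_id, PlayerKey); this table is kept
    separate from the event series. Non-key columns are illustrative."""
    keys = ("game_id", "platform", "name")
    return {(r["game_id"], PlayerKey(r["platform"], r["name"])):
            {k: v for k, v in r.items() if k not in keys}
            for r in rows}


rows = [  # made-up placeholder rows
    {"game_id": "g1", "platform": "tenhou", "name": "reimu", "gender": None, "level": None},
    {"game_id": "g2", "platform": "majsoul", "name": "reimu", "gender": None, "level": None},
]
players = player_table(rows)
```

Because `PlayerKey` is frozen it is hashable, so it can serve as a dict key or, converted to two columns, as a DataFrame join key against the event table.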

player DataFrame columns:

canuse commented 5 years ago

Data from majsoul

Player:

Game:

Round and event:

Fields that can be decoded, but whose meaning is unknown:

Ledenel commented 5 years ago

Added some examples and marked the basic columns.

Ledenel commented 5 years ago

Data from majsoul

Round and event:

  • Basic events (chi, peng, gang, kita, hu, ...)
  • Delta score of each round
  • liqibang
  • md5
  • isliqi
  • left_tile_count

Do the basic events record which tile a player draws or discards? Or do we have to infer this from the 'paishan' column?

canuse commented 5 years ago

Do the basic events record which tile a player draws or discards?

Yes, the basic events are:

| event | contains | explanation |
| --- | --- | --- |
| Deal (or Draw) | seat, tile, remaining tiles | |
| Discard | seat, tile, liqi | liqi = 1 means a liqi is declared |
| chi | seat (who calls), tile | does not record whom the tile was claimed from |
| peng | same as above | same as above |
| gang (three kinds) | same as above | same as above |
| kita | seat | does not contain the drawn tile; another Deal event follows |
| hule | seat (who wins), zimo or not, delta points | does not record who loses |
| liuju | | |

However, some information (such as whom a tile was claimed from, and your hand tiles) can be inferred from context.
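One such inference, sketched on plain dicts that mirror the table above (the field names `type`, `seat` and `tile` are hypothetical, not majsoul's actual message fields): assuming a call on a discard is logged immediately after that discard, the claimed-from seat is simply the seat of the preceding Discard event. A closed or added kan follows the caller's own draw instead, so it gets no source seat.

```python
CALLS = {"chi", "peng", "gang"}


def infer_claimed_from(events):
    """Return copies of the events, adding 'from_seat' to every call.
    Assumes a call on a discard directly follows that discard in the log."""
    out, prev = [], None
    for ev in events:
        ev = dict(ev)  # never mutate the caller's records
        if ev["type"] in CALLS:
            claimed_discard = prev is not None and prev["type"] == "discard"
            ev["from_seat"] = prev["seat"] if claimed_discard else None
        out.append(ev)
        prev = ev
    return out


log = [
    {"type": "deal", "seat": 1, "tile": "3m"},
    {"type": "discard", "seat": 1, "tile": "3m"},
    {"type": "chi", "seat": 2, "tile": "3m"},
    {"type": "discard", "seat": 2, "tile": "9p"},
    {"type": "deal", "seat": 3, "tile": "7s"},
    {"type": "gang", "seat": 3, "tile": "7s"},  # closed kan after own draw
]
result = infer_claimed_from(log)
```

A cheap sanity check on real data would be that every inferred chi satisfies `seat == (from_seat + 1) % 4`, since chi can only be called on the previous player's discard.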

Ledenel commented 5 years ago

Since we now have a TenhouEvent wrapper, we could extend it directly to map an event to a row (record) of this tabular format (pandas.DataFrame), and call the DataFrame constructor on TenhouRecord.events to get a basic table.
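A sketch of the "event to row" mapping. `SketchEvent` stands in for an event wrapper such as TenhouEvent; its fields and the `to_record` method are hypothetical, not TenhouEvent's real attributes. Every record has the same keys, which is what lets a list of them become a table directly.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SketchEvent:
    """Stand-in for an event wrapper; all fields are hypothetical."""
    step: int
    kind: str
    primary: int
    tile: Optional[str] = None
    second: Optional[int] = None

    def to_record(self) -> dict:
        """One event -> one row of the event table (field order = column order)."""
        return asdict(self)


events = [
    SketchEvent(0, "draw", 0, "3p"),
    SketchEvent(1, "discard", 0, "4s"),
    SketchEvent(2, "chi", 1, "4s", second=0),
]
records = [e.to_record() for e in events]
```

With pandas installed, `pandas.DataFrame(records)` yields one row per event with columns `step`, `kind`, `primary`, `tile`, `second`; fields left as `None` show up as missing values.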

Ledenel commented 4 years ago

A demo of the universal paifu format is implemented in v0.1.4. It is not complete yet (but extending it is quite easy). It consists of a set of states (defined by an enum), a simple type system (via property_manager), commands that modify the state, and an executor that computes the state by running the commands. It is essentially an assembler plus a virtual machine (interpreter): designed for mahjong representation, but general enough for any game. I will open a pull request to demonstrate how they work together.
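A toy version of the "commands plus executor" design, written only to illustrate the idea; `Slot`, `Command` and `execute` are hypothetical names, and the demo's type system (property_manager) is not modelled here. States are enum members, commands are immutable instructions, and the executor interprets them in order.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Slot(Enum):
    """Hypothetical state slots kept per player."""
    HAND = auto()
    DISCARDS = auto()
    SCORE = auto()


@dataclass(frozen=True)
class Command:
    op: str        # "push", "pop" or "add"
    slot: Slot
    player: int
    value: object


def execute(commands, players=4):
    """Interpret commands in order and return the resulting state."""
    state = {(p, s): (0 if s is Slot.SCORE else [])
             for p in range(players) for s in Slot}
    for c in commands:
        key = (c.player, c.slot)
        if c.op == "push":
            state[key].append(c.value)
        elif c.op == "pop":
            state[key].remove(c.value)
        elif c.op == "add":
            state[key] += c.value
        else:
            raise ValueError(f"unknown op {c.op!r}")
    return state


program = [
    Command("push", Slot.HAND, 0, "4s"),
    Command("push", Slot.HAND, 0, "3p"),      # draw 3p
    Command("pop", Slot.HAND, 0, "4s"),       # discard 4s ...
    Command("push", Slot.DISCARDS, 0, "4s"),  # ... onto the river
    Command("add", Slot.SCORE, 0, -1000),
]
state = execute(program)
```

Because the state is a pure function of the command list, replaying any prefix (`execute(program[:k])`) reconstructs the table position at step `k`, which is what makes the assembler and virtual machine analogy useful for paifu analysis.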