Ledenel opened 5 years ago
I'm considering using a dask (pandas) DataFrame to represent a paifu (game record) as a series of mahjong events, with rich metadata labels for filtering, and Apache Parquet for serialization (directly supported by both dask and pandas).
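A minimal sketch of the one-event-per-row idea follows. The column set is hypothetical, drawn from the examples in this issue (platform, game id, round, step, event type, primary/secondary player, shown/seen tiles). It uses only the standard library: filtering by metadata labels is a plain list comprehension standing in for `DataFrame` boolean indexing.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class EventRow:
    """One mahjong event per row; all column names here are hypothetical."""
    platform: str                # metadata label, e.g. "tenhou" or "majsoul"
    game_id: str
    round_no: int
    step: int                    # position of the event within the round
    event: str                   # "draw", "discard", "chi", "sealed_kan", ...
    primary: str                 # player who performs the event
    secondary: Optional[str]     # player affected by it (e.g. chi source), if any
    shows: str = ""              # tiles revealed to everyone
    sees: str = ""               # tiles only the primary player sees


def filter_rows(rows: List[EventRow], **labels) -> List[EventRow]:
    """Keep rows whose fields equal every given label, e.g. event="discard"."""
    return [r for r in rows if all(getattr(r, k) == v for k, v in labels.items())]


rows = [
    EventRow("tenhou", "g1", 0, 0, "draw", "A", None, sees="3p"),
    EventRow("tenhou", "g1", 0, 1, "discard", "A", None, shows="3m"),
    EventRow("tenhou", "g1", 0, 2, "chi", "B", "A", shows="24m"),
]
discards = filter_rows(rows, event="discard")
records = [asdict(r) for r in rows]   # list of dicts, one per event
```

With pandas installed, `pd.DataFrame(records)` builds the table and `DataFrame.to_parquet(...)` writes it (this requires `pyarrow` or `fastparquet`).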
Here are some possible columns:
"Opened hand" is one of: chi, pon, kan (both open and sealed), ron, tsumo, kita.
Columns marked in bold are basic columns; the others can be calculated from the basic columns or ignored without information loss.
Some trivial examples of the primary-player and secondary-player properties of events:
- Player A draws tile 3p: A is the primary player, there is no secondary player; A shows nothing and sees 3p; the hand adds 3p; the opened hand is unchanged.
- Player A discards tile 4s: A is the primary player; A shows 4s and sees nothing; the hand removes 4s; the opened hand is unchanged.
- Player A claims a chi on Player B's previously discarded 3m using 24m: A is the primary player, B is the secondary player; A shows 24m and sees nothing; the hand removes 24m; the opened hand adds (chi, 234m); B is not affected.
- Player A declares a sealed kan with 7777s: A is the primary player; A shows 7777s and sees nothing; the hand removes 7777s; the opened hand adds (sealed kan, 7777s). Under MCR rules, A shows nothing.
- Then Player A reveals a new dora indicator 6s: A is the primary player; A shows 6s and sees nothing; the hand and the opened hand are unchanged.
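The five examples above can be replayed with a small sketch. It assumes a hypothetical event shape (`kind`, `primary`, `tiles`, `secondary`, `claimed`) and tracks, per player, the concealed hand as a multiset (`collections.Counter`), the opened melds, and the tiles shown to everyone.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PlayerState:
    hand: Counter = field(default_factory=Counter)
    opened: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)
    shown: List[str] = field(default_factory=list)  # tiles revealed to everyone


@dataclass
class Event:
    kind: str                       # hypothetical: "draw", "discard", "chi", "sealed_kan", "dora"
    primary: str
    tiles: Tuple[str, ...] = ()
    secondary: Optional[str] = None
    claimed: Optional[str] = None   # tile taken from the secondary player's discard


def apply(states: Dict[str, PlayerState], ev: Event, mcr: bool = False) -> None:
    me = states[ev.primary]
    if ev.kind == "draw":
        me.hand.update(ev.tiles)          # A sees the tile but shows nothing
    elif ev.kind == "discard":
        me.hand.subtract(ev.tiles)
        me.shown.extend(ev.tiles)
    elif ev.kind == "chi":
        me.hand.subtract(ev.tiles)
        me.shown.extend(ev.tiles)
        meld = tuple(sorted(ev.tiles + (ev.claimed,)))
        me.opened.append(("chi", meld))   # the secondary player's state is untouched
    elif ev.kind == "sealed_kan":
        me.hand.subtract(ev.tiles)
        if not mcr:                       # under MCR rules a sealed kan shows nothing
            me.shown.extend(ev.tiles)
        me.opened.append(("sealed kan", tuple(ev.tiles)))
    elif ev.kind == "dora":
        me.shown.extend(ev.tiles)         # hand and opened hand unchanged
    else:
        raise ValueError(f"unknown event kind: {ev.kind}")
    me.hand = +me.hand                    # drop tiles whose count fell to zero


states = {"A": PlayerState(Counter(["2m", "4m", "4s", "7s", "7s", "7s", "7s"])),
          "B": PlayerState()}
for ev in [
    Event("draw", "A", ("3p",)),
    Event("discard", "A", ("4s",)),
    Event("chi", "A", ("2m", "4m"), secondary="B", claimed="3m"),
    Event("sealed_kan", "A", ("7s",) * 4),
    Event("dora", "A", ("6s",)),
]:
    apply(states, ev)
print(dict(states["A"].hand))  # {'3p': 1}
```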
Consider adding a per-player extra-tile column for kita (by analysis convention, kita is not treated as part of the opened hand).
Use 1 bit to mark whether each physical tile is present, so a tile collection fits in 18 bytes (there are only 136/144 physical tiles, and each can appear at most once in a collection). Tile-collection manipulation then becomes bitwise operations: XOR to add or remove a tile (when it is known to be absent or present, respectively), OR for union, and AND for intersection. This also keeps red fives (0s) distinct from ordinary 5s.
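A sketch of the bitset idea using a plain Python `int` as the bit array. The bit layout (kind × 4 + copy, kinds ordered 1-9m, 1-9p, 1-9s, 1-7z, red five stored as copy 0 of its five) is an assumption for illustration, not an existing project API.

```python
from typing import Iterable, Tuple

# Hypothetical layout: 34 tile kinds (1-9m, 1-9p, 1-9s, 1-7z) x 4 copies = 136 bits.
SUIT_OFFSET = {"m": 0, "p": 9, "s": 18, "z": 27}


def tile_bit(tile: str, copy: int) -> int:
    """Bit position (0..135) of one physical tile, e.g. tile_bit("3p", 2).

    A red five ("0m", "0p", "0s") is stored as copy 0 of that five,
    so ordinary fives should use copies 1-3.
    """
    num, suit = int(tile[0]), tile[1]
    if num == 0:  # red five
        num, copy = 5, 0
    if not 0 <= copy < 4:
        raise ValueError("copy must be in 0..3")
    return (SUIT_OFFSET[suit] + num - 1) * 4 + copy


def tile_set(tiles: Iterable[Tuple[str, int]]) -> int:
    """Build a bitset (a plain Python int) from (tile, copy) pairs."""
    bits = 0
    for tile, copy in tiles:
        bits |= 1 << tile_bit(tile, copy)
    return bits


def size(bits: int) -> int:
    """Number of tiles in the collection (population count)."""
    return bin(bits).count("1")


def to_bytes(bits: int) -> bytes:
    # 18 bytes = 144 bits: room for the 136 tiles plus 8 flowers
    return bits.to_bytes(18, "little")


hand = tile_set([("3p", 0), ("4s", 1), ("0s", 0)])
hand ^= tile_set([("3p", 1)])   # draw: XOR adds a tile known to be absent
hand ^= tile_set([("4s", 1)])   # discard: XOR removes a tile known to be present
other = tile_set([("3p", 1), ("9m", 2)])
print(size(hand), size(hand | other), size(hand & other))  # 3 4 1
```

XOR only behaves as "add" or "remove" when the caller knows the tile's current state; OR and AND are safe unconditionally.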
We still need to test Apache Parquet's compression ratio; a family of boolean columns (or a set) may be a more readable and obvious solution. Considering compatibility with majsoul, `str(TileSet)` may be the most readable and convenient solution. (Previously, `Tile` might not treat 0s, 0m, 0p correctly (#14).) Now that 0s is represented properly (#14, via #21), it is safe to use `str(TileSet)` as the basic representation. The reverse conversion is easy using `mahjong.container.utils.tile_set_from_string`.
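To show why a compact string works as a basic representation once red fives round-trip, here is a standalone sketch of such a format. It is not the project's `str(TileSet)` or `tile_set_from_string`; the notation (digits followed by a suit letter, as in "24m0p55s", with `0` for a red five) is an assumption.

```python
import re
from collections import Counter

SUIT_ORDER = "mpsz"


def parse_tiles(text: str) -> Counter:
    """Parse compact notation such as "24m0p55s7z" into a multiset of tiles.

    "0" stands for a red five and is kept distinct from "5".
    """
    if not re.fullmatch(r"(?:\d+[mpsz])*", text):
        raise ValueError(f"not a tile string: {text!r}")
    tiles = Counter()
    for digits, suit in re.findall(r"(\d+)([mpsz])", text):
        for d in digits:
            tiles[d + suit] += 1
    return tiles


def format_tiles(tiles: Counter) -> str:
    """Inverse of parse_tiles: group by suit, red five sorted just before 5."""
    parts = []
    for suit in SUIT_ORDER:
        digits = sorted(
            (t[0] for t in tiles.elements() if t[1] == suit),
            key=lambda d: (5 if d == "0" else int(d), d != "0"),
        )
        if digits:
            parts.append("".join(digits) + suit)
    return "".join(parts)


print(format_tiles(parse_tiles("55s0s24m")))  # 24m055s
```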
A player can be represented simply by name (keeping the format platform-independent); since the platform is recorded, players with the same name on different platforms can still be distinguished. Player info (gender, level) can be saved in a separate DataFrame (per game) for further analysis.
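A minimal sketch of that keying scheme, assuming hypothetical event rows that store only the player's name and a per-game platform value: the player table is keyed by (platform, name), so identical names on different platforms never collide.

```python
from typing import Dict, List, NamedTuple


class PlayerKey(NamedTuple):
    """A player name is only unique within one platform."""
    platform: str
    name: str


def attach_player_info(events: List[dict], platform: str,
                       players: Dict[PlayerKey, dict]) -> List[dict]:
    """Join per-player info onto event rows that store only the player's name."""
    joined = []
    for ev in events:
        info = players.get(PlayerKey(platform, ev["primary"]), {})
        joined.append({**ev, **{f"primary_{k}": v for k, v in info.items()}})
    return joined


players = {
    PlayerKey("tenhou", "A"): {"level": 3},
    PlayerKey("majsoul", "A"): {"level": 9},   # same name, different platform
}
events = [{"step": 0, "event": "draw", "primary": "A"}]
print(attach_player_info(events, "majsoul", players)[0]["primary_level"])  # 9
```

In pandas, the same join would be a `DataFrame.merge` on the platform and name columns.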
Player DataFrame columns:
Data from majsoul
Player:
Game:
Round and event:
Fields we can decode but whose meaning we don't know:
Added some examples and marked the basic columns.
Data from majsoul
Round and event:
- Basic events (chi, peng, gang, kita, hu, ...)
- Delta score of each round
- liqibang
- md5
- isliqi
- left_tile_count
Do the basic events include which tile a player draws or discards? Or do we have to infer this from the 'paishan' column?
> Do the basic events include which tile a player draws or discards?
Yes, the basic events are:
| Event | Contains | Explanation |
|---|---|---|
| Deal (or Draw) | seat, tile, remaining tile count | |
| Discard | seat, tile, liqi | liqi = 1 marks a liqi (riichi) declaration |
| chi | seat (who calls), tile | does not record whom the tile was taken from |
| peng | same as above | same as above |
| gang (three kinds) | same as above | same as above |
| kita | seat | does not contain the drawn replacement tile; another Deal event follows |
| hule | seat (who wins), zimo or not, delta points | does not record who lost |
| liuju | | |
However, some information (such as whom a tile was called from, and a player's hand tiles) can be inferred from other context.
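For example, whom a tile was called from can be recovered from event order: a chi, peng, or open gang always claims the most recent discard, and so does a ron win other than robbing a kan. The sketch below assumes hypothetical lowercase event dicts (`deal`, `discard`, `chi`, `peng`, `open_gang`, `hule` with a `zimo` flag) and leaves `loser` as `None` when no discard precedes the win (e.g. robbing a kan).

```python
from typing import List, Optional

# Hypothetical event names; only calls made on a discard have a source seat.
CALLS_ON_DISCARD = {"chi", "peng", "open_gang"}


def infer_sources(events: List[dict]) -> List[dict]:
    """Add `from_seat` to calls and `loser` to ron wins, using the last discard."""
    last_discard_seat: Optional[int] = None
    result = []
    for ev in events:
        ev = dict(ev)  # do not mutate the caller's rows
        kind = ev["event"]
        if kind == "deal":
            last_discard_seat = None  # a drawn tile cannot be called
        elif kind == "discard":
            last_discard_seat = ev["seat"]
        elif kind in CALLS_ON_DISCARD:
            ev["from_seat"] = last_discard_seat
        elif kind == "hule" and not ev["zimo"]:
            # None when no discard precedes the win, e.g. robbing a kan
            ev["loser"] = last_discard_seat
        result.append(ev)
    return result
```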
Since we now have the `TenhouEvent` wrapper, we could extend it directly to map an event to a row (record) in this tabular format (`pandas.DataFrame`), and call the `DataFrame` constructor on `TenhouRecord.events` to get a basic table.
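As a standalone sketch of the event-to-row mapping (it does not use the project's `TenhouEvent`): the functions below turn Tenhou mjlog draw/discard tag names into flat dicts. They assume the mjlog convention that tags T/U/V/W are draws and D/E/F/G are discards by seats 0-3, with the number being a 136-based tile id (`id // 4` is the tile kind) and ids 16, 52, 88 being red fives in red-five games. Other tags are skipped.

```python
import re
from typing import List, Optional

RED_FIVE_IDS = {16, 52, 88}  # assumes a game played with red fives


def tile_id_to_str(tile_id: int) -> str:
    """Convert a 136-based tile id to compact notation, e.g. 48 -> "4p"."""
    kind = tile_id // 4
    suit = "mpsz"[kind // 9]
    num = 0 if tile_id in RED_FIVE_IDS else kind % 9 + 1
    return f"{num}{suit}"


def tag_to_row(tag: str, step: int) -> Optional[dict]:
    """Map a draw/discard tag name such as "T48" to one row; return None otherwise."""
    m = re.fullmatch(r"([TUVWDEFG])(\d+)", tag)
    if m is None:
        return None  # e.g. "DORA", "N", "AGARI" are not handled in this sketch
    letter, tile_id = m.group(1), int(m.group(2))
    event = "draw" if letter in "TUVW" else "discard"
    seat = "TUVWDEFG".index(letter) % 4
    return {"step": step, "event": event, "seat": seat, "tile": tile_id_to_str(tile_id)}


def to_rows(tags: List[str]) -> List[dict]:
    rows = (tag_to_row(t, i) for i, t in enumerate(tags))
    return [r for r in rows if r is not None]


rows = to_rows(["T48", "D48", "DORA", "U100"])
columns = {name: [r[name] for r in rows] for name in rows[0]}
print(columns["tile"])  # ['4p', '4p', '8s']
```

With pandas installed, `pd.DataFrame(rows)` (a list of dicts) produces the basic table.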
A demo of the universal paifu format is implemented in v0.1.4. It is not complete yet, but extending it is quite easy. It consists of a set of states (defined by an enum), a simple type system (via `property_manager`), commands that modify the state, and an executor that computes the state by running the commands. It is essentially an assembler plus a virtual machine (interpreter): designed for mahjong representation, but general enough for other games. I will open a pull request to demonstrate how these parts work together.
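A toy version of this design, written from the description above rather than from the v0.1.4 code: an enum of state slots, a table of allowed operations standing in for the type system (the real project uses `property_manager`, which is not modeled here), commands as plain data, and an executor that replays them in order. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class Slot(Enum):
    """Hypothetical per-player state slots (stand-ins for the enum-defined states)."""
    HAND = "hand"
    DISCARDS = "discards"
    SCORE = "score"


# A tiny "type system": which operations each slot accepts.
ALLOWED_OPS = {
    Slot.HAND: {"add", "remove"},
    Slot.DISCARDS: {"add"},
    Slot.SCORE: {"delta"},
}


@dataclass(frozen=True)
class Command:
    seat: int
    slot: Slot
    op: str
    value: Union[str, int]


class Executor:
    """Computes game state by replaying commands in order, like an interpreter."""

    def __init__(self, n_players: int = 4, start_score: int = 25000) -> None:
        self.state: List[Dict[Slot, Union[list, int]]] = [
            {Slot.HAND: [], Slot.DISCARDS: [], Slot.SCORE: start_score}
            for _ in range(n_players)
        ]

    def execute(self, cmd: Command) -> None:
        if cmd.op not in ALLOWED_OPS[cmd.slot]:
            raise TypeError(f"{cmd.op!r} is not valid for {cmd.slot.name}")
        player = self.state[cmd.seat]
        if cmd.op == "add":
            player[cmd.slot].append(cmd.value)
        elif cmd.op == "remove":
            player[cmd.slot].remove(cmd.value)   # ValueError if the tile is absent
        else:  # "delta"
            player[cmd.slot] += cmd.value

    def run(self, program: List[Command]) -> List[Dict[Slot, Union[list, int]]]:
        for cmd in program:
            self.execute(cmd)
        return self.state


program = [
    Command(0, Slot.HAND, "add", "3p"),        # seat 0 draws 3p
    Command(0, Slot.HAND, "remove", "3p"),     # ...and discards it
    Command(0, Slot.DISCARDS, "add", "3p"),
    Command(0, Slot.SCORE, "delta", -1000),    # e.g. a riichi stick
]
state = Executor().run(program)
print(state[0][Slot.SCORE], state[0][Slot.DISCARDS])  # 24000 ['3p']
```

Keeping commands as data means a paifu can be stored as the command list alone, and any intermediate state can be recomputed by replaying a prefix of it.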
This issue collects everything related to paifu analysis, including: