CUMLSec / stateformer

MIT License

Training data format #5

Open StarGazerM opened 1 year ago

StarGazerM commented 1 year ago

Hi, I want to train stateformer with my own dataset. Can you provide a detailed explanation of the data format in data-src? What format is *.byte*, what is .inst_pos_emb, and how can I extract them from DWARF?

analognahid commented 10 months ago

I understand what inst_pos_emb is. For example, '0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 ' means the first 7 tokens belong to instruction 0, the next 7 to instruction 1, and so on. I still don't understand the .byte format.
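In case it helps anyone else, here is a minimal Python sketch of the mapping as I understand it from the example above. It assumes each instruction has already been split into tokens; the function names and the sample tokens are my own, not something taken from the stateformer code, so treat this as an illustration rather than the authors' preprocessing.

```python
from itertools import groupby


def build_inst_pos_emb(instructions):
    """Emit one instruction index per token, space-separated.

    `instructions` is a list of token lists, one inner list per instruction.
    (Hypothetical input shape; the real tokenizer is not described in this thread.)
    """
    return " ".join(
        str(idx) for idx, tokens in enumerate(instructions) for _ in tokens
    )


def tokens_per_instruction(inst_pos_emb):
    """Recover how many tokens each instruction contributes from the index string."""
    ids = [int(x) for x in inst_pos_emb.split()]
    # Consecutive equal indices belong to the same instruction.
    return [len(list(group)) for _, group in groupby(ids)]


if __name__ == "__main__":
    # Made-up tokens, only to show the shape of the output.
    sample = [["mov", "eax", "ebx"], ["ret"]]
    print(build_inst_pos_emb(sample))  # 0 0 0 1
```

Running `tokens_per_instruction` on the string quoted above returns `[7, 7, 7, 7, 8]`, i.e. seven tokens for each of instructions 0 to 3 and eight for instruction 4.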