emreaksan / deepwriting

Code for `DeepWriting: Making Digital Ink Editable via Deep Generative Modeling` paper
MIT License

char_label's value isn't equal to the alphabet's id #11

Closed LinJM closed 5 years ago

LinJM commented 5 years ago

@emreaksan Hello, I found that the char_label ids in the released dataset don't correspond to the alphabet ids, especially the first id. For the others, char_label id - 33 = alphabet id.

emreaksan commented 5 years ago

Hi @LinJM ,

Yes, there is a discrepancy if you directly compare the character integer labels with the alphabet indices. It is due to the LabelEncoder from sklearn.preprocessing. The letters are always transformed into integer labels by means of LabelEncoder. You can find an example case at https://github.com/emreaksan/deepwriting/blob/a1b011563c58f4b94d92b1110fbc98ffaf35c3e5/data_scripts/json_to_numpy.py#L45 or https://github.com/emreaksan/deepwriting/blob/a1b011563c58f4b94d92b1110fbc98ffaf35c3e5/dataset_hw.py#L271

I think the LabelEncoder sorts the alphabet first and then assigns the indices.
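
To illustrate the sort-then-index behavior, here is a minimal sketch in plain Python. The alphabet below is hypothetical, not the one used in this repository, and `char_to_label` / `label_to_char` are illustrative names. The optional check at the end assumes scikit-learn is installed and is skipped otherwise.

```python
# Hypothetical alphabet in a hand-written order (NOT the repository's actual alphabet).
alphabet = ["a", "b", "c", "A", "B", "C", "0", "1", "!"]

# Emulate sort-then-index: labels follow the sorted order of the characters,
# not their position in the alphabet list.
classes = sorted(set(alphabet))
char_to_label = {ch: i for i, ch in enumerate(classes)}
label_to_char = {i: ch for ch, i in char_to_label.items()}

# Print each character with its alphabet index and its encoded label.
for ch in alphabet:
    print(repr(ch), "alphabet index:", alphabet.index(ch), "label:", char_to_label[ch])

# Optional cross-check against scikit-learn, if it is available.
try:
    from sklearn.preprocessing import LabelEncoder

    encoder = LabelEncoder().fit(alphabet)
    assert list(encoder.classes_) == classes
except ImportError:
    pass
```

In this sketch `"a"` sits at alphabet index 0 but receives label 6, because `"!"`, the digits, and the uppercase letters sort before it. To go from an integer label back to a character, use the encoder's own mapping (here `label_to_char`, or `inverse_transform` on a fitted LabelEncoder) rather than indexing into the alphabet list.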

LinJM commented 5 years ago

Thank you for your response