cisnlp / simalign

Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
MIT License

Clarification in align_files.py #23

Closed knarfamlap closed 3 years ago

knarfamlap commented 3 years ago

Hi!

Thank you for making your tool available! I want to align sentences in two different files. I guess that align_files.py can do exactly this. Do you think you can clarify the format that each file must be in?

I am confused by "Lines in the file should be indexed separated by TABs."

Sorry in advance if this might be trivial.

Thank you in advance!

masoudjs commented 3 years ago

Hi Frank,

Thank you for using our tool, and sorry for the confusing description. You are right about align_files.py. In both files, each sentence should be on its own line, prefixed with its index and a tab character. This is an example:

0 [TAB] I have a book .
1 [TAB] This book is interesting .
2 [TAB] I went to the library .

By [TAB], I mean the literal tab character, which you can write as "\t" when generating the file from code. I have attached a file with the same sentences in this format (example.txt). If this is still unclear, please let me know.
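For readers who want to generate such a file programmatically, here is a minimal sketch of the format described above. The function names `write_indexed_sentences` and `read_indexed_sentences` and the output filename are hypothetical and not part of simalign; the sketch only assumes the layout from this reply (index, one tab, then the sentence, one sentence per line).

```python
def write_indexed_sentences(sentences, path):
    """Write one sentence per line as '<index><TAB><sentence>' (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for index, sentence in enumerate(sentences):
            # "\t" is the literal tab character that separates index and sentence.
            f.write(f"{index}\t{sentence.strip()}\n")


def read_indexed_sentences(path):
    """Read the file back as a list of (index, sentence) pairs (hypothetical helper)."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first tab so tabs inside a sentence are preserved.
            index, sentence = line.split("\t", 1)
            pairs.append((int(index), sentence))
    return pairs


if __name__ == "__main__":
    english = [
        "I have a book .",
        "This book is interesting .",
        "I went to the library .",
    ]
    write_indexed_sentences(english, "example_eng.txt")  # assumed filename
    print(read_indexed_sentences("example_eng.txt"))
```

The second file (the other language) would be written the same way, with matching indices for the sentences that correspond to each other.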

knarfamlap commented 3 years ago

Thank you so much for clarifying!