Open Amen-bang opened 2 years ago
Hi!
Please provide the text you are trying to split and the lang_code.
I got the same error. I followed this article, so there is no lang_code; there are just:
lang_from = "ru"
lang_to = "en"
aligner.fill_db(db_path, splitted_from, splitted_to)
.fill_db() has the signature (db_path, lang_from, lang_to, splitted_from=[], splitted_to=[], proxy_from=[], proxy_to=[]), so pass lang_from and lang_to as well; both are strings.
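To illustrate why the three-argument call fails, here is a minimal sketch. `fill_db_stub` is a hypothetical stand-in, not the library function: it only mirrors the parameter order quoted above (with `None` defaults instead of `[]`) and reports which value ends up bound to which parameter.

```python
def fill_db_stub(db_path, lang_from, lang_to,
                 splitted_from=None, splitted_to=None,
                 proxy_from=None, proxy_to=None):
    """Hypothetical stand-in mirroring the parameter order quoted above."""
    return {
        "lang_from": lang_from,
        "lang_to": lang_to,
        "splitted_from": splitted_from or [],
        "splitted_to": splitted_to or [],
    }


sentences_ru = ["Привет."]
sentences_en = ["Hello."]

# Call from the article: the sentence lists land in lang_from / lang_to.
wrong = fill_db_stub("book.db", sentences_ru, sentences_en)

# Call with the language codes in positions two and three.
right = fill_db_stub("book.db", "ru", "en", sentences_ru, sentences_en)

print(type(wrong["lang_from"]).__name__)  # list, where a string is expected
print(right["lang_from"], right["lang_to"])
```

Passing the lists by keyword (`splitted_from=..., splitted_to=...`) would also make this kind of mix-up harder to miss.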
Hello! Here is a working Colab:
https://colab.research.google.com/drive/1_ics0YzWg5qIZIPhA1X_Wbfg0XZzRO-p
Please, try it with your texts. Let me know in case of further errors.
Hello @averkij
I am facing the same issue: TypeError: split_by_sentences_wrapper() got an unexpected keyword argument 'leave_marks'
In the following code I deliberately left out the "leave_marks" parameter from the calls that produce splitted_from and splitted_to, because the source text is already more or less preformatted. Could you please help me out? Thanks.
import os
from lingtrain_aligner import preprocessor, splitter, aligner, resolver, reader, vis_helper
text1_input = "HarryPotterSteinDerWeise.rtf"
text2_input = "HarryPotterandthe Philosopher.rtf"
with open(text1_input, "r", encoding="utf8") as input1:
    text1 = input1.readlines()
with open(text2_input, "r", encoding="utf8") as input2:
    text2 = input2.readlines()
db_path = "book.db"
lang_from = "de"
lang_to = "en"
models = ["sentence_transformer_multilingual", "sentence_transformer_multilingual_labse"]
model_name = models[0]
text1_prepared = preprocessor.mark_paragraphs(text1)
text2_prepared = preprocessor.mark_paragraphs(text2)
splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from)
splitted_to = splitter.split_by_sentences_wrapper(text2_prepared, lang_to)
if os.path.isfile(db_path): os.unlink(db_path)
aligner.fill_db(db_path, splitted_from, splitted_to)
When I call “splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from)”, it returns a list.
That list then seems to cause a conflict on the SQLite insert. The specific error:
File "ling_test.py", line 36, in
aligner.fill_db(db_path, splitted_from, splitted_to)
File "lingtrain_aligner/aligner.py", line 498, in fill_db
db.executemany("insert into languages(key, val) values(?,?)", [("from", lang_from), ("to", lang_to)])
sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type.
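The binding error can be reproduced without the library. The sketch below reuses the insert statement from the traceback; the table definition and the `insert_languages` helper are assumptions made for illustration. A list bound to a `?` placeholder is rejected by `sqlite3`, while plain strings are accepted. The exact exception class depends on the Python version (older versions raise `InterfaceError`, newer ones `ProgrammingError`), so the sketch catches the common base class `sqlite3.Error`.

```python
import sqlite3
from contextlib import closing


def insert_languages(lang_from, lang_to):
    """Run the insert from the traceback against an in-memory database."""
    with closing(sqlite3.connect(":memory:")) as db:
        # Assumed schema; only the insert statement comes from the traceback.
        db.execute("create table languages(key text, val text)")
        db.executemany(
            "insert into languages(key, val) values(?,?)",
            [("from", lang_from), ("to", lang_to)],
        )
        return db.execute("select key, val from languages order by key").fetchall()


# Strings bind without problems.
ok = insert_languages("de", "en")

# Lists of sentences in the language slots cannot be bound.
try:
    insert_languages(["Satz eins."], ["Sentence one."])
    failed = None
except sqlite3.Error as exc:
    failed = type(exc).__name__

print(ok)
print(failed)
```

This matches the situation where `splitted_from` and `splitted_to` are passed in the positions of `lang_from` and `lang_to`.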