An error when I use “splitter.split_by_sentences_wrapper”, please help check the error

Amen-bang commented 2 years ago

When I call “splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from)”, it returns a list.

However, inserting into SQLite then fails with the following error:

File "ling_test.py", line 36, in <module>
    aligner.fill_db(db_path, splitted_from, splitted_to)
File "lingtrain_aligner/aligner.py", line 498, in fill_db
    db.executemany("insert into languages(key, val) values(?,?)", [("from", lang_from), ("to", lang_to)])
sqlite3.InterfaceError: Error binding parameter 1 - probably unsupported type.

averkij commented 2 years ago

Hi!

Please provide the text you are trying to split and the lang_code.

freetz13 commented 2 years ago

Please provide the text you are trying to split and the lang_code.

I got the same error. I followed this article, so there is no lang_code, only:

lang_from = "ru"
lang_to = "en"

freetz13 commented 2 years ago

aligner.fill_db(db_path, splitted_from, splitted_to)

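.fill_db() has the signature (db_path, lang_from, lang_to, splitted_from=[], splitted_to=[], proxy_from=[], proxy_to=[]), so lang_from and lang_to (both strings) must be passed before the sentence lists. In the call above, the lists land in the lang_from and lang_to positions, and SQLite cannot bind a list as a parameter.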
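To illustrate why passing the sentence lists in the language positions triggers the binding error, here is a minimal sketch that uses only the standard sqlite3 module. The table layout and the helper name try_insert_languages are assumptions made for illustration and are not lingtrain-aligner code; only the insert statement is copied from the traceback. The exact exception class may differ between Python versions, so the sketch catches the common base class sqlite3.Error.

```python
import sqlite3


def try_insert_languages(lang_from, lang_to):
    """Hypothetical helper: run the insert from the traceback against an
    in-memory database and return the sqlite3 error, or None on success."""
    db = sqlite3.connect(":memory:")
    try:
        # Assumed schema; the real table definition lives in lingtrain-aligner.
        db.execute("create table languages(key text, val text)")
        db.executemany(
            "insert into languages(key, val) values(?,?)",
            [("from", lang_from), ("to", lang_to)],
        )
        return None
    except sqlite3.Error as err:
        # A list is not a type sqlite3 can bind to a "?" placeholder.
        return err
    finally:
        db.close()


# Strings bind without problems.
print(try_insert_languages("ru", "en"))

# Lists of sentences in the language positions reproduce the failure.
print(type(try_insert_languages(["First sentence."], ["Second sentence."])))
```

With the signature quoted above, the corrected call would be aligner.fill_db(db_path, lang_from, lang_to, splitted_from, splitted_to).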

averkij commented 2 years ago

Hello! Here is a working Colab:

https://colab.research.google.com/drive/1_ics0YzWg5qIZIPhA1X_Wbfg0XZzRO-p

Please try it with your texts, and let me know if you run into further errors.

francescofeston commented 1 year ago

Hello @averkij

I am facing the same issue: TypeError: split_by_sentences_wrapper() got an unexpected keyword argument 'leave_marks'

In the following code I deliberately left out the "leave_marks" parameter from the calls that produce splitted_from and splitted_to, because the source text is already more or less preformatted. Could you please help me out? Thanks

import os
from lingtrain_aligner import preprocessor, splitter, aligner, resolver, reader, vis_helper

text1_input = "HarryPotterSteinDerWeise.rtf"
text2_input = "HarryPotterandthe Philosopher.rtf"

with open(text1_input, "r", encoding="utf8") as input1:
    text1 = input1.readlines()

with open(text2_input, "r", encoding="utf8") as input2:
    text2 = input2.readlines()

db_path = "book.db"

lang_from = "de"
lang_to = "en"

models = ["sentence_transformer_multilingual", "sentence_transformer_multilingual_labse"]
model_name = models[0]

text1_prepared = preprocessor.mark_paragraphs(text1)
text2_prepared = preprocessor.mark_paragraphs(text2)

splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from)
splitted_to = splitter.split_by_sentences_wrapper(text2_prepared, lang_to)

if os.path.isfile(db_path):
    os.unlink(db_path)

aligner.fill_db(db_path, splitted_from, splitted_to)

averkij / lingtrain-aligner

An error when I use “splitter.split_by_sentences_wrapper”, please help check the error #7