Spaskich opened this issue 2 years ago.
Hi @Spaskich ,
Thank you for helping us improve NLPCube. The feedback is really detailed and useful.
Training a 3.0 model for Ukrainian is going to be straightforward, so I'm going to start with that. For the other issues, I will have to run a lot of local tests to find out what is causing them.
This is going to take some time. I will keep you updated.
@dumitrescustefan - can you please help with this?
@Spaskich - just a quick update. I didn't have time to look into the issue this week, but I will have some time starting tomorrow.
Hi @Spaskich,
Sorry for the late reply. I just finished uploading the Ukrainian model. I will release a package update for the SpaceAfter=No bug, which we're still trying to fix. The other issues will require more work, but hopefully we will be able to focus on them soon.
Thanks for the update and all the work.
Hi, I'm posting in this issue because it concerns the temporary workaround I'm using while the new version is being fixed. I was trying to run 2 new cubes (Persian and Japanese), but I got the following error:
  File "webserver.py", line 124, in <module>
    lang2cube[lang].load(lang)
  File "/work/NLP-Cube/cube/../cube/api.py", line 66, in load
    model_folder_path = model_store_object.find(lang_code=language_code, version=version, verbose=self._verbose)
  File "/work/NLP-Cube/cube/../cube/io_utils/model_store.py", line 192, in find
    raise Exception("No model version for language ["+lang_code+"] was found in the online repository!")
Exception: No model version for language [ja] was found in the online repository!
I also tried running a new instance of the English cube, but it returned the same error. I noticed that this URL, which as far as I understand is the cube repository, returns a 503 error. Is this a known issue?
Hi @Spaskich ,
The issue with the older models is resolved now. We are also retraining the tokenizer for the new models, which should solve most of the problems. Thank you for your patience and for supporting this project.
Hey, are there any updates on the new models?
Hi @Spaskich. Unfortunately, we don't have any updates, because we are running a little short on manpower, and I don't know when we will be able to focus on this issue. However, we welcome any contribution to NLP-Cube, and if you have the time and resources, maybe you could try training some of the models until you get satisfactory results. We would be more than happy to help you package the models and credit your contribution for citation, in case people use these languages.
Okay, thanks for the info. Will update the issue if I make any progress.
Describe the bug
I've been using the 3.0 version of NLP-Cube for a wide array of languages and have encountered some minor issues. I'll summarize them below.
Additional context
SpaceAfter=No is missing and has been replaced by a _. Can this functionality be restored?
This syntax is different from the old model's. Is this an intended effect?
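For context, CoNLL-U marks a token that is not followed by whitespace with SpaceAfter=No in its tenth (MISC) column, while an underscore means the field is empty. The minimal sketch below is plain Python, not part of NLP-Cube, and the function name detokenize is hypothetical. It rebuilds the surface text from token lines, which illustrates why losing the flag matters: once MISC is just _, every token is treated as if a space followed it.

```python
def detokenize(conllu: str) -> str:
    """Rebuild surface text from CoNLL-U token lines using SpaceAfter=No (hypothetical helper)."""
    parts = []
    covered_until = 0  # last word ID already covered by a multiword-token range line
    for line in conllu.splitlines():
        if not line.strip():
            covered_until = 0  # blank line: a new sentence starts and IDs restart at 1
            continue
        if line.startswith("#"):
            continue  # sentence-level comments carry no tokens
        cols = line.split("\t")
        token_id, form, misc = cols[0], cols[1], cols[9]
        if "." in token_id:
            continue  # empty nodes have no surface form
        if "-" in token_id:
            # A range line such as "1-2" holds the surface form of a multiword token.
            covered_until = int(token_id.split("-")[1])
        elif int(token_id) <= covered_until:
            continue  # syntactic word already emitted through its range line
        parts.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()
```

With SpaceAfter=No present, a tokenized "Hello , world !" comes back as "Hello, world!"; with the MISC column reduced to _, the spaces around the punctuation cannot be recovered.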
Slovenian: Text: Obveznosti za izplačila plač in prispevkov so se povečale za 11,5 odstotka na 1,21 milijarde evrov. To povišanje je posledica napredovanj in dogovora o plačah, višjega izplačanega regresa, sprostitve izplačil delovne uspešnosti ter dodatkov za delo v rizičnih razmerah. Za 13,2 odstotka so bili v primerjavi s prvimi devetimi meseci lani višji izdatki za blago in storitve, medtem ko je bilo za poplačilo obresti izplačanih 6,7 odstotka manj denarja kot lani v tem času. Nižji izdatki iz tega naslova so posledica operacij državne zakladnice z upravljanjem javnega dolga, pravijo na ministrstvu.
The new model doesn't split this text into sentences.
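One quick way to confirm a sentence-splitting regression is to count the sentences in the CoNLL-U output and compare that number against a rough baseline. This is a minimal sketch that assumes standard CoNLL-U output with sentences separated by blank lines. Both helper names are hypothetical, and the regex baseline is a naive heuristic (it ignores abbreviations), not NLP-Cube's segmenter.

```python
import re


def naive_sentence_count(text: str) -> int:
    """Rough baseline: split after '.', '!' or '?' followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in pieces if p])


def conllu_sentence_count(conllu: str) -> int:
    """Count CoNLL-U sentences: blank-line-separated blocks with at least one token line."""
    blocks = re.split(r"\n\s*\n", conllu.strip())
    return sum(
        1
        for block in blocks
        if any(line.strip() and not line.startswith("#") for line in block.splitlines())
    )
```

For the Slovenian paragraph above, the baseline finds four sentences, because the decimal commas (11,5 and 1,21) do not trigger a split. Output that contains a single CoNLL-U block for that text would therefore point to the segmentation problem.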