Helsinki-NLP / OPUS-CAT

OPUS-CAT is a collection of software that makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of CAT tool plugins.
MIT License

Unable to fine-tune in linux #103

Open Sunlightshadow opened 3 months ago

Sunlightshadow commented 3 months ago

Hi, and first of all, I am grateful that you published a Linux version of Opus-Cat ❤️. I am using version 1.3.0.0 of Opus-Cat on Fedora 40. I can download and use models with Opus-Cat. However, I am not able to do the fine-tuning. According to the log, there seems to be a wrong specification for the path of Marian. It tries to access /home/user/marian-dev/src/common/cli_wrapper.cpp:208, but the location does not exist. In addition, the app cannot access a specific library although I have set the permissions.

Here is the log:

2024-08-24 21:54:58.254 +02:00 [INF] Opening OPUS-CAT MT Engine window
2024-08-24 21:54:58.313 +02:00 [INF] Starting OPUS-CAT MT Engine
2024-08-24 21:54:58.517 +02:00 [INF] Started HTTP API at http://+:8500. This API can be accessed from remote computers, if the firewall has been configured to allow it.
2024-08-24 21:55:20.888 +02:00 [INF] Fine-tuning a new model with model tag xx_ from base model opus-2021-02-22.
2024-08-24 21:56:03.039 +02:00 [INF] Starting batch translator for model eng-deu_opus-2021-02-22.
2024-08-24 21:56:03.044 +02:00 [INF] [2024-08-24 21:56:03] Error: Cannot convert values for the option: log
2024-08-24 21:56:03.044 +02:00 [INF] [2024-08-24 21:56:03] Error: Aborted from void marian::cli::CLIWrapper::updateConfig(const YAML::Node&, marian::cli::OptionPriority, const string&) in /home/user/marian-dev/src/common/cli_wrapper.cpp:208
2024-08-24 21:56:03.044 +02:00 [INF]
2024-08-24 21:56:03.044 +02:00 [INF] [CALL STACK]
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885e7a71d] + 0x20d71d
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885eb1dcf] + 0x244dcf
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885e98204] + 0x22b204
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885d76561] + 0x109561
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885d4c215] + 0xdf215
2024-08-24 21:56:03.044 +02:00 [INF] [0x7f19c5239088] + 0x2a088
2024-08-24 21:56:03.044 +02:00 [INF] [0x7f19c523914b] __libc_start_main + 0x8b
2024-08-24 21:56:03.044 +02:00 [INF] [0x555885d6f805] + 0x102805
2024-08-24 21:56:03.044 +02:00 [INF]
2024-08-24 21:56:03.105 +02:00 [INF] Batch translation process for model eng-deu_opus-2021-02-22 exited. Processing output.
2024-08-24 21:56:03.106 +02:00 [INF] python3-linux-3.8.13-x86_64/bin/python3: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory

I am happy to help you test the program. Thank you very much 🙏!

Edit:

I was able to fix the Python library error. In OpusCatMtEngine.sh you have to change LD_LIBRARY_PATH=$LD_LIBRARY_PATH./python3-linux-3.8.13-x86_64/lib/ to LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./python3-linux-3.8.13-x86_64/lib/ (the colon separating the two entries was missing). But the Marian-related bug still persists:

2024-08-25 05:08:03.198 +02:00 [INF] Opening OPUS-CAT MT Engine window
2024-08-25 05:08:03.252 +02:00 [INF] Starting OPUS-CAT MT Engine
2024-08-25 05:08:03.452 +02:00 [INF] Started HTTP API at http://+:8500. This API can be accessed from remote computers, if the firewall has been configured to allow it.
2024-08-25 05:08:43.864 +02:00 [INF] Fine-tuning a new model with model tag xx from base model opus-2021-02-22.
2024-08-25 05:09:25.590 +02:00 [INF] Starting batch translator for model eng-deu_opus-2021-02-22.
2024-08-25 05:09:25.602 +02:00 [INF] [2024-08-25 05:09:25] Error: Cannot convert values for the option: log
2024-08-25 05:09:25.602 +02:00 [INF] [2024-08-25 05:09:25] Error: Aborted from void marian::cli::CLIWrapper::updateConfig(const YAML::Node&, marian::cli::OptionPriority, const string&) in /home/user/marian-dev/src/common/cli_wrapper.cpp:208
2024-08-25 05:09:25.602 +02:00 [INF]
2024-08-25 05:09:25.602 +02:00 [INF] [CALL STACK]
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e7924371d] + 0x20d71d
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e7927adcf] + 0x244dcf
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e79261204] + 0x22b204
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e7913f561] + 0x109561
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e79115215] + 0xdf215
2024-08-25 05:09:25.602 +02:00 [INF] [0x7f7a22727088] + 0x2a088
2024-08-25 05:09:25.602 +02:00 [INF] [0x7f7a2272714b] __libc_start_main + 0x8b
2024-08-25 05:09:25.602 +02:00 [INF] [0x563e79138805] + 0x102805
2024-08-25 05:09:25.602 +02:00 [INF]
2024-08-25 05:09:25.670 +02:00 [INF] Batch translation process for model eng-deu_opus-2021-02-22 exited. Processing output.
2024-08-25 05:09:25.783 +02:00 [INF] Traceback (most recent call last):
2024-08-25 05:09:25.783 +02:00 [INF] File "./Marian/validate.py", line 54, in
2024-08-25 05:09:25.783 +02:00 [INF] system_ood_sents, system_indomain_sents = extract_lines_and_split(system_output_path,system_seg_method)
2024-08-25 05:09:25.783 +02:00 [INF] File "./Marian/validate.py", line 32, in extract_lines_and_split
2024-08-25 05:09:25.783 +02:00 [INF] with open(sent_file_path,'rt', encoding='utf-8') as sent_file:
2024-08-25 05:09:25.783 +02:00 [INF] FileNotFoundError: [Errno 2] No such file or directory: '/home/xxx/Programme/OpusCatMTEngine_v1.3.0_linux-x64/opuscat/models/eng-deu/opus-2021-02-22_xx/valid.0.txt'

Edit 2: I am able to fine-tune a model when using the "Test! Do not use" version from the releases page, with my change from above in OpusCatMtEngine.sh! :)
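
To illustrate why the missing colon in the LD_LIBRARY_PATH line matters, here is a minimal Python sketch of how a colon-separated path list splits with and without the separator. The pre-existing value /usr/local/lib and the helper name split_library_path are assumptions made up for the illustration; only the bundled directory comes from the script quoted above.

```python
def split_library_path(value):
    """Split an LD_LIBRARY_PATH-style string into its directory entries."""
    return [entry for entry in value.split(":") if entry]


# Hypothetical pre-existing value of the variable, chosen only for illustration.
existing = "/usr/local/lib"
# Directory taken from the OpusCatMtEngine.sh line quoted in the thread.
bundled = "./python3-linux-3.8.13-x86_64/lib/"

broken = existing + bundled        # no separator: the two paths fuse into one
fixed = existing + ":" + bundled   # with separator: two distinct entries

print(split_library_path(broken))
print(split_library_path(fixed))
```

With an empty pre-existing value, the unseparated form reduces to just the bundled directory, which may be why the problem does not show up on every system.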

TommiNieminen commented 3 months ago

Hi, thanks for your report and testing.

I think the main error from which everything else followed was probably this: Cannot convert values for the option: log

In the directory of the model that is being fine-tuned (you can access it with the Open model button) there is a file called batch.yml that contains the config for batch translation with Marian (edit: the batch.yml file is in the base model directory). That file has a value log, which should contain the path to a log file that is written when batch translating. That value is corrupt for some reason. Does your user name contain spaces by any chance, that's a common reason for path problems?

If you got it working with the test version, I must have fixed the log problem at some point, but the library problem might still occur. I'll release a new version shortly, with some bug fixes, that might solve the problem.
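
For anyone who wants to inspect the log value in batch.yml without opening the file by hand, a rough Python sketch follows. It is only a line-based scan, not a real YAML parser, and the checks are guesses at what a corrupt value could look like; the thread does not say what was actually wrong. The helper names find_log_value and describe_log_value and the sample text are made up for the illustration.

```python
from pathlib import Path


def find_log_value(config_text):
    """Return the raw text after a top-level 'log:' key, or None if absent."""
    for line in config_text.splitlines():
        # Only unindented keys are considered; nested keys are skipped.
        if line.startswith("log:"):
            return line[len("log:"):].strip()
    return None


def describe_log_value(raw):
    """Give a rough, human-readable verdict on a raw 'log' value."""
    if raw is None:
        return "no top-level log key found"
    if raw == "":
        return "log key has no inline value (empty, or a nested block follows)"
    if raw[0] in "[{":
        return "log value looks like a list or mapping, not a single path"
    path = Path(raw.strip("'\""))
    if not path.parent.is_dir():
        return f"log directory does not exist: {path.parent}"
    return f"log value looks like a usable path: {path}"


# Made-up sample content; a real check would read the batch.yml file instead.
sample = "mini-batch: 16\nlog: []\n"
print(describe_log_value(find_log_value(sample)))
```

To check a real file, replace the sample with Path("batch.yml").read_text(encoding="utf-8"), run from the base model directory.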

TommiNieminen commented 3 months ago

I've now released a new version of the cross-platform MT engine: https://github.com/Helsinki-NLP/OPUS-CAT/releases/tag/engine_v1.3.1beta

I'd be interested in knowing if you encounter any of the above problems with this version. I've tested that the Linux version works with both WSL in Windows (on two separate machines), and also on a fresh Ubuntu virtual machine. However, there still might be system-specific problems.

Sunlightshadow commented 3 months ago

Greetings :)

> Does your user name contain spaces by any chance, that's a common reason for path problems?

I don't think so. It's suni, so that should not be the problem. When I fine-tuned, I got the following error once, but I was able to restart the fine-tuning. Here is the part from the model log:

[2024-08-25 09:32:51] Ep. 1 : Up. 400 : Sen. 12,297 : Cost 1.27919793 4,907 after 185,532 : Time 6.48s : 756.87 words/s
[2024-08-25 09:32:59] Translating validation set...
[2024-08-25 09:32:59] Error: Segmentation fault
[2024-08-25 09:32:59] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t, void*)> in /home/user/marian-dev/src/common/logging.cpp:134
[2024-08-25 09:33:28] [marian] Marian v1.9.56 2be8344f 2023-12-19 18:39:32 +0000
[2024-08-25 09:33:28] [marian] Running on suni-pc as process 160797 with command line:
[2024-08-25 09:33:28] [marian] Marian/marian --config /home/suni/.local/share/opuscat/models/eng-deu/opus-2021-02-22_eso/customize.yml --log-level=info
[2024-08-25 09:33:28] [config] after: 0e
[2024-08-25 09:33:28] [config] after-batches: 0
[2024-08-25 09:33:28] [config] after-epochs: 1
[2024-08-25 09:33:28] [config] all-caps-every: 0

I also found some issues running the GUI. When I press the "Open model directory" button, or any other button that should open my file browser, the program crashes because it expects the GNOME file manager nautilus. I don't have it installed because I use KDE Plasma, which uses dolphin as its file manager.

Unhandled exception. System.ComponentModel.Win32Exception (2): An error occurred trying to start process 'nautilus' with working directory '/home/suni/Programme/Opus-CatTest'. No such file or directory
   at System.Diagnostics.Process.ForkAndExecProcess(ProcessStartInfo startInfo, String resolvedFilename, String[] argv, String[] envp, String cwd, Boolean setCredentials, UInt32 userId, UInt32 groupId, UInt32[] groups, Int32& stdinFd, Int32& stdoutFd, Int32& stderrFd, Boolean usesTerminal, Boolean throwOnNoExec)
   at System.Diagnostics.Process.StartCore(ProcessStartInfo startInfo)
   at System.Diagnostics.Process.Start(ProcessStartInfo startInfo)
   at OpusCatMtEngine.LocalModelListView.btnOpenModelDir_Click(Object sender, RoutedEventArgs se) in D:\Users\niemi\source\repos\OPUS-CAT\AvaloniaApplication1\UI\LocalModelListView.axaml.cs:line 39
   at Avalonia.Interactivity.EventRoute.RaiseEventImpl(RoutedEventArgs e)
   at Avalonia.Interactivity.Interactive.RaiseEvent(RoutedEventArgs e)
   at Avalonia.Controls.Button.OnClick()
   at Avalonia.Controls.Button.OnPointerReleased(PointerReleasedEventArgs e)
   at Avalonia.Reactive.LightweightObservableBase`1.PublishNext(T value)
   at Avalonia.Interactivity.EventRoute.RaiseEventImpl(RoutedEventArgs e)
   at Avalonia.Interactivity.Interactive.RaiseEvent(RoutedEventArgs e)
   at Avalonia.Input.MouseDevice.MouseUp(IMouseDevice device, UInt64 timestamp, IInputRoot root, Point p, PointerPointProperties props, KeyModifiers inputModifiers, IInputElement hitTest)
   at Avalonia.Input.MouseDevice.ProcessRawEvent(RawPointerEventArgs e)
   at Avalonia.Threading.Dispatcher.Send(SendOrPostCallback action, Object arg, Nullable`1 priority)
   at Avalonia.Controls.TopLevel.HandleInput(RawInputEventArgs e)
   at Avalonia.ManualRawEventGrouperDispatchQueue.DispatchNext()
   at Avalonia.X11.X11PlatformThreading.RunLoop(CancellationToken cancellationToken)
   at Avalonia.Threading.DispatcherFrame.Run(IControlledDispatcherImpl impl)
   at Avalonia.Threading.Dispatcher.PushFrame(DispatcherFrame frame)
   at Avalonia.Threading.Dispatcher.MainLoop(CancellationToken cancellationToken)
   at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.Start(String[] args)
   at Avalonia.ClassicDesktopStyleApplicationLifetimeExtensions.StartWithClassicDesktopLifetime(AppBuilder builder, String[] args, Action`1 lifetimeBuilder)
   at OpusCatMtEngine.Program.Main(String[] args) in D:\Users\niemi\source\repos\OPUS-CAT\AvaloniaApplication1\Program.cs:line 12
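
One possible way to avoid depending on a single file manager is to try a desktop-neutral opener first and fall back to specific ones. The sketch below shows the idea in Python purely for illustration; the engine itself is a C#/Avalonia application and this is not its code. The function names, the candidate order, and the choice to return False rather than raise are all assumptions.

```python
import shutil
import subprocess

# Candidate commands, tried in order. xdg-open hands the path to the
# desktop's default handler; the others are specific file managers.
CANDIDATES = ["xdg-open", "dolphin", "nautilus"]


def pick_opener(which=shutil.which, candidates=CANDIDATES):
    """Return the first candidate command found on PATH, or None."""
    for name in candidates:
        if which(name):
            return name
    return None


def open_directory(path, which=shutil.which):
    """Open path in a file manager; return False instead of raising if none is found."""
    opener = pick_opener(which)
    if opener is None:
        return False
    # Start the opener without waiting for it to exit.
    subprocess.Popen([opener, path])
    return True
```

The which parameter exists only so the lookup can be replaced in tests; in normal use, open_directory("/some/model/dir") is enough.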

I will download the new release, test it, fine-tune again, and see what I find.

Edit:

Hi, the fine tuning of the model went perfectly. This time I had no errors and also the machine translations of the model are much better than the other test version. Good work. However, the problem with the file manager still exists. The text file in the settings is opened without any problems.

Thank you very much for your work! Let me know if you want me to test anything else for you on Linux. I'll let you know if there are any problems.

Edit 2: I noticed strange behaviour of the models when translating. If I give a model priority by ticking its checkbox, it is used once or twice, then the checkmark disappears from the checkbox and Opus-Cat only uses the other model that I have downloaded. Now that I have deleted that downloaded model, the remaining model is used without any problems.