RotherOSS / otobo

OTOBO is one of the most flexible web-based ticketing systems used for Customer Service, Help Desk, IT Service Management. https://otobo.io/
GNU General Public License v3.0
263 stars 75 forks source link

Add console commands for Export, Import. Including chunking. Improve Ticket import. #3682

Closed bschmalhofer closed 2 months ago

bschmalhofer commented 3 months ago

ImportExport is integrated in OTOBO 11.0 and OTOBO 11.1. But there are no console commands that perform Import and Export. So let's provide them. The new commands will be almost exact copies of the commands in ITSMConfigurationManagement, https://github.com/RotherOSS/ITSMConfigurationManagement/tree/master/Kernel/System/Console/Command/Admin/ITSM/ImportExport .

Support for chunking will be added to the commands and to the Translations object backed. Chunking the translations is basically only meant as an example, as it is unlikely that translation will exhaust the machines memory.

Do not add the commands in rel-11_0, as patch releases should have no new features.

bschmalhofer commented 3 months ago

Tentatively added the commands.

TODO:

Additional fixes to Import Export:

bschmalhofer commented 3 months ago

The current state is available at https://github.com/RotherOSS/otobo/tree/issue-%233682-importexport_with_chunking.

bschmalhofer commented 3 months ago

Another thing to discuss:

The new files Kernel/System/Console/Command/Admin/ImportExport/Import.pm and Kernel/System/Console/Command/Admin/ImportExport/Export.pm are planned to be added to OTOBO 11.1. For OTOBO 11.0 the plan is to backport these files to the ImportExportTicket package.

It can be considered to integrated the two files from ImportExportTicket into OTOBO 11.1.

<Filelist>
        <File Permission="660" Location="Kernel/Config/Files/XML/ImportExportTicket.xml" />
        <File Permission="660" Location="Kernel/System/ImportExport/ObjectBackend/Ticket.pm" />
</Filelist>

This would allow to mark ImportExportTicket simply as integrated, without having to consider the two different cases.

bschmalhofer commented 3 months ago

Discussed this with @svenoe .

bschmalhofer commented 3 months ago

Manual testing is underway. But I think that I have misused the tool. I'm wondering whether a simple ETL-Workflow should work.

  1. Export Tickets with TicketID and a dynamic field
  2. Edit the dynamic field in the CSV file
  3. Import again
  4. Expect that the dynamic field is changed

Let's try that simple flow first. More complex ImportExport just brought a lot of warnings.

bschmalhofer commented 3 months ago

Carried out the simple ETL test. This worked after activating "Empty fields indicate that the current values are kept".

Did the same test with 7589 tickets in 76 chunks of 100 tickets. This looked fine as well, after a couple of fixes.

bschmalhofer commented 3 months ago

I tried to update the subject of articles by marking the mapping Article_ArticleID as an identifier. However, the article subject was not updated. Looks like this is not implemented. See https://github.com/RotherOSS/otobo/blob/cd16817710a34480d6b0fb3b998eab3dc798efd2/Kernel/System/ImportExport/ObjectBackend/Ticket.pm#L1941 .

bschmalhofer commented 2 months ago

The chunking is not as stable as it should be. When importing two chunks of 1000 tickets, each ticket having 4 articles, I got the following error in var/log/Daemon/SchedulerTaskWorkerERR.log :

DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otobo/Kernel/System/DB.pm line 586. 2 ERROR: OTOBO-otobo.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerTaskWorker-10 Perl: 5.40.0 OS: linux Time: Wed Aug 28 15:12:56 2024 3 4 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: ' 5 UPDATE scheduler_task 6 SET lock_key = ?, lock_time = '2024-08-28 15:12:56', lock_update_time = '2024-08-28 15:12:56' 7 WHERE lock_key = 0 AND id = ?' 8 9 Traceback (8472): 10 Module: Kernel::System::Daemon::SchedulerDB::TaskLock Line: 506 11 Module: Kernel::System::Daemon::DaemonModules::SchedulerTaskWorker::Run Line: 180 12 Module: (eval) Line: 387 13 Module: main::Start Line: 387 14 Module: bin/otobo.Daemon.pl Line: 165 15

Importing 10 chunks of 1000 tickets failed on the first try. For some reason no new tickets were created. Howerver this was while running under Devel;;NYTProf.

bschmalhofer commented 2 months ago

Another finding from a run where 10x1000 tickets were imported. The import into the database took 1rom 15:28 to 15:54 UTC. But then the process did not finishe till XX:XX UTC. My guess is that the eventhandlers were still running. It feels like events were piling up for all 10 thousand tickets. Two things can be improved there:

Also, in the event handling phase:

bin/otobo.Console.pl Admin::ImportExport::Import --template-number 1 --prefix 20240828a Export_tickets_20240828a 2>&1 | tee import.out DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otobo/Kernel/System/DB.pm line 586. ERROR: OTOBO-otobo.Console.pl-Admin::ImportExport::Import-10 Perl: 5.40.0 OS: linux Time: Wed Aug 28 16:05:28 2024

Message: Deadlock found when trying to get lock; try restarting transaction, SQL: ' UPDATE scheduler_task SET lock_key = 0 WHERE lock_key = 1 AND id = ?'

Traceback (4360): Module: Kernel::System::Daemon::SchedulerDB::TaskAdd Line: 183 Module: Kernel::System::Scheduler::TaskAdd Line: 98 Module: Kernel::System::AsynchronousExecutor::AsyncCall Line: 142 Module: Kernel::System::Ticket::Event::TicketAppointments::Run Line: 80 Module: (eval) Line: 241 Module: Kernel::System::EventHandler::EventHandler Line: 235 Module: Kernel::System::EventHandler::EventHandlerTransaction Line: 308 Module: Kernel::System::ObjectManager::ObjectEventsHandle Line: 459 Module: Kernel::System::ObjectManager::ObjectsDiscard Line: 507 Module: Kernel::System::ObjectManager::DESTROY Line: 677 Module: (eval) Line: 40 Module: bin/otobo.Console.pl Line: 40

bschmalhofer commented 2 months ago

Activated the handling of transaction per ticket and ran time -p bin/otobo.Console.pl Admin::ImportExport::Import --template-number 1 --prefix 20240828a Export_tickets_20240828a 2>&1

The result was:

Reading file Export_tickets_20240828a/20240828a000008.csv.

Import of 5000 Ticket records:

Success: 5000 succeeded Error: 0 failed. Import of 5000 Ticket records: 1000 Created Import of 5000 Ticket records: 4000 Updated Import complete.

Reading file Export_tickets_20240828a/20240828a000009.csv. Segmentation fault (core dumped) real 2468.73 user 1311.83 sys 275.48 otobo@ba4f7f35a106:~$

This is strange. I tested importing only the chunks 9 and 10. This worked fine. Testing all 10 chunks, I again a core dump after importing the complete chunk 9. Importing chunk 10 by itself worked again.

bschmalhofer commented 2 months ago

Pushing the current state. TODO:

bschmalhofer commented 2 months ago

The adaptions were up to now done for OTOBO 11.1.x. But that is currently not very useful as this version is not released yet. So let's backport it to OTOBO r11.0.x. As no new features should be added in a patch level release, lets include it into the master branch of https://github.com/RotherOSS/ImportExportTicket.

bschmalhofer commented 2 months ago

Some more analysis about the core dump. The suspicion is that the stack size or 8 MB is running out. This could be due to XS-code. Two candidates are Text::CSV_XS and the database driver.

It does not look like the reading of the CSV file is the problem. The 209661 lines, corresponding to 50000 entities, of the test iinput can be read in without problems. It is just not clear why the whole file must be read in before parsing begins.

It also looks like the chunking is not really the problem, when importing the combined file the core dump happens after importing 8735 tickets. That is the same ballpark as with chunked import.

Another wild guess. Kernel::System::DB::Connect() creates an anonymous sub in every call. This sub is usually not needed and it is presumable cleaned up, because it is stored in a lexical variable. But maybe something fishy is going on there.

bschmalhofer commented 2 months ago

The wild idea migth have had an effect. With a small code change in Kernel::System::DB::Connect() the sample file with 10000 tickest could be imported. The stack size grew slowly.

$ pidstat -s -l -C perl
19:44:32     1000    475242    1256    1252  perl bin/otobo.Console.pl Admin::ImportExport::Import --template-number 1 Export_tickets_20270828a.csv

But after the last imported ticket there was still a core dump. I suspect that Kernel::System::Ticket::Article might still have hoarded transaction events.

TODO:

bschmalhofer commented 2 months ago

Another experiment. First I rolled back the changes in Kernel::System::DB::Connect() ran the case with 10000 tickets again and watched the stack size watch -n 0.2 pidstat -H -s -l -p 15340.

All 10.000 tickets were imported. During the imports the stack size grew in the same way as without the adapted DB module.

Alle 2,0s: pidstat -H -s -l -p 15340 bernhard-Aspire-A515-57: Sat Aug 31 10:30:03 2024

Linux 6.8.0-40-generic (bernhard-Aspire-A515-57) 31.08.2024 _x8664 (12 CPU)

1725093003 UID PID StkSize StkRef Command 1725093003 1000 15340 1172 1172 perl bin/otobo.Console.pl Admin::ImportExport::Import --template-number 1 Export_tickets_20280828a.csv

As the end I briefly say a stack size of over 6000 kB and core was dumped. So this is the first real indication that the stack size limit is reached, presumable during global destruction. The ulimit for stacksize, unlimit -s, is 8192 kB. So core is dumped, process is terminated, when that limit is reached.

bschmalhofer commented 2 months ago

Activated the early triggering of the article transactions. Article transaction are handled just before the next article is created. Some observations:

  1. Ticket creation become a factor of about 5 slower
  2. Used stack size now grows faster during ticket and article creation
  3. Core is dumped after creation of a bit over 7000 tickets, after 8192 is reached

TODO:

bschmalhofer commented 2 months ago

More experiments.

Removing all event handlers still let the stack size grow. So the problem is likely not in one of the event handlers.

Disabling Kernel::System::EventHandler::EventHandlerTransaction() did the trick.

diff --git a/Kernel/System/EventHandler.pm b/Kernel/System/EventHandler.pm
index c6b66c80f9..fdfc98b0bf 100644
--- a/Kernel/System/EventHandler.pm
+++ b/Kernel/System/EventHandler.pm
@@ -276,6 +276,8 @@ Kernel::System::EventHandler, like this:
 sub EventHandlerTransaction {
     my ( $Self, %Param ) = @_;

+    return 1;
+
     # remember, we are in destroy mode, do not execute new events
     $Self->{EventHandlerTransaction} = 1;

The stack size is no longer growing. So disabling EventHandlerTransaction() is recommended for the upcoming test migration.

The reason why the stack size increases is not obvious. I likely has to do with the localized object manager which is destroyed and EventHandlerTransaction() is called again in the destructor.

TODO:

bschmalhofer commented 2 months ago

Turns out that this is apparently caused by binmode which is called in the constructor of Kernel::System::Encode. Does likely not affect the web app, as binmode is only called when an interactive tty is attached. test_stack_size.pl.txt

The fix is to avoid recreating Kernel::System::Encode.

TODO:

bschmalhofer commented 2 months ago

The core dump issue looks like it is solved. Got about 300 kB Stacksize after importing 10000 tickets via the console command.

bschmalhofer commented 2 months ago

TODO:

bschmalhofer commented 2 months ago

Harald Jörg kindly analyzed the behavior of binmode, http://www.perl-community.de/bat/poard/thread/21619#ms_196692 . I/O work in Perl in layers, so every call to binmode adds a layer. This is more or less stated in https://perldoc.perl.org/PerlIO . So this is not a bug in Perl, but a consequence of the design.

bschmalhofer commented 2 months ago

ImportExportTicket has been integrated into OTOBO 11.1. Closing this issue.