VFPX / ParallelFox

Parallel Processing Library for Visual FoxPro

Concurrency issue when creating worker objects #11

Closed jmorris-spm closed 9 months ago

jmorris-spm commented 9 months ago

We have an automation program written several years ago that verifies the base data of all "parts" defined in our ERP system. As our business has grown, the execution time has grown as more and more parts have been defined. (Currently over 17000 parts being checked, with more being added every day.)

The program does an initial ODBC download from the cloud of about 900 MB, saves the data to a local database of 31 tables, then performs about 60 individual checks and verifications for each part. The current production version, running in a single instance of VFP, takes approximately 100 minutes to process all of the data.

I have been converting it to use ParallelFox, and getting great results! I really like the product!

Using 9 workers, processing time was reduced to 14 minutes, and using 24 workers, the time was reduced to under 8 minutes. Very impressive!

Now for my question. There is a lot of potential for concurrency issues in this process during the initial environment setup for each worker, so I am using the "CriticalSection - stagger code" method you describe in your white paper, which works great.

The problem I encountered was temp file collisions when instantiating the work objects in the workers:

    Local lcScript
    TEXT TO lcScript NOSHOW TEXTMERGE
    PUBLIC Worker as Worker
    Worker = NewObject("Worker", "ParallelFox.vcx")
    ENDTEXT
    *execute script on all workers
    Parallel.ExecScript(lcScript,.T.)
    Parallel.Wait()

Approximately 50% of the test runs I did with the above code would result in some of the workers failing due to "File is in use" errors, with this sort of error:

    Error: 108
    Message: File is in use by another user.
    Method: C:\USERS\JAMES\APPDATA\LOCAL\TEMP\0002Z6L50004.FXP
    Line: 2
    Code: Worker = NewObject("Worker", "ParallelFox.vcx"
    ====== Call Stack ======
    4 0002z6l50004 0002z6l50004.tmp Line 2
    3 workermgr.execscript parallelfox.vct Line 18
    2 workermgr.processcommand parallelfox.vct Line 21
    1 tmrcommand.timer parallelfox.vct Line 19

Apparently multiple workers were generating the same random temp file names. It felt like I had a "chicken or egg" problem, as I could not use critical sections until the worker objects were created, but I also could not create the worker objects due to concurrency issues. After scratching my head over the weekend, I landed on this fix to provide random staggering between the object creations:

    TEXT TO lcScript NOSHOW TEXTMERGE
    lnRandom = RAND(_VFP.ThreadId)
    lnPause = lnRandom * 20
    INKEY(lnPause)
    PUBLIC Worker as Worker
    Worker = NewObject("Worker", "ParallelFox.vcx")
    ENDTEXT
    *execute script on all workers
    Parallel.ExecScript(lcScript,.T.)
    Parallel.Wait()

This uses the unique _VFP.ThreadId to generate unique pause lengths before creating the worker objects.
It works: it successfully launched 24 workers on a very fast 32-core machine that was running absolutely nothing else at the time, then completed the entire process in under 8 minutes. Great success!

However, I cannot help but feel there may be a better way to handle this situation that I may have missed in your documentation or videos.

Thanks for any insights you may provide.

James Morris

jmorris-spm commented 9 months ago

Additional info: The program is running from a compiled .exe. The ParallelFox class library is included in the exe.

JoelLeach commented 9 months ago

Hi James,

When using ExecScript(), VFP internally compiles the script and creates a temporary FXP file. Apparently, the temp file name is subject to naming collisions when called from multiple workers. I don't recall if I've run into this particular issue before, but I don't think there is a way for you to specify a different temp filename. I would recommend replacing the script with a procedure/function. If it is getting called often, it should improve performance as well.
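A minimal sketch of that change, for illustration only. The procedure name CreateWorkerObject is hypothetical, and the sketch assumes that Parallel.Call() accepts a function name followed by the same "all workers" flag used in the Parallel.ExecScript() call above. It also assumes the procedure lives in a file or EXE that the workers can already see, for example the one passed to Parallel.StartWorkers().

```foxpro
* --- In a procedure file visible to the workers (assumed setup) ---
* Hypothetical name; replaces the body of the original script.
PROCEDURE CreateWorkerObject
    * Same two statements the script ran, now compiled ahead of time,
    * so no temporary FXP has to be generated at run time.
    PUBLIC Worker AS Worker
    Worker = NEWOBJECT("Worker", "ParallelFox.vcx")
ENDPROC

* --- In the main program ---
* Assumed signature: Call(cFunction, lAllWorkers).
* .T. asks every worker to run the procedure once.
Parallel.Call("CreateWorkerObject", .T.)
Parallel.Wait()
```

Because the procedure is compiled with the rest of the application, each worker runs existing code instead of compiling its own copy of the script, which is where the colliding temp file came from.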

jmorris-spm commented 9 months ago

Hi Joel, Thanks, that was an easy fix! Moved it to a procedure and no more problem with the concurrency. And yes, performance improved. Eliminating my kludgy delay code dropped almost a full minute from the overall program execution. It only runs every four hours, but it still helps.