StatCan / openmpp

Implementing the OpenM++ microsimulation framework as a Kubernetes service on the StatCan cloud.

Bug: Error Parsing MPI Job Manifest #56

Closed KrisWilliamson closed 9 months ago

KrisWilliamson commented 9 months ago

Describe the bug

The dispatchMPIJob.sh and parseCommand.py scripts seem to have an issue with the -OpenM.NotOnRoot argument. It is being put into the unrecognizedCmdLineOptions because it is missing a value. According to the OpenM wiki, this argument should have a true or false value, but the value appears to be truncated somewhere before the argument list is passed into these scripts. The unrecognisedCmdLineOptions is created by the parseCommand.py file, but the argument list is passed through the dispatchMPIJobs.sh script without modification.

cat inputArguments 
./bin/parseCommand.py -modelName IDMM -exeStem IDMM -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin -binDir . -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite -mpiNp 2 -hostFile -OpenM.RunStamp 2024_01_25_18_49_53_199 -OpenM.LogToConsole true -OpenM.LogToFile false -OpenM.MessageLanguage en-US -OpenM.RunName IDMM_Default_2024_01_25_13_49_44_969 -OpenM.SetName Default -OpenM.SubValues 2 -OpenM.LogRank true -OpenM.NotOnRoot 
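To illustrate how a flag with no value can end up set aside, here is a minimal sketch of flag/value pairing. It is not the actual parseCommand.py logic; the function name `pair_arguments` and its return shape are hypothetical. Note that under this kind of pairing, -hostFile in the command above would also be treated as having no value.

```python
def pair_arguments(argv):
    """Pair each -flag with the token after it; flags with no value are set aside."""
    paired = {}
    valueless = []
    i = 0
    while i < len(argv):
        token = argv[i]
        # A value is any following token that does not itself look like a flag.
        has_value = i + 1 < len(argv) and not argv[i + 1].startswith("-")
        if token.startswith("-") and has_value:
            paired[token] = argv[i + 1]
            i += 2
        else:
            # Either a flag with nothing after it, or a stray token.
            valueless.append(token)
            i += 1
    return paired, valueless


# Abbreviated version of the argument list above.
args = "-mpiNp 2 -hostFile -OpenM.LogRank true -OpenM.NotOnRoot".split()
paired, valueless = pair_arguments(args)
print(paired)     # {'-mpiNp': '2', '-OpenM.LogRank': 'true'}
print(valueless)  # ['-hostFile', '-OpenM.NotOnRoot']
```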

Environment info

Namespace: N/A

Notebook/server: N/A

Steps to reproduce

Found while testing model isolation.

Expected behaviour

The -OpenM.NotOnRoot argument should have a value, either true or false.

Additional context

The most likely scenario is that something is truncating the argument list between the UI and Jacek's scripts. It is unknown how often this occurs and whether this argument actually affects the run. It should be noted that this argument was at the end of the argument list, so it is possible that other arguments are also being truncated.

KrisWilliamson commented 9 months ago

At Jacek's suggestion, I turned on Cluster Run Options --> Use MPI Root for Modeling.

Before: ./bin/parseCommand.py -modelName IDMM -exeStem IDMM -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin -binDir . -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite -mpiNp 2 -hostFile -OpenM.RunStamp 2024_01_29_14_32_20_370 -OpenM.LogToConsole true -OpenM.LogToFile false -OpenM.SetName Default -OpenM.SubValues 2 -OpenM.LogRank true -OpenM.MessageLanguage en-US -OpenM.RunName IDMM_Default_2024_01_29_09_32_10_354 -OpenM.NotOnRoot

After: ./bin/parseCommand.py -modelName IDMM -exeStem IDMM -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin -binDir . -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite -mpiNp 1 -hostFile -OpenM.RunStamp 2024_01_29_15_37_56_606 -OpenM.LogToConsole true -OpenM.LogToFile false -OpenM.RunName IDMM_Default_2024_01_29_10_37_07_284 -OpenM.SetName Default -OpenM.LogRank true -OpenM.MessageLanguage en-US

The mpiRunName and mpiNp values are expected differences. What is surprising is that NotOnRoot does not appear at all, while MessageLanguage en-US does. It should also be noted that the argument lists are approximately the same length.
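A before/after comparison like this can be done mechanically rather than by eye. The sketch below is one way to do it; `flag_map` and `diff_flags` are hypothetical helper names, and the two command strings are abbreviated versions of the ones above, not the full lists.

```python
def flag_map(command):
    """Map each -flag in a command line to its value, or None if it has none."""
    tokens = command.split()[1:]  # drop the script path
    flags = {}
    for i, token in enumerate(tokens):
        if token.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            flags[token] = None if nxt is None or nxt.startswith("-") else nxt
    return flags


def diff_flags(before, after):
    """Return flags only in before, only in after, and flags whose values changed."""
    b, a = flag_map(before), flag_map(after)
    only_before = sorted(set(b) - set(a))
    only_after = sorted(set(a) - set(b))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_before, only_after, changed


# Abbreviated command lines; the real ones carry more options.
before = ("./bin/parseCommand.py -mpiNp 2 -hostFile -OpenM.SubValues 2 "
          "-OpenM.LogRank true -OpenM.MessageLanguage en-US -OpenM.NotOnRoot")
after = ("./bin/parseCommand.py -mpiNp 1 -hostFile "
         "-OpenM.LogRank true -OpenM.MessageLanguage en-US")

only_before, only_after, changed = diff_flags(before, after)
print(only_before)  # ['-OpenM.NotOnRoot', '-OpenM.SubValues']
print(only_after)   # []
print(changed)      # ['-mpiNp']
```

Comparing by flag name rather than by position matters here, because the UI does not emit the options in the same order on every run.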

KrisWilliamson commented 9 months ago

Some more tests

IDMM no options changed:
0  29 450 inputArguments
./bin/parseCommand.py -modelName IDMM -exeStem IDMM -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin -binDir . -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite -mpiNp 1 -hostFile -OpenM.RunStamp 2024_01_30_14_04_27_111 -OpenM.LogToConsole true -OpenM.LogToFile false -OpenM.LogRank true -OpenM.MessageLanguage en-US -OpenM.RunName IDMM_Default_2024_01_30_09_03_38_718 -OpenM.SetName Default -OpenM.NotOnRoot
./bin/parseCommand.py 
    -modelName IDMM 
    -exeStem IDMM 
    -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin 
    -binDir . 
    -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite 
    -mpiNp 1 
    -hostFile 
    -OpenM.RunStamp 2024_01_30_14_04_27_111 
    -OpenM.LogToConsole true 
    -OpenM.LogToFile false 
    -OpenM.LogRank true 
    -OpenM.MessageLanguage en-US 
    -OpenM.RunName IDMM_Default_2024_01_30_09_03_38_718 
    -OpenM.SetName Default 
    -OpenM.NotOnRoot

IDMM Use MPI Root
 0  28 433 inputArguments
./bin/parseCommand.py -modelName IDMM -exeStem IDMM -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin -binDir . -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite -mpiNp 1 -hostFile -OpenM.RunStamp 2024_01_30_14_07_27_299 -OpenM.LogToConsole true -OpenM.LogToFile false -OpenM.RunName IDMM_Default_2024_01_30_09_06_53_553 -OpenM.SetName Default -OpenM.LogRank true -OpenM.MessageLanguage en-US
./bin/parseCommand.py 
    -modelName IDMM 
    -exeStem IDMM 
    -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin 
    -binDir . 
    -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite 
    -mpiNp 1 
    -hostFile 
    -OpenM.RunStamp 2024_01_30_14_07_27_299 
    -OpenM.LogToConsole true 
    -OpenM.LogToFile false 
    -OpenM.RunName IDMM_Default_2024_01_30_09_06_53_553 
    -OpenM.SetName Default 
    -OpenM.LogRank true 
    -OpenM.MessageLanguage en-US

IDMM 2/3 tables Log Progress Percent
  0  36 551 inputArguments
./bin/parseCommand.py -modelName IDMM -exeStem IDMM -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin -binDir . -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite -mpiNp 1 -hostFile -OpenM.RunStamp 2024_01_30_14_16_34_733 -OpenM.LogToConsole true -OpenM.LogToFile false -OpenM.LogRank true -OpenM.MessageLanguage en-US -OpenM.RunName IDMM_Default_2024_01_30_09_14_47_489 -OpenM.SetName Default -OpenM.SubValues 2 -OpenM.ProgressPercent 5 -OpenM.SparseOutput true -ini ../../logs/2024_01_30_14_16_34_733.IDMM.ini
./bin/parseCommand.py 
    -modelName IDMM 
    -exeStem IDMM 
    -dir /home/jovyan/buckets/aaw-unclassified/microsim/models/bin 
    -binDir . 
    -dbPath /home/jovyan/buckets/aaw-unclassified/microsim/models/bin/IDMM.sqlite 
    -mpiNp 1 
    -hostFile 
    -OpenM.RunStamp 2024_01_30_14_16_34_733 
    -OpenM.LogToConsole true 
    -OpenM.LogToFile false 
    -OpenM.LogRank true 
    -OpenM.MessageLanguage en-US 
    -OpenM.RunName IDMM_Default_2024_01_30_09_14_47_489 
    -OpenM.SetName Default 
    -OpenM.SubValues 2 
    -OpenM.ProgressPercent 5 
    -OpenM.SparseOutput true 
    -ini ../../logs/2024_01_30_14_16_34_733.IDMM.ini

KrisWilliamson commented 9 months ago

After some analysis, the argument list does not appear to be truncated. Rather, -OpenM.NotOnRoot seems to be emitted without a value by default, possibly because of an error in the OpenM code. If the Use MPI Root for Modeling option is turned on, this argument does not appear in the argument list at all.
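If the bare flag is meant as a boolean switch, one possible workaround on the script side is to give it an explicit value before parsing. This is only a sketch under that assumption: it treats a bare -OpenM.NotOnRoot as true, which matches the observation that the flag appears only when Use MPI Root for Modeling is off, but this has not been confirmed against the OpenM code. `BOOLEAN_FLAGS` and `normalize_bare_booleans` are hypothetical names.

```python
# Hypothetical set of options treated as booleans; only NotOnRoot comes from this issue.
BOOLEAN_FLAGS = {"-OpenM.NotOnRoot"}


def normalize_bare_booleans(argv):
    """Append an explicit 'true' after any known boolean flag that has no value."""
    out = []
    for i, token in enumerate(argv):
        out.append(token)
        nxt = argv[i + 1] if i + 1 < len(argv) else None
        # Assumption: a bare boolean flag means true.
        if token in BOOLEAN_FLAGS and (nxt is None or nxt.startswith("-")):
            out.append("true")
    return out


print(normalize_bare_booleans(["-OpenM.LogRank", "true", "-OpenM.NotOnRoot"]))
# ['-OpenM.LogRank', 'true', '-OpenM.NotOnRoot', 'true']
```

A flag that already carries a value, such as -OpenM.NotOnRoot false, passes through unchanged.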