Closed janamach closed 3 years ago
This is a weird way to manage priorities. Usually HPCs limit the number of running jobs, or the resources used, not the number of queued jobs.
Anyhow, you can run a single step of the solve by using --step 1 (or 2 or 3). Note that you have to wait for one step to fully finish before running the next one.
Another way is to run by ‘graph’ number. A ‘graph’ in this context is a group of videos that are solved together. If you divided your experiment into a few such graphs, you can use the --glist option, which accepts an integer enumeration of these graphs. The default is to group by video subdirectories, but you can configure it as you like.
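As a rough illustration of how graph numbers relate to videos, here is a minimal Python sketch. It assumes graphs are numbered from 1 and that each graph is simply a list of video numbers. The function name `movies_for_graphs` and the example grouping are made up for this illustration and are not part of anTraX.

```python
def movies_for_graphs(ggroups, glist):
    """Return the flat list of video numbers covered by the selected graphs.

    ggroups: list of lists, where ggroups[i] holds the videos of graph i + 1
    glist:   1-based graph numbers, as one would pass to --glist
    """
    movies = []
    for g in glist:
        if not 1 <= g <= len(ggroups):
            raise ValueError(f"graph {g} does not exist")
        movies.extend(ggroups[g - 1])
    return movies


# Hypothetical experiment: 6 videos split into 3 graphs
example_groups = [[1, 2], [3, 4, 5], [6]]
print(movies_for_graphs(example_groups, [1, 3]))  # [1, 2, 6]
```

Selecting a subset of graphs this way keeps the number of submitted jobs proportional to the videos in those graphs only.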
On 6 Apr 2021, at 14:56, Jana Mach @.***> wrote:
The HPC server I am using has certain limits per user (100 scheduled jobs, 3 days max per job). The solve step in my case generated more than 100 jobs, causing some of the jobs to get cancelled. Since the solve step is a three step process, I figured I can start each step manually, e.g.:
The HPC I am using is very easy to get access to, maybe its primary purpose is training new users. I asked if they can increase my queued job quota.
Anyhow, you can run a single step of the solve by using --step 1 (or 2 or 3). Note that you have to wait for one step to fully finish before running the next one.
I tried that, it somehow didn't work:
(antrax) [fr_jm1121@uc2n994 ~]$ antrax solve H1CN0304/ --hpc --step 1 --hpc-options partition=single,email=janajg@gmail.com,cpus=4,mem-per-cpu=4000,time=24:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh
Job number 19452619 was submitted
Jobfile created in H1CN0304/antrax/logs/hpc_solve2.sh
Job number 19452620 was submitted
Jobfile created in H1CN0304/antrax/logs/hpc_solve3.sh
sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 297, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 258, in antrax_hpc_job
jid = submit_slurm_job_file(jobfile, waitfor=waitfor)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 80, in submit_slurm_job_file
jid = out.split()[-1]
IndexError: list index out of range
If I add --dry, I get a different error:
$ antrax solve H1CN0304/ --step 2 --hpc --dry --hpc-options partition=single,email=janajg@gmail.com,cpus=4,mem-per-cpu=4000,time=24:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh
Dry run, no job submitted.
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 293, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 265, in antrax_hpc_job
return jid
UnboundLocalError: local variable 'jid' referenced before assignment
But the .sh file it generated has --step 1 in it, although I asked for --step 2
As far as I understand, sbatch path/to/hpc_solve1.sh in this case should be equivalent to starting the jobs through antrax interface with --step 1, is that right?
You’re right, there is a small bug in the interface (when running in hpc mode, anTraX ignores the step option).
There is also a small bug with the dry option in the solve step. The two bugs together explain why you see --step 1 in the job file.
Yes, you can submit the job file yourself, just update the step option.
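For anyone who prefers to script the manual edit, here is a minimal Python sketch that rewrites the step option in the text of a job file. The example job file content is a stand-in (the real generated file will look different), and `set_step` is a hypothetical helper, not an anTraX function.

```python
import re


def set_step(jobfile_text, step):
    """Return the job file text with every '--step N' set to the given step."""
    if step not in (1, 2, 3):
        raise ValueError("step must be 1, 2 or 3")
    new_text, count = re.subn(r"--step[ \t]+\d+", f"--step {step}", jobfile_text)
    if count == 0:
        raise ValueError("no --step option found in job file")
    return new_text


# Stand-in for the generated job file; the real content will differ
example = "#!/bin/bash\nantrax solve H1CN0304/ --step 1\n"
print(set_step(example, 2))
```

The edited text can then be written back to the .sh file and submitted with sbatch once the previous step has fully finished.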
One of the 60 jobs in step 1 is failing consistently, while all other 59 finished successfully. The log says:
============================= JOB FEEDBACK =============================
NodeName=uc2n405
Job ID: 19453304
Array Job ID: 19453240_50
Cluster: uc2
User/Group: fr_jm1121/fr_fr
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:04 core-walltime
Job Wall-clock time: 00:00:16
Memory Utilized: 1.02 MB
Memory Efficiency: 0.01% of 15.62 GB
What could be a possible reason? Is there a way to "rescue" this?
Can you look at the corresponding anTraX-generated logs? These will be session/logs/hpc_solve1_50.log and session/logs/matlab_solve_m_50.log
The text above is from hpc_solve1_50.log, the corresponding matlab_solve_m_50.log has not been generated.
While looking at the matlab_solve_m_*.log's, I found more problems that were not reflected in hpc_solve1_*.log. I looked for logs that did not have the word "Done" in them with:
$ grep -rHnoL "Done" matlab_solve_m*
matlab_solve_m_21.log
matlab_solve_m_25.log
matlab_solve_m_44.log
matlab_solve_m_54.log
matlab_solve_m_59.log
matlab_solve_m_60.log
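The same check can be sketched in Python, which may be handy when collecting the failed video numbers for a rerun. This is only an illustration of the grep above; `logs_without_marker` is a made-up helper name, and the default file pattern is taken from the log names in this thread.

```python
from pathlib import Path


def logs_without_marker(log_dir, pattern="matlab_solve_m_*.log", marker="Done"):
    """Return sorted names of matching log files that never mention marker."""
    missing = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in a log
        text = path.read_text(errors="replace")
        if marker not in text:
            missing.append(path.name)
    return missing


# Usage (path as in this thread):
# print(logs_without_marker("H1CN0304/antrax/logs"))
```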
All had the same UnrecognizedVarName error:
$ cat matlab_solve_m_59.log
18:22:16 -I- Reading video information from file
18:22:20 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)
Error in trgraph.load (line 891)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName
The UnrecognizedVarName error seems to be caused by the fact that there are no classified tracklets in the video (check to see if antrax/labels/autoids_59.csv is indeed empty). This is probably because you either didn't have any detections in those videos, or only multi-ant detections. Either way, I'll need to patch this. I guess I never tested the software with such a sparse tracking problem. You might be able to ignore this issue for now and continue to the next steps, but it is also possible that the next steps will complain as well.
As for the error in video #50, I'm not sure. It seems the crash happened before matlab was even started, which is weird. Can you verify that the data files exist? These should be:
antrax/graphs/graph_50_50.mat
antrax/tracklets/trdata_50_50.mat
antrax/images/images_50_50.mat
antrax/labels/autoids_50_50.mat
Also try to take a look in the logs of the previous steps, maybe there will be some clues there.
check to see if antrax/labels/autoids_59.csv is indeed empty
No, none of the ones that showed the UnrecognizedVarName error are empty, they look pretty normal to me:
$ head autoids_59.csv
tracklet,label,score,best_frame
trj_id10_ti59_13365_tf59_13365,Unknown,0,0
trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1
trj_id10_ti59_13373_tf59_13373,Unknown,0,0
trj_id10_ti59_13375_tf59_13375,Unknown,0,0
Can you verify that the data files exist? These should be:
antrax/graphs/graph_50_50.mat
Exists!
antrax/tracklets/trdata_50_50.mat
Did you mean trdata_50.mat? That exists.
antrax/images/images_50_50.mat
Did you mean images_50.mat? That exists too.
antrax/labels/autoids_50_50.mat
This doesn't exist. If you meant csv, then there is a file for each video.
ok, weird.
This will need to be debugged on a local machine. Can you sync your data back?
Try to run solve step 1 for video 50 and see whether it crashes and why.
For the other error, try loading the data in an interactive matlab session with:
Trck = trhandles(uigetdir);
G = Trck.loaddata(59);
To keep it simple, I will compare 59 (that failed above) to 58 (completed successfully).
Running solve with either MCR or MATLAB 2019a gives the MATLAB:table:UnrecognizedVarName error in the log, but not in terminal:
$ antrax solve --step 1 --movlist 59 H1CN0304/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
07/04/21 16:14:39 -I- Starting 2 workers
07/04/21 16:14:39 -I- Started solve movie 59
07/04/21 16:14:39 -D- running matlab mcr
07/04/21 16:14:39 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface solve_single_movie H1CN0304/ 59 trackingdirname antrax
07/04/21 16:14:39 -D- matlab app exited with code None
07/04/21 16:15:29 -I- Finished solve movie 59
07/04/21 16:15:29 -I- Workers closed
Log with MCR:
$ cat H1CN0304/antrax/logs/matlab_solve_m_59.log
16:14:53 -D- initializing expreader object
16:14:53 -I- Reading video information from file
16:14:57 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)
Error in trgraph.load (line 891)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName
Log with MATLAB:
$ cat H1CN0304/antrax/logs/matlab_solve_m_59.log
16:46:49 -D- initializing expreader object
16:46:50 -I- Reading video information from file
16:46:54 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 675)
G.trjs.load_ids;
Error in trgraph.load (line 899)
G.load_ids;
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in solve_single_movie (line 52)
G = Trck.loaddata(m,colony);
Doing the same with 58 gives the same output in terminal, but a different looking log:
$ antrax solve --step 1 --movlist 58 H1CN0304/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
07/04/21 16:18:31 -I- Starting 2 workers
07/04/21 16:18:31 -I- Started solve movie 58
07/04/21 16:18:31 -D- running matlab mcr
07/04/21 16:18:31 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface solve_single_movie H1CN0304/ 58 trackingdirname antrax
07/04/21 16:18:31 -D- matlab app exited with code None
07/04/21 16:42:02 -I- Finished solve movie 58
07/04/21 16:42:02 -I- Workers closed
$ cat H1CN0304/antrax/logs/matlab_solve_m_58.log
16:18:43 -D- initializing expreader object
16:18:43 -I- Reading video information from file
16:18:47 -I- Loading trgraph from antrax/graphs/graph_58_58.mat
16:19:41 -I- Finished loading trgraph with 16476 tracklets
16:19:42 -I- Loading ids
16:19:52 -I- Finding single ant nodes
16:19:54 -I- Some preperations
16:19:56 -I- Resetting graph id assigments
16:19:56 -I- Filtering out tracklets identified as non-ant
16:19:56 -I- ...18 tracklets classified as no-ant were filtered
16:19:56 -I- ...8727 short, unconnected and unidentified tracklets were filtered
16:19:56 -I- Propagating ids from src tracklets
16:19:59 -I- ...finished 1000/3377
16:19:59 -I- ...finished 2000/3377
16:19:59 -I- ...finished 3000/3377
16:19:59 -I- Propagation loops
...
16:39:59 -I- ...working on any_ant
16:40:00 -I- ......found 288 cc's
16:40:00 -I- ......filtered 1 cc's
16:40:02 -I- ......pruned 18 nodes
16:40:02 -I- Propagation loops
16:40:03 -I- ...assigned 0 tracklets
16:40:03 -I- Biconnected components condition (positive)
16:40:09 -I- ...assigned 0 tracklets
16:40:09 -I- Assigning ids to tracklets
16:40:09 -I- Saving
16:41:56 -G- Done
For the interactive matlab session (59 vs 58):
>> G = Trck.loaddata(59);
16:20:11 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 675)
G.trjs.load_ids;
Error in trgraph.load (line 899)
G.load_ids;
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
>> G = Trck.loaddata(58);
16:23:17 -I- Loading trgraph from antrax/graphs/graph_58_58.mat
16:24:21 -I- Finished loading trgraph with 16476 tracklets
In the matlab command line, try loading the problematic autoids file and display the generated table:
f = 'antrax/labels/autoids_59_59.csv';
T = readtable(f);
head(T)
Also, run locally solve on video 50, which had a different issue.
Hmmmm....
>> f = 'antrax/labels/autoids_59.csv';
>> T = readtable(f);
>> head(T)
ans =
8×6 table
Var1 Var2 Var3 Var4 Var5 Var6
_____ ______ ______ _____ ______ ________________________________
'trj' 'id10' 'ti59' 13365 'tf59' '13365,Unknown,0,0'
'trj' 'id10' 'ti59' 13372 'tf59' '13372,GGY,0.9986485838890076,1'
'trj' 'id10' 'ti59' 13373 'tf59' '13373,Unknown,0,0'
'trj' 'id10' 'ti59' 13375 'tf59' '13375,Unknown,0,0'
'trj' 'id10' 'ti59' 13381 'tf59' '13381,Unknown,0,0'
'trj' 'id10' 'ti59' 13385 'tf59' '13385,Unknown,0,0'
'trj' 'id10' 'ti59' 13391 'tf59' '13391,Unknown,0,0'
'trj' 'id10' 'ti59' 13393 'tf59' '13393,Unknown,0,0'
58 looks different:
>> f = 'antrax/labels/autoids_58.csv';
>> T = readtable(f);
>> head(T)
ans =
8×4 table
tracklet label score best_frame
________________________________ _________ _______ __________
'trj_id10_ti58_10117_tf58_10117' 'GGY' 0.99987 1
'trj_id10_ti58_1139_tf58_1139' 'Unknown' 0 0
'trj_id10_ti58_1364_tf58_1364' 'Unknown' 0 0
'trj_id10_ti58_1372_tf58_1372' 'Unknown' 0 0
'trj_id10_ti58_1389_tf58_1389' 'Unknown' 0 0
'trj_id10_ti58_1395_tf58_1395' 'GGY' 0.99884 1
'trj_id10_ti58_1401_tf58_1401' 'Unknown' 0 0
'trj_id10_ti58_1405_tf58_1405' 'GGY' 0.99956 1
Looks like underscores were treated as delimiters in 59... In bash these two files look very similar:
$ head autoids_59.csv
tracklet,label,score,best_frame
trj_id10_ti59_13365_tf59_13365,Unknown,0,0
trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1
trj_id10_ti59_13373_tf59_13373,Unknown,0,0
trj_id10_ti59_13375_tf59_13375,Unknown,0,0
trj_id10_ti59_13381_tf59_13381,Unknown,0,0
trj_id10_ti59_13385_tf59_13385,Unknown,0,0
trj_id10_ti59_13391_tf59_13391,Unknown,0,0
trj_id10_ti59_13393_tf59_13393,Unknown,0,0
trj_id10_ti59_13396_tf59_13396,Unknown,0,0
$ head autoids_58.csv
tracklet,label,score,best_frame
trj_id10_ti58_10117_tf58_10117,GGY,0.9998655319213867,1
trj_id10_ti58_1139_tf58_1139,Unknown,0,0
trj_id10_ti58_1364_tf58_1364,Unknown,0,0
trj_id10_ti58_1372_tf58_1372,Unknown,0,0
trj_id10_ti58_1389_tf58_1389,Unknown,0,0
trj_id10_ti58_1395_tf58_1395,GGY,0.9988380074501038,1
trj_id10_ti58_1401_tf58_1401,Unknown,0,0
trj_id10_ti58_1405_tf58_1405,GGY,0.9995608925819397,1
trj_id10_ti58_1409_tf58_1409,Unknown,0,0
Also, run locally solve on video 50, which had a different issue.
Running. This one should take longer.
That's odd. Try giving an explicit delimiter:
f = 'antrax/labels/autoids_59_59.csv';
T = readtable(f, 'Delimiter', ',');
head(T)
Forcing it worked:
>> f = 'antrax/labels/autoids_59.csv';
>> T = readtable(f);
>> head(T)
ans =
8x6 table
Var1 Var2 Var3 Var4 Var5 Var6
_____ ______ ______ _____ ______ ________________________________
'trj' 'id10' 'ti59' 13365 'tf59' '13365,Unknown,0,0'
'trj' 'id10' 'ti59' 13372 'tf59' '13372,GGY,0.9986485838890076,1'
'trj' 'id10' 'ti59' 13373 'tf59' '13373,Unknown,0,0'
'trj' 'id10' 'ti59' 13375 'tf59' '13375,Unknown,0,0'
'trj' 'id10' 'ti59' 13381 'tf59' '13381,Unknown,0,0'
'trj' 'id10' 'ti59' 13385 'tf59' '13385,Unknown,0,0'
'trj' 'id10' 'ti59' 13391 'tf59' '13391,Unknown,0,0'
'trj' 'id10' 'ti59' 13393 'tf59' '13393,Unknown,0,0'
>> T = readtable(f, 'Delimiter', ',');
>> head(T)
ans =
8x4 table
tracklet label score best_frame
________________________________ _________ _______ __________
'trj_id10_ti59_13365_tf59_13365' 'Unknown' 0 0
'trj_id10_ti59_13372_tf59_13372' 'GGY' 0.99865 1
'trj_id10_ti59_13373_tf59_13373' 'Unknown' 0 0
'trj_id10_ti59_13375_tf59_13375' 'Unknown' 0 0
'trj_id10_ti59_13381_tf59_13381' 'Unknown' 0 0
'trj_id10_ti59_13385_tf59_13385' 'Unknown' 0 0
'trj_id10_ti59_13391_tf59_13391' 'Unknown' 0 0
'trj_id10_ti59_13393_tf59_13393' 'Unknown' 0 0
I have no explanation for this behavior...
Anyhow, I tried to patch the issue on the debug-jana branch; see if it works. It also fixes the other small issues we had in this thread and the previous one... I haven't tested it, so issues might pop up.
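The general idea of pinning the delimiter rather than letting the reader guess it can be sketched in Python as follows. This is not the anTraX patch (which lives in the MATLAB code); `read_autoids` and `EXPECTED_COLUMNS` are names invented for this example, and the sample rows are copied from the autoids_59.csv excerpt in this thread.

```python
import csv
import io

EXPECTED_COLUMNS = ["tracklet", "label", "score", "best_frame"]


def read_autoids(text):
    """Parse autoids-style CSV text with a fixed comma delimiter.

    Raises ValueError if the header is not the expected one, which would
    flag a file that was split on the wrong character.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


sample = (
    "tracklet,label,score,best_frame\n"
    "trj_id10_ti59_13365_tf59_13365,Unknown,0,0\n"
    "trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1\n"
)
rows = read_autoids(sample)
print(rows[1]["tracklet"], rows[1]["label"])  # trj_id10_ti59_13372_tf59_13372 GGY
```

Checking the header up front turns a silent mis-parse into an immediate, readable error.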
You are very efficient, thank you!
The readtable thing worked locally with $ antrax solve H1CN0304/ --step 1 --movlist 59:
Before pull:
$ cat matlab_solve_m_59.log
08:56:02 -D- initializing expreader object
08:56:02 -I- Reading video information from file
08:56:06 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)
Error in trgraph.load (line 891)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName
After pull:
$ head matlab_solve_m_59.log
08:57:32 -D- initializing expreader object
08:57:32 -I- Reading video information from file
08:57:36 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
08:58:02 -I- Finished loading trgraph with 9369 tracklets
08:58:03 -I- Loading ids
08:58:06 -I- Finding single ant nodes
08:58:07 -I- Some preperations
08:58:08 -I- Looking for bottleneck pairs
08:58:09 -I- done distance mat
09:00:59 -I- Resetting graph id assigments
$ tail matlab_solve_m_59.log
09:14:30 -I- ......found 359 cc's
09:14:30 -I- ......filtered 0 cc's
09:14:32 -I- ......pruned 0 nodes
09:14:32 -I- Propagation loops
09:14:32 -I- ...assigned 0 tracklets
09:14:32 -I- Biconnected components condition (positive)
09:14:35 -I- ...assigned 0 tracklets
09:14:35 -I- Assigning ids to tracklets
09:14:35 -I- Saving
09:15:33 -G- Done
There's another twist: I ran the solve step on a local computer with MATLAB and all files (including 50) were processed successfully and the xy csv files were generated for each video. It took more than a day to finish, I saw the result just now.
I am now processing another experiment on the HPC starting with tracking. I got to the solve step yesterday, but it failed as multiple jobs ran into the readtable weirdness. I will let you know how it goes :-)
Looks like https://github.com/Social-Evolution-and-Behavior/anTraX/commit/3ce63fdedd34dcc62adea5aafdc6329f370e026a worked: I ran solve for 90 videos and none of them ran into that strange readtable problem in step 1. The last one, 90, showed MATLAB:badsubscript as it barely had any tracklets, I hope it doesn't affect the further steps.
Commit https://github.com/Social-Evolution-and-Behavior/anTraX/commit/5f0cb61fdc27a470b7885a4c8b6364dee013b79b didn't seem to help though, the step option is still being ignored:
$ antrax solve CN0402/ --step 3 --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in CN0402/antrax/logs/hpc_solve1.sh
Job number 19458033 was submitted
Jobfile created in CN0402/antrax/logs/hpc_solve2.sh
Job number 19458034 was submitted
Jobfile created in CN0402/antrax/logs/hpc_solve3.sh
sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 297, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 258, in antrax_hpc_job
jid = submit_slurm_job_file(jobfile, waitfor=waitfor)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 80, in submit_slurm_job_file
jid = out.split()[-1]
IndexError: list index out of range
Also with --dry:
$ antrax solve CN0402/ --step 2 --dry --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in CN0402/antrax/logs/hpc_solve1.sh
Dry run, no job submitted.
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 293, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 265, in antrax_hpc_job
return jid
UnboundLocalError: local variable 'jid' referenced before assignment
I fixed the dry run issue.
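For reference, both tracebacks above come down to reading a job id that is not there: once on a dry run, and once after sbatch rejected the submission and printed nothing on standard output. A minimal Python sketch of a more defensive way to handle this is shown below. It is not the actual fix in anTraX; `parse_job_id` is a hypothetical helper, and it relies only on the standard "Submitted batch job <id>" line that sbatch prints on success.

```python
import re


def parse_job_id(sbatch_stdout, dry=False):
    """Extract the job id from sbatch's standard output.

    Returns None for a dry run, and raises RuntimeError with a readable
    message when no job id is present (e.g. after a rejected submission).
    """
    if dry:
        return None
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise RuntimeError("sbatch did not report a job id; submission probably failed")
    return match.group(1)


print(parse_job_id("Submitted batch job 19452619\n"))  # 19452619
print(parse_job_id("", dry=True))                      # None
```

With this shape, a quota rejection such as AssocMaxSubmitJobLimit would surface as a clear error message instead of an IndexError.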
As for the single step run - can you verify that you are on the debug branch on the HPC? If you indeed are, can you paste here the "solve" function in the cli.py file?
Thank you for fixing all these things, I just finished processing the new experiment with 90 videos, I did not run into any serious errors and the files in antdata were generated.
For the single step issue:
$ git branch
* debug-jana
master
$ less antrax/cli.py
def solve(explist, *, glist: parse_movlist=None, movlist: parse_movlist=None, clist: parse_movlist=None, mcr=False,
          nw=2, hpc=False, hpc_options: parse_hpc_options={}, missing=False, session=None, dry=False, step=0):
    """Run propagation step"""
    explist = parse_explist(explist, session)
    mcr = mcr or ANTRAX_USE_MCR
    hpc = hpc or ANTRAX_HPC
    if hpc:
        for e in explist:
            eglist = glist if glist is not None else e.glist
            emlist = [e.ggroups[g - 1] for g in eglist]
            emlist = [m for grp in emlist for m in grp]
            hpc_options['dry'] = dry
            hpc_options['classifier'] = classifier
            hpc_options['missing'] = missing
            hpc_options['glist'] = eglist
            hpc_options['movlist'] = emlist
            if e.prmtrs['geometry_multi_colony']:
                eclist = clist if clist is not None else e.clist
                for c in eclist:
                    hpc_options['c'] = c
                    hpc_options['waitfor'] = None
                    if step == 0 or step == 1:
                        jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
                        hpc_options['waitfor'] = jid
                    if step == 0 or step == 2:
                        jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=2)
                        hpc_options['waitfor'] = jid
                    if step == 0 or step == 3:
                        jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
            else:
                hpc_options['c'] = None
                hpc_options['waitfor'] = None
                if step == 0 or step == 1:
                    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
                    hpc_options['waitfor'] = jid
                if step == 0 or step == 2:
                    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=2)
                    hpc_options['waitfor'] = jid
                if step == 0 or step == 3:
                    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
    else:
        Q = MatlabQueue(nw=nw, mcr=mcr)
        for e in explist:
            eglist = glist if glist is not None else e.glist
            eclist = clist if clist is not None else e.clist
            emlist = [e.ggroups[g - 1] for g in eglist]
            emlist = [m for grp in emlist for m in grp]
            if movlist is not None:
                emlist = [m for m in emlist if m in movlist]
            if step == 0 or step == 1:
                if e.prmtrs['geometry_multi_colony']:
                    for c in eclist:
                        for m in emlist:
                            w = {'fun': 'solve_single_movie'}
                            w['args'] = [e.expdir, m, 'trackingdirname', e.session, 'colony', c]
                            w['diary'] = join(e.logsdir, 'matlab_solve_m_' + str(m) + '_c_' + str(c) + '.log')
                            w['str'] = 'solve colony ' + str(c) + ' movie ' + str(m)
                            Q.put(w)
                else:
                    for m in emlist:
                        w = {'fun': 'solve_single_movie'}
                        w['args'] = [e.expdir, m, 'trackingdirname', e.session]
                        w['diary'] = join(e.logsdir, 'matlab_solve_m_' + str(m) + '.log')
                        w['str'] = 'solve movie ' + str(m)
                        Q.put(w)
            # wait for single movie tasks to complete
            Q.join()
            # stitch
            if step == 0 or step == 2:
                if e.prmtrs['geometry_multi_colony']:
                    for c in eclist:
                        for g in eglist:
                            w = {'fun': 'solve_across_movies'}
                            w['args'] = [e.expdir, g, 'trackingdirname', e.session, 'colony', c]
                            w['diary'] = join(e.logsdir, 'matlab_solve_g_' + str(g) + '_c_' + str(c) + '.log')
                            w['str'] = 'solve stitch colony ' + str(c) + ' graph ' + str(g)
                            Q.put(w)
                else:
                    for g in eglist:
                        w = {'fun': 'solve_across_movies'}
                        w['args'] = [e.expdir, g, 'trackingdirname', e.session]
                        w['diary'] = join(e.logsdir, 'matlab_solve_g_' + str(g) + '.log')
                        w['str'] = 'solve stitch graph ' + str(g)
                        Q.put(w)
            # wait for stitch to finish
            Q.join()
            if step == 0 or step == 3:
                if e.prmtrs['geometry_multi_colony']:
                    for c in eclist:
                        for m in emlist:
                            w = {'fun': 'export_single_movie'}
                            w['args'] = [e.expdir, m, 'trackingdirname', e.session, 'colony', c]
                            w['diary'] = join(e.logsdir, 'matlab_export_m_' + str(m) + '_c_' + str(c) + '.log')
                            w['str'] = 'export colony ' + str(c) + ' movie ' + str(m)
                            Q.put(w)
                else:
                    for m in emlist:
                        w = {'fun': 'export_single_movie'}
                        w['args'] = [e.expdir, m, 'trackingdirname', e.session]
                        w['diary'] = join(e.logsdir, 'matlab_export_m_' + str(m) + '.log')
                        w['str'] = 'export movie ' + str(m)
                        Q.put(w)
            # wait for stitch to finish
            Q.join()
        # close
        Q.stop_workers()
P.S. All this was now done on HPC
Unfortunately, there are more issues with that dataset, even though it seemed to complete successfully.
MATLAB:badsubscript error:
$ grep -rHno "MATLAB:badsubscript" matlab_export_m_* | sort
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
matlab_export_m_45.log:11:MATLAB:badsubscript
matlab_export_m_48.log:11:MATLAB:badsubscript
matlab_export_m_49.log:11:MATLAB:badsubscript
matlab_export_m_52.log:11:MATLAB:badsubscript
matlab_export_m_59.log:11:MATLAB:badsubscript
matlab_export_m_62.log:11:MATLAB:badsubscript
matlab_export_m_64.log:11:MATLAB:badsubscript
matlab_export_m_65.log:11:MATLAB:badsubscript
matlab_export_m_66.log:11:MATLAB:badsubscript
matlab_export_m_68.log:11:MATLAB:badsubscript
matlab_export_m_70.log:11:MATLAB:badsubscript
matlab_export_m_71.log:11:MATLAB:badsubscript
matlab_export_m_72.log:11:MATLAB:badsubscript
matlab_export_m_78.log:11:MATLAB:badsubscript
matlab_export_m_80.log:11:MATLAB:badsubscript
matlab_export_m_81.log:11:MATLAB:badsubscript
matlab_export_m_82.log:11:MATLAB:badsubscript
matlab_export_m_83.log:11:MATLAB:badsubscript
matlab_export_m_85.log:11:MATLAB:badsubscript
matlab_export_m_90.log:11:MATLAB:badsubscript
$ for i in {1..90}; do if [ -f ../antdata/xy_${i}_${i}.csv ]; then : ; else echo "Missing: ${i}" ; fi; done
Missing: 16
Missing: 30
Missing: 45
Missing: 48
Missing: 49
Missing: 52
Missing: 59
Missing: 62
Missing: 64
Missing: 65
Missing: 66
Missing: 68
Missing: 70
Missing: 71
Missing: 72
Missing: 78
Missing: 80
Missing: 81
Missing: 82
Missing: 83
Missing: 85
Missing: 90
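The shell loop above can also be written in Python, which is convenient when the result feeds into further scripting. This is a small sketch that assumes the same `xy_<i>_<i>.csv` naming seen in the command; the function name `missing_movies` and the example path are illustrative only.

```python
from pathlib import Path


def missing_movies(antdata_dir, n_movies):
    """Return movie indices in 1..n_movies that lack an xy_<i>_<i>.csv file."""
    antdata = Path(antdata_dir)
    return [
        i for i in range(1, n_movies + 1)
        if not (antdata / f"xy_{i}_{i}.csv").is_file()
    ]
```

For example, `missing_movies("../antdata", 90)` would return the list of movie numbers printed by the loop above.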
Maybe relatedly, MATLAB:UndefinedFunction and MATLAB:badsubscript were popping up throughout the whole process:
$ grep -rHno "MATLAB:UndefinedFunction" | sort
matlab_solve_m_16.log:17:MATLAB:UndefinedFunction
matlab_solve_m_45.log:17:MATLAB:UndefinedFunction
matlab_solve_m_48.log:17:MATLAB:UndefinedFunction
matlab_solve_m_49.log:17:MATLAB:UndefinedFunction
matlab_solve_m_52.log:17:MATLAB:UndefinedFunction
matlab_solve_m_59.log:17:MATLAB:UndefinedFunction
matlab_solve_m_62.log:17:MATLAB:UndefinedFunction
matlab_solve_m_65.log:17:MATLAB:UndefinedFunction
matlab_solve_m_68.log:17:MATLAB:UndefinedFunction
matlab_solve_m_70.log:17:MATLAB:UndefinedFunction
matlab_solve_m_71.log:17:MATLAB:UndefinedFunction
matlab_solve_m_78.log:17:MATLAB:UndefinedFunction
matlab_solve_m_80.log:17:MATLAB:UndefinedFunction
matlab_solve_m_81.log:17:MATLAB:UndefinedFunction
matlab_solve_m_85.log:17:MATLAB:UndefinedFunction
matlab_track_m_16.log:77:MATLAB:UndefinedFunction
matlab_track_m_45.log:77:MATLAB:UndefinedFunction
matlab_track_m_48.log:86:MATLAB:UndefinedFunction
matlab_track_m_49.log:77:MATLAB:UndefinedFunction
matlab_track_m_52.log:77:MATLAB:UndefinedFunction
matlab_track_m_59.log:77:MATLAB:UndefinedFunction
matlab_track_m_62.log:77:MATLAB:UndefinedFunction
matlab_track_m_65.log:77:MATLAB:UndefinedFunction
matlab_track_m_68.log:77:MATLAB:UndefinedFunction
matlab_track_m_70.log:77:MATLAB:UndefinedFunction
matlab_track_m_71.log:77:MATLAB:UndefinedFunction
matlab_track_m_78.log:77:MATLAB:UndefinedFunction
matlab_track_m_80.log:77:MATLAB:UndefinedFunction
matlab_track_m_81.log:77:MATLAB:UndefinedFunction
matlab_track_m_85.log:77:MATLAB:UndefinedFunction
$ grep -rHno "MATLAB:badsubscript" | sort
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
matlab_export_m_45.log:11:MATLAB:badsubscript
matlab_export_m_48.log:11:MATLAB:badsubscript
matlab_export_m_49.log:11:MATLAB:badsubscript
matlab_export_m_52.log:11:MATLAB:badsubscript
matlab_export_m_59.log:11:MATLAB:badsubscript
matlab_export_m_62.log:11:MATLAB:badsubscript
matlab_export_m_64.log:11:MATLAB:badsubscript
matlab_export_m_65.log:11:MATLAB:badsubscript
matlab_export_m_66.log:11:MATLAB:badsubscript
matlab_export_m_68.log:11:MATLAB:badsubscript
matlab_export_m_70.log:11:MATLAB:badsubscript
matlab_export_m_71.log:11:MATLAB:badsubscript
matlab_export_m_72.log:11:MATLAB:badsubscript
matlab_export_m_78.log:11:MATLAB:badsubscript
matlab_export_m_80.log:11:MATLAB:badsubscript
matlab_export_m_81.log:11:MATLAB:badsubscript
matlab_export_m_82.log:11:MATLAB:badsubscript
matlab_export_m_83.log:11:MATLAB:badsubscript
matlab_export_m_85.log:11:MATLAB:badsubscript
matlab_export_m_90.log:11:MATLAB:badsubscript
matlab_solve_g_2.log:37:MATLAB:badsubscript
matlab_solve_g_3.log:37:MATLAB:badsubscript
matlab_solve_g_4.log:37:MATLAB:badsubscript
matlab_solve_g_5.log:37:MATLAB:badsubscript
matlab_solve_m_30.log:25:MATLAB:badsubscript
matlab_solve_m_64.log:25:MATLAB:badsubscript
matlab_solve_m_66.log:25:MATLAB:badsubscript
matlab_solve_m_72.log:25:MATLAB:badsubscript
matlab_solve_m_82.log:26:MATLAB:badsubscript
matlab_solve_m_83.log:25:MATLAB:badsubscript
matlab_solve_m_90.log:25:MATLAB:badsubscript
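The two grep calls above can be condensed into one pass that groups log files by error identifier. The sketch below assumes, as in the logs shown in this thread, that the identifier sits on a line of its own and that log files are named `matlab_*.log`; the function name `errors_by_id` is made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches identifiers such as MATLAB:badsubscript or MATLAB:load:cantReadFile
ERROR_ID = re.compile(r"^MATLAB:[A-Za-z:]+$")


def errors_by_id(logs_dir):
    """Map each MATLAB error identifier to the log files containing it."""
    found = defaultdict(list)
    for log in sorted(Path(logs_dir).glob("matlab_*.log")):
        for line in log.read_text(errors="replace").splitlines():
            line = line.strip()
            if ERROR_ID.match(line):
                found[line].append(log.name)
    return dict(found)
```

Grouping by identifier makes it easier to see that, for instance, the export failures and the solve failures involve the same movies.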
validate does not work on this dataset, but works on other datasets. I tried it with both MCR and MATLAB, on both the debug-jana and master branches:
$ antrax validate CN0402/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
11:43:15 -D- initializing expreader object
11:43:15 -I- Reading video information from file
Subscripted assignment between dissimilar structures.
Error in trhandles/loadxy (line 514)
xy(i) = load([xydir,xyfiles{i}]);
Error in validate_tracking/set_experiment (line 266)
[app.XY,frames] = app.Trck.loadxy('movlist',app.ti.m:app.tf.m,'type',app.type);
Error in validate_tracking/startupFcn (line 441)
set_experiment(app, Trck, p.Results.session)
Error in validate_tracking (line 659)
runStartupFcn(app, @(app)startupFcn(app, varargin{:}))
Traceback (most recent call last):
File "/home/jana/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 149, in validate
launch_matlab_app('validate_tracking', args, mcr=mcr)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/matlab.py", line 204, in launch_matlab_app
app = eval('eng.' + appname + '(' + ','.join([str(a) for a in args]) + ')')
File "<string>", line 1, in <module>
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/matlabengine.py", line 71, in __call__
_stderr, feval=True).result()
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/futureresult.py", line 67, in result
return self.__future.result(timeout)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/fevalfuture.py", line 82, in result
self._result = pythonengine.getFEvalResult(self._future,self._nargout, None, out=self._out, err=self._err)
matlab.engine.MatlabExecutionError:
File /home/jana/src/anTraX/matlab/@trhandles/trhandles.m, line 514, in trhandles.loadxy
File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 266, in validate_tracking.set_experiment
File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 441, in validate_tracking.startupFcn
File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 659, in validate_tracking.validate_tracking
Subscripted assignment between dissimilar structures.
$ antrax validate CN0402/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
09/04/21 11:41:37 -D- running matlab mcr
09/04/21 11:41:37 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0402/
11:41:46 -D- initializing expreader object
11:41:46 -I- Reading video information from file
Subscripted assignment between dissimilar structures.
Error in trhandles/loadxy (line 514)
Error in validate_tracking/set_experiment (line 254)
Error in validate_tracking/startupFcn (line 429)
Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)
Error in matlab.apps.AppBase/runStartupFcn (line 41)
Error in validate_tracking (line 640)
Error in antrax_mcr_interface (line 20)
MATLAB:heterogeneousStrucAssignment
09/04/21 11:41:55 -D- matlab app exited with code 249
Maybe it's trying to load a non-existing file? The error inside of one of those logs looks like this:
$ cat matlab_export_m_70.log
09:23:13 -I- Reading video information from file
09:23:17 -I- Loading trgraph from antrax/graphs/graph_70_70.mat
09:23:18 -I- Finished loading trgraph with 200 tracklets
09:23:18 -I- Loading tracklet data for movie 70
Index in position 2 exceeds array bounds.
Error in trgraph/export_xy (line 82)
Error in export_single_movie (line 52)
Error in antrax_mcr_interface (line 42)
MATLAB:badsubscript
Loading extract-trainset worked and it showed that most blobs were identified as RBR, which is wrong. Could that have contributed to the export error?
The validate command fails because there is something wrong with the xy files, so let's try to figure that one out first.
The extract-trainset command shows you the results of the blob classifier, so if it is completely off, you should try to understand why. However, it should not cause any program crash downstream, just very bad tracking results.
The error in the export log suggests that the solve step fails on that video. Can you see if there is something weird in the corresponding solve logs?
btw, the single step solve on hpc works properly for me. Did you remember to do pip install? (This needs to be done for python code changes, but not for matlab code.)
Did you remember to do pip install
Oops :-(
The error in the export log suggests that the solve step fails on that video. Can you see if there is something weird in the corresponding solve logs?
The problems seem to start at step 1 of solve. E.g.:
$ cat matlab_solve_m_16.log
07:36:14 -I- Reading video information from file
07:36:22 -I- Loading trgraph from antrax/graphs/graph_16_16.mat
07:36:23 -I- Finished loading trgraph with 166 tracklets
07:36:23 -I- Loading ids
07:36:23 -I- Finding single ant nodes
07:36:23 -I- Some preperations
07:36:23 -I- Looking for bottleneck pairs
07:36:23 -I- done distance mat
Undefined function or variable 'pairs'.
Error in trgraph/get_bottleneck_pairs (line 523)
Error in trgraph/solve (line 28)
Error in solve_single_movie (line 54)
Error in antrax_mcr_interface (line 30)
MATLAB:UndefinedFunction
$ cat matlab_solve_m_30.log
07:36:14 -I- Reading video information from file
07:36:22 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
07:36:24 -I- Finished loading trgraph with 374 tracklets
07:36:24 -I- Loading ids
07:36:25 -I- Finding single ant nodes
07:36:25 -I- Some preperations
07:36:25 -I- Looking for bottleneck pairs
07:36:25 -I- done distance mat
07:36:25 -I- Resetting graph id assigments
07:36:25 -I- Filtering out tracklets identified as non-ant
07:36:25 -I- ...0 tracklets classified as no-ant were filtered
07:36:25 -I- ...7 short, unconnected and unidentified tracklets were filtered
07:36:25 -I- Propagating ids from src tracklets
07:36:26 -I- Propagation loops
07:36:26 -I- ...assigned 0 tracklets
07:36:26 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)
Error in trgraph/solve (line 150)
Error in solve_single_movie (line 54)
Error in antrax_mcr_interface (line 30)
MATLAB:badsubscript
In this case, 16 had the MATLAB:UndefinedFunction during tracking, while 30 finished properly. The classify step finished normally in both cases.
All files that experienced MATLAB:UndefinedFunction during track also failed during solve, so maybe the fix in #17 will help. Other ones (like 30, see above) had a different error during solve: MATLAB:badsubscript.
Yes, all these errors seem related to the degenerate graph case. Let me know how the latest version does.
About the pip install, I like to use pip install -e <path> for packages under development, as it creates a link to the working directory of the package instead of copying the files, so you don't need to install again after every change or branch switch.
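One way to confirm which copy of a package Python will actually import (the source checkout or a stale copy in site-packages) is to ask the import system where it resolves the module. This is a generic sketch using only the standard library; `module_location` is a hypothetical helper, not part of anTraX.

```python
import importlib.util
from pathlib import Path


def module_location(name):
    """Return the directory a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve().parent
```

Calling `module_location("antrax")` on the HPC would typically point into the git working directory after an editable install, and into `site-packages` after a plain install.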
Thank you for the pip tip, I was unaware of it :-) The solve thing with the --step option works for me now too, thank you for fixing it!
I got to the solve step with the problematic datasets, here's what I got:
MATLAB:badsubscript during track
solve step: the MATLAB:UndefinedFunction error does not show up anymore, but MATLAB:badsubscript did in 22 out of 90 cases. I repeated this twice, it's always the same videos. In an attempt to fix it, I tried training the classifier specifically on the images extracted from the problematic videos, but that didn't help. Those videos are not empty, btw, there are identifiable ants in them. The matlab_solve_m_*.log typically looks like this:
$ cat matlab_solve_m_52.log
21:33:26 -I- Reading video information from file
21:33:32 -I- Loading trgraph from antrax/graphs/graph_52_52.mat
21:34:03 -I- Finished loading trgraph with 11734 tracklets
21:34:04 -I- Loading ids
21:34:09 -I- Finding single ant nodes
21:34:09 -I- Some preperations
21:34:10 -I- Looking for bottleneck pairs
21:34:13 -I- done distance mat
21:34:13 -I- Resetting graph id assigments
21:34:13 -I- Filtering out tracklets identified as non-ant
21:34:13 -I- ...10530 tracklets classified as no-ant were filtered
21:34:13 -I- ...2013 short, unconnected and unidentified tracklets were filtered
21:34:14 -I- Propagating ids from src tracklets
21:34:14 -I- Propagation loops
21:34:14 -I- ...assigned 0 tracklets
21:34:14 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)
Error in trgraph/solve (line 150)
Error in solve_single_movie (line 54)
Error in antrax_mcr_interface (line 30)
MATLAB:badsubscript
This dataset has 90 videos of 40 min. I am processing another dataset that has 60 videos, one hour each, that one takes longer to process and I didn't get to the solve step yet. If that dataset gets through the solve step properly, I will re-slice the videos for this experiment. I will also run the solve step overnight with MATLAB on a local machine to see if this error only occurs with MCR.
On a local machine with MATLAB, solve failed too at the same spots. The error looks like this:
$ cat matlab_solve_m_52.log
22:31:17 -D- initializing expreader object
22:31:17 -I- Reading video information from file
22:31:19 -I- Loading trgraph from antrax/graphs/graph_52_52.mat
22:31:48 -I- Finished loading trgraph with 11734 tracklets
22:31:48 -I- Loading ids
22:31:52 -I- Finding single ant nodes
22:31:53 -I- Some preperations
22:31:53 -I- Looking for bottleneck pairs
22:31:55 -I- done distance mat
22:31:55 -I- Resetting graph id assigments
22:31:55 -I- Filtering out tracklets identified as non-ant
22:31:55 -I- ...10530 tracklets classified as no-ant were filtered
22:31:55 -I- ...2013 short, unconnected and unidentified tracklets were filtered
22:31:55 -I- Propagating ids from src tracklets
22:31:56 -I- Propagation loops
22:31:56 -I- ...assigned 0 tracklets
22:31:56 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)
G.pairs = G.pairs(argsort(G.pairs(:,3)),:);
Error in trgraph/solve (line 150)
propagate_all(G);
Error in solve_single_movie (line 54)
solve(G,false,false);
I guess the dataset is not good then?
Hi,
Is --movlist supposed to work during the solve step 1 on HPC? It seems to be ignored:
$ antrax solve H1CN0304/ --step 1 --movlist 50 --hpc --hpc-options partition=single,,cpus=3,mem-per-cpu=3000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh
Job number 19464706 was submitted
$ squeue -l
Tue Apr 13 11:29:34 2021
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
19464706_[20-60%60 single slv1:H1C fr_jm112 PENDING 0:00 3-00:00:00 1 (Resources)
19464706_1 single slv1:H1C fr_jm112 RUNNING 0:02 3-00:00:00 1 uc2n421
19464706_2 single slv1:H1C fr_jm112 RUNNING 0:02 3-00:00:00 1 uc2n421
19464706_3 single slv1:H1C fr_jm112 RUNNING 0:02 3-00:00:00 1 uc2n370
[...]
Once again you are right - I fixed the movlist issue.
Also made a new fix to the MATLAB:badsubscript issue. Try it now... It's a program bug, not an issue with your dataset. I just need to catch all the spots that reference the problematic variable. It's hard without being able to replicate the error on my side.
Thank you for fixing these things :-) I am never really sure if I am right about anything.
Also made a new fix to the MATLAB:badsubscript issue. Try it now... It's a program bug, not an issue with your dataset.
I don't seem to be getting these errors with a different dataset... Or did you fix this days ago? I changed the dataset that was causing all these problems by re-slicing the videos into 1 hour pieces. I also finally figured out that I need to use a far larger number of epochs during the training step than the default 5. In my case I need more than 20 (45 seems like a good number when training from scratch on a good set of examples) to get loss and accuracy values closer to 0.5 and 0.95, respectively.
And what does --missing do in the solve context?
$ antrax solve --help
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Usage: antrax solve [OPTIONS] explist
Run propagation step
Arguments:
explist
Options:
--clist=PARSE_MOVLIST
--dry
--glist=PARSE_MOVLIST
--hpc
--hpc-options=PARSE_HPC_OPTIONS (default: {})
--mcr
--missing
--movlist=PARSE_MOVLIST
--nw=INT (default: 2)
--session=STR
--step=INT (default: 0)
Other actions:
-h, --help Show the help
I had some jobs fail because I did not allocate enough memory for them. And some jobs seem to fail repeatedly for no obvious reason, but that can be fixed if I remove the hpc_solve1_*.log for that job. Weird.
Once again you are right - I fixed the movlist issue.
Works beautifully!
$ antrax solve JS16/ --step 1 --movlist 2-4 --dry --hpc --hpc-options partition=single,cpus=3,mem-per-cpu=3000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in JS16/antrax_demo/logs/hpc_solve1.sh
Dry run, no job submitted.
$ cat JS16/antrax_demo/logs/hpc_solve1.sh
#!/bin/bash
#SBATCH --job-name=slv1:JS16
#SBATCH --output=JS16/antrax_demo/logs/hpc_solve1_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=3000
#SBATCH --array=2-4%3
#SBATCH --mail-type=ALL
#SBATCH --mail-user=None
srun -N1 antrax solve JS16/ --session antrax_demo --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 1 --mcr
I used pip install -e . and it is very handy. Incidentally, it also doesn't trigger the strange HPC permission error I described in #13, as plain pip install . did.
Using --missing with solve will run solve on videos that do not have an xy file, which is the only output file of the step. It is useful if some jobs failed, and you want to rerun only those. If you don't specify the step, it will run step 1 on the missing videos, then step 2 on all graphs, and then step 3 again on the missing videos.
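The behaviour just described can be summarised as a small planning function. This is only a sketch of the described logic, not anTraX code: `plan_solve`, its arguments, and the `has_xy` callback are hypothetical names introduced for illustration.

```python
def plan_solve(all_movies, all_graphs, has_xy, missing=False):
    """Return (step, targets) tuples for a full solve run.

    has_xy is a callable telling whether a movie already has an xy file.
    """
    if missing:
        movies = [m for m in all_movies if not has_xy(m)]
    else:
        movies = list(all_movies)
    return [
        (1, movies),            # per-movie solve
        (2, list(all_graphs)),  # stitching always covers every graph
        (3, movies),            # per-movie export
    ]
```

With 60 movies of which only movie 2 lacks an xy file, such a plan would touch one movie in steps 1 and 3 and all graphs in step 2.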
The MATLAB:badsubscript happens in a very specific case, where the program did not find any topologically equivalent node pairs (see the paper) in the video. I never encountered such a case in my experiments, so it is very likely that you see it only in this specific dataset. Anyhow, it is a good idea to patch it, even if you found a workaround, so let me know if it happens again. The fix was in my last commit, not days ago.
Regarding the classifier - definitely! Usually 50-100 epochs are needed, depending on the complexity of the problem (number of classes, image resolution, etc.). I usually recommend aiming for at least 0.95 accuracy.
I understand that you already completed tracking of a few datasets, and ran the validation procedure? What accuracy do you see?
No, I am actually slower than it may seem :-/ With small test datasets it worked out really well, but with large ones (e.g., 60 hours) I kept making different silly mistakes that hindered my progress. For example, I realized only yesterday that I need to run the training step much longer. Hopefully I will get to the point where I can run validation on one of the large experiments sometime this week.
ok, hopefully the effort will pay off!
I think --missing might not be working... One xy file of 60 was not generated, but this restarted all jobs:
$ for i in {1..60}; do if [ -f ~/H2CN0402/antrax/antdata/xy_${i}_${i}.csv ]; then : ; else echo "Missing: ${i}" ; fi; done
Missing: 2
$ antrax solve H2CN0402/ --missing --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H2CN0402/antrax/logs/hpc_solve1.sh
Job number 19468563 was submitted
Jobfile created in H2CN0402/antrax/logs/hpc_solve2.sh
Job number 19468564 was submitted
Jobfile created in H2CN0402/antrax/logs/hpc_solve3.sh
$ cat H2CN0402/antrax/logs/hpc_solve1.sh
#!/bin/bash
#SBATCH --job-name=slv1:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve1_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-60%60
#SBATCH --mail-type=ALL
srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 1 --mcr
$ cat H2CN0402/antrax/logs/hpc_solve2.sh
#!/bin/bash
#SBATCH --job-name=slv2:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve2_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-5%5
#SBATCH --mail-type=ALL
srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 2 --mcr
$ cat H2CN0402/antrax/logs/hpc_solve3.sh
#!/bin/bash
#SBATCH --job-name=slv3:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve3_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-60%60
#SBATCH --mail-type=ALL
srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 3 --mcr
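In the job files above, steps 1 and 3 were submitted with `--array=1-60%60` even though only movie 2 lacked an xy file. For illustration, here is a sketch of how a restricted list of task indices could be turned into a Slurm `--array` value. The helper `slurm_array_spec` is hypothetical and is not how anTraX builds its job files.

```python
def slurm_array_spec(indices, max_parallel=None):
    """Compress task indices into a Slurm --array value, e.g. [2, 3, 4, 7] -> '2-4,7'."""
    ids = sorted(set(indices))
    if not ids:
        raise ValueError("no task indices given")
    ranges = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i == prev + 1:
            prev = i
            continue
        ranges.append((start, prev))
        start = prev = i
    ranges.append((start, prev))
    spec = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
    if max_parallel is not None:
        # '%N' limits how many array tasks run simultaneously
        spec += f"%{max_parallel}"
    return spec
```

For the case above, `slurm_array_spec([2])` gives `2`, so the job file would request a single array task instead of sixty.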
The log of the missing file is complaining about a possibly corrupt MAT file. The file is physically there, what do you think could have caused the problem?
$ cat matlab_solve_m_2.log
08:07:21 -I- Reading video information from file
08:07:27 -I- Loading trgraph from antrax/graphs/graph_2_2.mat
Error using load
Unable to read MAT-file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_2_2_trjs.mat. File might be corrupt.
Error in trgraph.load (line 886)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:load:unableToReadMatFile
Can you try to load the file in matlab using the load command? If it's indeed corrupted, it's possible that something interrupted the writing of the file, so it might be just a random thing. Did the track step finish properly on this video? Try re-running track for that video.
I'll take a look at the --missing issue tomorrow.
Matlab has the same complaint:
>> addpath(genpath(['.','/matlab']));
>> load antrax/graphs/graph_2_2_trjs.mat
Error using load
Unable to read MAT-file /media/jana/HDD/bw/H2CN0402/antrax/graphs/graph_2_2_trjs.mat. File might be corrupt.
I think I know what I did wrong: I might have started the next step before the previous one finished. On the up side, it was otherwise a very smooth process, from track to solve.
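When steps are submitted by hand, one way to avoid starting a step too early is to let the scheduler enforce the order. The `hpc_options['waitfor']` entries in the cli.py paste earlier in this thread suggest the steps are chained by job id, but how anTraX passes that to the scheduler is not shown here, so the sketch below is an assumption. It only builds `sbatch` command lines using Slurm's `--parsable` and `--dependency=afterok:<jobid>` options; `sbatch_command` is a hypothetical helper.

```python
def sbatch_command(jobfile, waitfor=None):
    """Build an sbatch command line, optionally waiting for another job."""
    cmd = ["sbatch", "--parsable"]  # --parsable makes sbatch print just the job id
    if waitfor is not None:
        # Start only after the given job has finished successfully
        cmd.append(f"--dependency=afterok:{waitfor}")
    cmd.append(jobfile)
    return cmd
```

In a real run, the job id printed by the first submission would be passed as `waitfor` when building the command for the next step's job file.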
No, something is still wrong. After re-slicing the videos and starting everything from scratch, I had errors during solve steps 2 and 3.
In step 2 it was either MATLAB:badsubscript or MATLAB:load:cantReadFile (?):
$ grep -rHnoL "Done" matlab_solve_g_*
matlab_solve_g_3.log
matlab_solve_g_4.log
matlab_solve_g_5.log
$ cat matlab_solve_g_3.log
00:43:23 -I- Reading video information from file
00:43:32 -I- solving graph from movies 25-36
00:43:32 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
Error in trhandles/loaddata (line 607)
Error in solve_across_movies (line 70)
Error in antrax_mcr_interface (line 53)
MATLAB:load:cantReadFile
$ cat matlab_solve_g_4.log
00:43:51 -I- Reading video information from file
00:43:58 -I- solving graph from movies 37-48
00:43:58 -I- Loading trgraph from antrax/graphs/graph_37_37.mat
00:44:10 -I- Loading trgraph from antrax/graphs/graph_38_38.mat
00:44:14 -I- Loading trgraph from antrax/graphs/graph_39_39.mat
00:44:16 -I- Loading trgraph from antrax/graphs/graph_40_40.mat
00:44:17 -I- Loading trgraph from antrax/graphs/graph_41_41.mat
00:44:19 -I- Loading trgraph from antrax/graphs/graph_42_42.mat
00:44:21 -I- Loading trgraph from antrax/graphs/graph_43_43.mat
00:44:22 -I- Loading trgraph from antrax/graphs/graph_44_44.mat
00:44:24 -I- Loading trgraph from antrax/graphs/graph_45_45.mat
00:44:25 -I- Loading trgraph from antrax/graphs/graph_46_46.mat
00:44:27 -I- Loading trgraph from antrax/graphs/graph_47_47.mat
00:44:28 -I- Loading trgraph from antrax/graphs/graph_48_48.mat
00:44:28 -I- Finished loading trgraph with 10016 tracklets
00:44:30 -I- Loading ids
00:44:33 -I- Finding single ant nodes
00:44:33 -I- Some preperations
00:44:34 -I- Filtering out tracklets identified as non-ant
00:44:34 -I- ...690 tracklets classified as no-ant were filtered
00:44:34 -I- ...729 short, unconnected and unidentified tracklets were filtered
00:44:35 -I- Propagating ids from src tracklets
00:44:36 -I- ...finished 1000/7355
00:44:36 -I- ...finished 2000/7355
00:44:36 -I- ...finished 3000/7355
00:44:36 -I- ...finished 4000/7355
00:44:36 -I- ...finished 5000/7355
00:44:36 -I- ...finished 6000/7355
00:44:36 -I- ...finished 7000/7355
00:44:36 -I- Propagation loops
Index in position 1 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 522)
Error in trgraph/solve (line 150)
Error in solve_across_movies (line 72)
Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript
In step 3:
$ grep -rHnoL "Done" matlab_export_m_*
matlab_export_m_25.log
matlab_export_m_28.log
matlab_export_m_35.log
matlab_export_m_36.log
$ cat matlab_export_m_25.log
00:58:53 -I- Reading video information from file
00:58:58 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
Error in trhandles/loaddata (line 607)
Error in export_single_movie (line 51)
Error in antrax_mcr_interface (line 42)
MATLAB:load:cantReadFile
None of the previous logs showed the errors.
But running solve on a local machine with MATLAB already showed errors in step 1:
$ grep -rHinoL "Done" matlab_solve_m_*
matlab_solve_m_25.log
matlab_solve_m_28.log
matlab_solve_m_35.log
matlab_solve_m_36.log
All of those logs show the same error:
$ cat matlab_solve_m_25.log
09:24:23 -D- initializing expreader object
09:24:23 -I- Reading video information from file
09:24:26 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
load(fname,'G');
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in solve_single_movie (line 52)
G = Trck.loaddata(m,colony);
Errors also appeared during step 2:
$ cat matlab_solve_g_3.log
09:38:37 -D- initializing expreader object
09:38:37 -I- Reading video information from file
09:38:41 -I- solving graph from movies 25-36
09:38:41 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
load(fname,'G');
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in solve_across_movies (line 70)
G = Trck.loaddata(movlist,colony);
But MATLAB itself seems to be able to load the file:
>> load antrax/graphs/graph_25_25.mat
Warning: Variable 'G' originally saved as a trgraph cannot be instantiated as an object and will be read in as a uint32.
Step 3 then failed, as expected:
$ grep -rHinoL "Done" matlab_export_m_*
matlab_export_m_25.log
matlab_export_m_28.log
matlab_export_m_35.log
matlab_export_m_36.log
$ cat matlab_export_m_36.log
10:27:21 -D- initializing expreader object
10:27:21 -I- Reading video information from file
10:27:24 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_36_36.mat.
Error in trgraph.load (line 879)
load(fname,'G');
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in export_single_movie (line 51)
G = Trck.loaddata(m,colony);
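The repeated grep/cat checks above can be condensed into one pass. This is a minimal sketch, assuming (as in the grep commands) that a successful log contains the word "Done" and that progress lines carry a `HH:MM:SS -X-` prefix as in the logs shown here; the function name `failed_logs` and the default file pattern are made up for illustration.

```python
import re
from pathlib import Path

# Progress lines look like "09:24:26 -I- Loading trgraph from ...".
PROGRESS = re.compile(r"^\d{2}:\d{2}:\d{2} -[A-Z]- ")


def failed_logs(log_dir, pattern="matlab_*.log", marker="Done"):
    """Map each log lacking `marker` to its first two non-progress lines."""
    failures = {}
    for log in sorted(Path(log_dir).glob(pattern)):
        lines = log.read_text(errors="replace").splitlines()
        if any(marker in line for line in lines):
            continue  # same criterion as `grep -L "Done"`
        # Whatever is not timestamped progress output is treated as error text.
        errors = [l.strip() for l in lines if l.strip() and not PROGRESS.match(l)]
        failures[log.name] = errors[:2]
    return failures
```

Calling `failed_logs("path/to/logs", "matlab_solve_m_*.log")` would return something like the four movie logs listed above, each paired with its "Error using load" line and the "Cannot read file" message, so the steps can be compared without opening every file.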
The above was partially solved by re-running the track step for movies 25, 28, 35 and 36 on the HPC. Step 2 showed the MATLAB:badsubscript error in all logs (5 graphs in total), but step 3 finished successfully and the missing mat/csv files have been generated.
$ cat matlab_solve_g_3.log
09:26:30 -I- Reading video information from file
09:26:36 -I- solving graph from movies 25-36
09:26:36 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
09:26:48 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
09:26:52 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
09:26:54 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
09:26:58 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
09:27:00 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
09:27:01 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
09:27:09 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
09:27:19 -I- Finished loading trgraph with 17914 tracklets
09:27:21 -I- Loading ids
09:27:25 -I- Finding single ant nodes
09:27:26 -I- Some preperations
09:27:28 -I- Filtering out tracklets identified as non-ant
09:27:28 -I- ...8544 tracklets classified as no-ant were filtered
09:27:28 -I- ...6359 short, unconnected and unidentified tracklets were filtered
09:27:29 -I- Propagating ids from src tracklets
09:27:31 -I- ...finished 1000/7421
09:27:31 -I- ...finished 2000/7421
09:27:31 -I- ...finished 3000/7421
09:27:31 -I- ...finished 4000/7421
09:27:31 -I- ...finished 5000/7421
09:27:31 -I- ...finished 6000/7421
09:27:31 -I- ...finished 7000/7421
09:27:31 -I- Propagation loops
Index in position 1 exceeds array bounds (must not exceed 14008).
Error in trgraph/solve>propagate_all (line 522)
Error in trgraph/solve (line 150)
Error in solve_across_movies (line 72)
Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript
So, if I understand correctly, the corrupted file issue was solved by the rerun?
Regarding the new MATLAB:badsubscript error, it is different than the previous one we had above. I'm not sure what's going on there. After you tracked some of the videos again, did you also run the classify and solve1?
Step 2 actually "stitches" the graphs of individual videos together and propagates information from one video to another. In practice, it is not strictly required, and that is why step 3 is able to finish properly. Without it, though, the tracking might be suboptimal at the interface between videos.
So, if I understand correctly, the corrupted file issue was solved by the rerun?
Yes. It looks like there was some strange error happening that was not reflected in the logs, but produced some corrupt graph MAT files during track. At least that's my best explanation.
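A quick way to screen graph files for gross damage before running solve is to look at the file header. This is only a rough sketch: it relies on the fact that MAT-files in the v5 and v7.3 formats begin with a text header starting with "MATLAB ", and it catches just empty or header-damaged files. A file that passes may still fail in `load`, so it may not detect the kind of corruption seen in this thread. The function name and the `graph_*.mat` pattern are illustrative.

```python
from pathlib import Path

HEADER_BYTES = 128  # a Level 5 MAT-file starts with a 128-byte header


def suspicious_mat_files(graph_dir, pattern="graph_*.mat"):
    """Return names of MAT files that are too short or lack the text header."""
    suspicious = []
    for path in sorted(Path(graph_dir).glob(pattern)):
        with open(path, "rb") as fh:
            header = fh.read(HEADER_BYTES)
        # v5 headers start with "MATLAB 5.0 MAT-file", v7.3 with "MATLAB 7.3 MAT-file".
        if len(header) < HEADER_BYTES or not header.startswith(b"MATLAB "):
            suspicious.append(path.name)
    return suspicious
```

Any file reported here would be a candidate for re-running track on the corresponding movie.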
After you tracked some of the videos again, did you also run the classify and solve1?
I tried both, actually, and both worked. But I went with the latter one. What consequences would re-running track and then going directly to solve have on detections?
Theoretically, the algorithm is completely deterministic, so the two runs should have the same tracklet graph and tracklet names. However, there are occasionally some small misalignments between runs that I cannot explain. Also, when you run track, it cleans some of the data generated by later steps, so it is better to also rerun the downstream steps.
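Re-running the downstream steps after a partial re-track could be scripted roughly as follows. This is a sketch for a local (non-HPC) run, where each command blocks until it finishes. The `--movlist` option and its comma-separated format are assumptions that should be checked against the CLI help, and the helper names are made up; classify and solve are run on the whole experiment here.

```python
import subprocess


def downstream_commands(expdir, movies=None):
    """Build the track -> classify -> solve command sequence for one experiment."""
    # ASSUMPTION: `--movlist 25,28,35,36` restricts track to those movies.
    subset = ["--movlist", ",".join(str(m) for m in movies)] if movies else []
    return [
        ["antrax", "track", expdir, *subset],
        ["antrax", "classify", expdir],
        ["antrax", "solve", expdir],
    ]


def run_in_order(commands, dry_run=True):
    """Print each command and, unless dry_run, execute it before the next one."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, `run_in_order(downstream_commands("H2CN0402", movies=[25, 28, 35, 36]))` only prints the three commands; passing `dry_run=False` would execute them one after another.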
I'm not sure what you mean by "both worked". Was the latest MATLAB:badsubscript in step 2 solved?
Sorry, I made it too confusing. It looks like I've been dealing with two separate problems (they just looked like one at first): xy files not being generated after step 3, and step 2 showing different errors (either MATLAB:badsubscript with MCR or "Index in position 1 exceeds array bounds" with MATLAB). With "both worked" I was referring to the first problem, which was caused by the corrupt graph files generated during track and was fixed by re-running either just track and then solve, or track, classify, and solve.
Was the latest MATLAB:badsubscript in step 2 solved?
No, it is still happening.
OK, so let's try to understand this new MATLAB:badsubscript better (it's the same error on MCR and MATLAB, just reported differently). As I said, it's a different one from the one we had before on this thread. We'll have to do it the painful way, as I can't reproduce it on my side.
I've added a few lines of code to report some info at the problematic place. Run it in an interactive MATLAB session:
Trck = trhandles(uigetdir);
solve_across_movies(Trck, 'g', 3);
Hmm, maybe I am doing something wrong here:
>> addpath(genpath(['.','/matlab']));
>> Trck = trhandles(uigetdir);
Warning: uigetdir is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display. For more information, see "Changes to
-nodisplay and -noFigureWindows Startup Options" in the MATLAB Release Notes. To view the release note in your system browser, run
web('www.mathworks.com/help/matlab/release-notes.html#br5ktrh-3', '-browser')
> In warnfiguredialog (line 21)
In uigetdir (line 60)
Error using javaObjectEDT
Scalar input must be a java object
Error in matlab.ui.internal.dialog.Dialog/getParentFrame (line 46)
obj.ParentFrame = javaObjectEDT(com.mathworks.hg.peer.utils.DialogUtilities.createParentWindow);
Error in matlab.ui.internal.dialog.FileSystemChooser/getParentFrame (line 129)
parframe = getParentFrame@matlab.ui.internal.dialog.Dialog(obj);
Error in matlab.ui.internal.dialog.FolderChooser/doShowDialog (line 70)
javaMethodEDT('showOpenDialog', obj.Peer, getParentFrame(obj));
Error in matlab.ui.internal.dialog.FolderChooser/show (line 48)
doShowDialog(obj)
Error in uigetdir_helper (line 32)
dirdlg.show();
Error in uigetdir (line 61)
[directoryname] = uigetdir_helper(varargin{:});
>> Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at javax.swing.JFrame.<init>(JFrame.java:233)
at com.mathworks.mwswing.MJFrame.<init>(MJFrame.java:108)
at com.mathworks.mwswing.MJFrame.<init>(MJFrame.java:101)
at com.mathworks.hg.peer.utils.DialogUtilities$1.runWithOutput(DialogUtilities.java:56)
at com.mathworks.jmi.AWTUtilities$Invoker$2.watchedRun(AWTUtilities.java:475)
at com.mathworks.jmi.AWTUtilities$WatchedRunnable.run(AWTUtilities.java:436)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
uigetdir does not work when you run MATLAB without a display (e.g. with the -nodisplay option, or through ssh). You can just enter the path to the expdir instead, with Trck = trhandles(path-to-expdir);
Sorry, that was silly of me :-/ With the dataset that had the problem:
>> Trck = trhandles('.');
21:25:09 -I- Loading tracking session from expdir
21:25:17 -I- Reading video information from file
>> solve_across_movies(Trck, 'g', 3);
Error using solve_across_movies (line 11)
Expected a string scalar or character vector for the parameter name.
>>
Hi :-)
The HPC server I am using has certain limits per user (100 scheduled jobs, 3 days max per job). The solve step in my case generated more than 100 jobs, causing some of the jobs to get cancelled. Since the solve step is a three-step process, I figured I can start each step manually, e.g.:
$ sbatch path/to/hpc_solve1.sh
Is this a reasonable way to do it? Is there a different way to get around the max jobs per user limit?
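If the job files are submitted by hand, the "one step after another" ordering can be left to Slurm instead of waiting manually. This is a minimal sketch, assuming the cluster runs Slurm (as the sbatch call suggests) and that the generated job scripts can be submitted directly. It uses sbatch's `--parsable` and `--dependency=afterok` options; `submit_chain` is a made-up helper name. Pending dependent jobs may still count toward a per-user quota, so this automates the ordering but does not by itself reduce the number of queued jobs.

```python
import subprocess


def submit_chain(scripts, run=subprocess.run):
    """Submit job scripts so each starts only after the previous one succeeded."""
    job_ids = []
    for script in scripts:
        cmd = ["sbatch", "--parsable"]
        if job_ids:
            # Hold this job until the previously submitted job finishes OK.
            cmd.append(f"--dependency=afterok:{job_ids[-1]}")
        cmd.append(script)
        result = run(cmd, check=True, capture_output=True, text=True)
        # With --parsable, sbatch prints "jobid" or "jobid;cluster".
        job_ids.append(result.stdout.strip().split(";")[0])
    return job_ids
```

A call such as `submit_chain(["path/to/hpc_solve1.sh", "path/to/hpc_solve2.sh", "path/to/hpc_solve3.sh"])` would return the three job IDs in submission order.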