Social-Evolution-and-Behavior / anTraX

anTraX: high throughput tracking of color-tagged insects
https://antrax.readthedocs.io/
GNU General Public License v3.0

Exception handler for "Reference to non-existent field" in antrax validate? #12

Closed janamach closed 3 years ago

janamach commented 3 years ago

Hi!

This is not really an issue, more of a suggestion (or a question?). The problem is that antrax quits with an error when labels.csv contains an undetected label, and proceeds further once that label is removed (in my case, until it runs into the next unused label). So I kept removing the "extra" labels from the csv file. The output in the terminal looked like this:

(antrax) jana@unicorn:~/ants$ antrax validate CN0307/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

08/03/21 20:27:42 -D- running matlab mcr 
08/03/21 20:27:42 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0307/
20:27:58 -D- initializing expreader object
20:27:58 -I- Reading video information from file
Reference to non-existent field 'G-GY'.
Error in validate_tracking/set_experiment (line 275)

Error in validate_tracking/startupFcn (line 429)

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)

Error in matlab.apps.AppBase/runStartupFcn (line 41)

Error in validate_tracking (line 640)

Error in antrax_mcr_interface (line 20)
MATLAB:nonExistentField
08/03/21 20:28:08 -D- matlab app exited with code 249
(antrax) jana@unicorn:~/ants$ antrax validate CN0307/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

08/03/21 20:28:23 -D- running matlab mcr 
08/03/21 20:28:23 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0307/
20:28:38 -D- initializing expreader object
20:28:38 -I- Reading video information from file
Reference to non-existent field 'GBB'.
Error in validate_tracking/set_experiment (line 275)

Error in validate_tracking/startupFcn (line 429)

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)

Error in matlab.apps.AppBase/runStartupFcn (line 41)

Error in validate_tracking (line 640)

Error in antrax_mcr_interface (line 20)
MATLAB:nonExistentField
08/03/21 20:28:48 -D- matlab app exited with code 249
(antrax) jana@unicorn:~/ants$ antrax validate CN0307/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

08/03/21 20:29:05 -D- running matlab mcr 
08/03/21 20:29:05 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0307/
20:29:20 -D- initializing expreader object
20:29:20 -I- Reading video information from file
Reference to non-existent field 'GBG'.
Error in validate_tracking/set_experiment (line 275)

Error in validate_tracking/startupFcn (line 429)

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)

Error in matlab.apps.AppBase/runStartupFcn (line 41)

Error in validate_tracking (line 640)

Error in antrax_mcr_interface (line 20)
MATLAB:nonExistentField
08/03/21 20:29:30 -D- matlab app exited with code 249
(antrax) jana@unicorn:~/ants$ antrax validate CN0307/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

08/03/21 20:29:54 -D- running matlab mcr 
08/03/21 20:29:54 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0307/
20:30:09 -D- initializing expreader object
20:30:09 -I- Reading video information from file
Reference to non-existent field 'GBR'.
Error in validate_tracking/set_experiment (line 275)

Error in validate_tracking/startupFcn (line 429)

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)

Error in matlab.apps.AppBase/runStartupFcn (line 41)

Error in validate_tracking (line 640)

Error in antrax_mcr_interface (line 20)
MATLAB:nonExistentField
08/03/21 20:30:19 -D- matlab app exited with code 249

In this particular case I was trying out antrax on a set of short videos and due to the open boundary condition not all ants could be in the frame throughout that time (hence the error). Provided that labels.csv contains undetected labels, would antrax validate load if an exception handler for the Reference to non-existent field error was added or would it cause further problems?

janamach commented 3 years ago

After further investigation, I think something else might be wrong. I get the error even when the label that causes it has been detected.

At what point should the xy *.csv files in session/antdata/ be generated?

asafgal commented 3 years ago

Hey Jana,

The error is because there is a discrepancy between the labels in the labels.csv, and the labels for which xy data exists. This is weird because even if some ants are not detected at all, they should have an all-nan entry in the xy file.

The xy files are generated after the 'solve' step. If they do not exist, it might explain the error. If you can't find them, try looking at the logs in session/logs/matlabsolve and session/logs/matlabexport
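
A quick way to scan a logs directory for failures could look like the sketch below. It assumes failures show up as lines that start with "Error" (MATLAB's usual error report) or that carry an -E- tag. The function name `find_log_errors` is hypothetical and not part of anTraX.

```python
from pathlib import Path


def find_log_errors(logs_dir):
    """Return {log file name: [error lines]} for each *.log file with errors."""
    errors = {}
    for log_path in sorted(Path(logs_dir).glob("*.log")):
        lines = log_path.read_text(errors="replace").splitlines()
        # Assumption: error reports start with "Error" or contain the -E- tag.
        hits = [ln for ln in lines if ln.startswith("Error") or "-E-" in ln]
        if hits:
            errors[log_path.name] = hits
    return errors
```

Calling `find_log_errors` on the session's logs directory returns only the files that contain an error, so a failed solve or export step stands out without opening every log.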

janamach commented 3 years ago

The csv files were not generated during the solve step. You are right, the logs do show an error:

(base) jana@unicorn:~/ants/CN0307/cn0307/logs$ ls
matlab_export_m_1.log          matlab_solve_g_1.log  matlab_track_m_1.log
matlab_export_m_2.log          matlab_solve_m_1.log  matlab_track_m_2.log
matlab_export_m_3.log          matlab_solve_m_2.log  matlab_track_m_3.log
matlab_export_m_4.log          matlab_solve_m_3.log  matlab_track_m_4.log
matlab_export_m_5.log          matlab_solve_m_4.log  matlab_track_m_5.log
matlab_export_m_6.log          matlab_solve_m_5.log  matlab_track_m_6.log
matlab_link_across_movies.log  matlab_solve_m_6.log
(base) jana@unicorn:~/ants/CN0307/cn0307/logs$ cat matlab_export_m_1.log 
20:19:24 -D- initializing expreader object
20:19:24 -I- Reading video information from file
20:19:25 -I- Loading trgraph from cn0307/graphs/graph_1_1.mat
20:19:26 -I- Finished loading trgraph with 587 tracklets
20:19:26 -I- Loading tracklet data for movie 1
Error using trgraph/export_xy (line 143)
Invalid field name: 'G-GB'.

Error in export_single_movie (line 52)
export_xy(G,'interpolate',false);

(base) jana@unicorn:~/ants/CN0307/cn0307/logs$ cat matlab_export_m_6.log 
20:19:24 -D- initializing expreader object
20:19:24 -I- Reading video information from file
20:19:25 -I- Loading trgraph from cn0307/graphs/graph_6_6.mat
20:19:26 -I- Finished loading trgraph with 898 tracklets
20:19:26 -I- Loading tracklet data for movie 6
Error using trgraph/export_xy (line 143)
Invalid field name: 'G-GB'.

Error in export_single_movie (line 52)
export_xy(G,'interpolate',false);

I am wondering if the - in G-GB is an invalid matlab field character?

asafgal commented 3 years ago

Indeed, you cannot use special characters as field names in matlab structures :-( I usually use X as an ambiguous color mark...
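
For anyone who wants to check their labels before running the pipeline, here is a minimal sketch. It relies on MATLAB's rule that a field name must start with a letter and contain only letters, digits and underscores, up to 63 characters. The helper names are hypothetical, and it expects the labels as a plain list of strings because the layout of labels.csv is not covered here.

```python
import re

# MATLAB field names: a letter first, then letters, digits or underscores,
# at most 63 characters in total.
_MATLAB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,62}")


def invalid_matlab_labels(labels):
    """Return the labels that cannot be used as MATLAB struct field names."""
    return [lab for lab in labels if not _MATLAB_NAME.fullmatch(lab)]


def suggest_label(label, filler="X"):
    """Replace each disallowed character with `filler` (illustrative choice).

    This does not repair a label that starts with a digit or underscore.
    """
    return re.sub(r"[^A-Za-z0-9_]", filler, label)


labels = ["GBB", "G-GY", "G-GB", "GBR"]
print(invalid_matlab_labels(labels))  # ['G-GY', 'G-GB']
print(suggest_label("G-GY"))          # GXGY
```

This only checks the character rule. It does not check for MATLAB reserved keywords.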

janamach commented 3 years ago

Ah, beginner's mistake! Thanks :-)

asafgal commented 3 years ago

I didn't think about it either till now. I should probably add a meaningful error message :-)

janamach commented 3 years ago

For some reason I thought that the labels were handled as strings, even though the first error was already complaining about fields. I replaced every - with x and am repeating all steps now, can't wait to see if it worked :-)

janamach commented 3 years ago

So cool, it worked! I had my first CSV files generated, woohoo! :-)

Unrelated to this issue, I seem to have another problem: I can't retrain. I've tried a few times already and keep getting the same error: -E- Class list in example dir does not match classifier. Use --from-scratch to train a new model. If I retrain with the --from-scratch flag and then try to retrain again, I get the same error. Any ideas why?

The full output looks like this:

$ antrax train CN0307/cn0307_3/classifier/ 

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops.py:97: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops.py:97: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-03-09 20:38:51.147687: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-09 20:38:51.170503: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-09 20:38:51.170707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1650 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
2021-03-09 20:38:51.170844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-03-09 20:38:51.171541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-03-09 20:38:51.172135: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-03-09 20:38:51.172283: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-03-09 20:38:51.173079: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-03-09 20:38:51.173763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-03-09 20:38:51.173848: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu
2021-03-09 20:38:51.173876: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-09 20:38:51.174186: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-03-09 20:38:51.179125: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2021-03-09 20:38:51.179871: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5556dd5702a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-09 20:38:51.179883: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-09 20:38:51.237011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-09 20:38:51.237266: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5556dd5d3030 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-09 20:38:51.237280: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1650, Compute Capability 7.5
2021-03-09 20:38:51.237330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-09 20:38:51.237336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
-E- Class list in example dir does not match classifier. Use --from-scratch to train a new model

asafgal commented 3 years ago

When a classifier is first trained, it gets the list of classes from the examples directory. When you retrain, if the class list in the examples directory has changed, it throws an error.

Did you maybe add/remove/rename classes after you first trained the classifier?

janamach commented 3 years ago

Hmmm, I am not absolutely sure I know what you're asking. During the retraining step I would only use the extract-trainset GUI where I would correct the incorrect label assignments, I wouldn't navigate into the session directory and change files manually.

I have a long list of labels, in rare cases I would click the wrong one, continue, realize I made a mistake, and then go back, change, and export again. Could that cause an issue?

asafgal commented 3 years ago

The GUI is used to export cropped image examples. These are organized by class label in the classifier directory (the default is session/classifier/). To create/train a classifier, antrax uses TensorFlow. TF looks at the trainset, not at the labels.csv file, to determine how many classes the classifier needs to identify. Therefore, it is important to have at least one example for each class/label in the experiment before initial training.

If you have many labels and this is hard, you can manually create class subdirectories at session/classifier/examples/LABEL, and just put a placeholder image in each. Once you have real examples for a label, delete its placeholder.
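
If creating these by hand is tedious, something like the following sketch could do it. It assumes the examples/LABEL layout described above and a placeholder image that you supply yourself. The function names and the `placeholder.*` file name are hypothetical, not part of anTraX.

```python
import shutil
from pathlib import Path


def missing_class_dirs(examples_dir, labels):
    """Return labels with no subdirectory, or an empty one, in examples_dir."""
    examples_dir = Path(examples_dir)
    missing = []
    for label in labels:
        class_dir = examples_dir / label
        if not class_dir.is_dir() or not any(class_dir.iterdir()):
            missing.append(label)
    return missing


def add_placeholders(examples_dir, labels, placeholder_image):
    """Create a subdirectory per missing label and copy a placeholder into it."""
    placeholder_image = Path(placeholder_image)
    created = missing_class_dirs(examples_dir, labels)
    for label in created:
        class_dir = Path(examples_dir) / label
        class_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical file name; delete it once real examples exist.
        target = class_dir / ("placeholder" + placeholder_image.suffix)
        shutil.copy(placeholder_image, target)
    return created
```

Running `missing_class_dirs` first, without copying anything, shows which labels still have no examples before the initial training.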

janamach commented 3 years ago

Ahh! Thank you :) I ran a small experiment to make sure I understood what you just said. Before the retraining step, I made a copy of the experimental directory. In the first case, I corrected only using the labels that already existed in the classifier/examples/ directory and TF did not complain during the subsequent antrax train step. In the second case, I did the same and added an extra label (that created a new directory in the examples folder, as you said), which resulted in the error I was getting earlier.

Thank you again for explaining so clearly, this fixes the last technical issue I had so far. This morning I generated the first trajectories from our videos with 35+ individually labeled and untagged ants. With only one round of training the result was remarkably accurate. Now that I know what I did wrong in the re-training step, I can foresee some very productive clicking opportunities with even more accurate results :-)

asafgal commented 3 years ago

Great, happy it's working for you!