Closed janamach closed 3 years ago
After further investigation, I think something else might be wrong. I get the error even when the label causing the error has been detected.
At what point should the xy *.csv files in session/antdata/ be generated?
Hey Jana,
The error is because there is a discrepancy between the labels in labels.csv and the labels for which xy data exists. This is weird, because even if some ants are not detected at all, they should have an all-NaN entry in the xy file.
The xy files are generated after the 'solve' step. If they do not exist, that might explain the error. If you can't find them, try looking at the matlab_solve_*.log and matlab_export_*.log files in session/logs/.
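To see exactly which labels are affected, it can help to compare the two label sets directly. This is a minimal sketch, not part of anTraX: it assumes you have already read the label names from labels.csv and the label names present in the exported xy data into Python lists (how to load them depends on the file layout, which isn't shown here), and the function name `label_discrepancy` is hypothetical.

```python
def label_discrepancy(csv_labels, xy_labels):
    """Return (missing_in_xy, extra_in_xy) as sorted lists.

    csv_labels: label names listed in labels.csv (assumed already parsed).
    xy_labels:  label names for which xy data exists (assumed already parsed).
    """
    csv_set = set(csv_labels)
    xy_set = set(xy_labels)
    # Labels declared in labels.csv but absent from the xy data
    missing_in_xy = sorted(csv_set - xy_set)
    # Labels present in the xy data but not declared in labels.csv
    extra_in_xy = sorted(xy_set - csv_set)
    return missing_in_xy, extra_in_xy


if __name__ == "__main__":
    missing, extra = label_discrepancy(["BG", "G-GB", "PP"], ["BG", "PP"])
    print("missing in xy:", missing)  # missing in xy: ['G-GB']
    print("extra in xy:", extra)      # extra in xy: []
```

If the "missing" list is non-empty even for ants you know were detected, the xy files were probably never written, which points back to the solve/export logs.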
The csv files were not generated during the solve step. You are right, the logs do show an error:
(base) jana@unicorn:~/ants/CN0307/cn0307/logs$ ls
matlab_export_m_1.log matlab_solve_g_1.log matlab_track_m_1.log
matlab_export_m_2.log matlab_solve_m_1.log matlab_track_m_2.log
matlab_export_m_3.log matlab_solve_m_2.log matlab_track_m_3.log
matlab_export_m_4.log matlab_solve_m_3.log matlab_track_m_4.log
matlab_export_m_5.log matlab_solve_m_4.log matlab_track_m_5.log
matlab_export_m_6.log matlab_solve_m_5.log matlab_track_m_6.log
matlab_link_across_movies.log matlab_solve_m_6.log
(base) jana@unicorn:~/ants/CN0307/cn0307/logs$ cat matlab_export_m_1.log
20:19:24 -D- initializing expreader object
20:19:24 -I- Reading video information from file
20:19:25 -I- Loading trgraph from cn0307/graphs/graph_1_1.mat
20:19:26 -I- Finished loading trgraph with 587 tracklets
20:19:26 -I- Loading tracklet data for movie 1
Error using trgraph/export_xy (line 143)
Invalid field name: 'G-GB'.
Error in export_single_movie (line 52)
export_xy(G,'interpolate',false);
(base) jana@unicorn:~/ants/CN0307/cn0307/logs$ cat matlab_export_m_6.log
20:19:24 -D- initializing expreader object
20:19:24 -I- Reading video information from file
20:19:25 -I- Loading trgraph from cn0307/graphs/graph_6_6.mat
20:19:26 -I- Finished loading trgraph with 898 tracklets
20:19:26 -I- Loading tracklet data for movie 6
Error using trgraph/export_xy (line 143)
Invalid field name: 'G-GB'.
Error in export_single_movie (line 52)
export_xy(G,'interpolate',false);
I am wondering if the - in G-GB is an invalid MATLAB field character?
Indeed, you cannot use special characters in MATLAB struct field names :-( I usually use X as an ambiguous color mark...
Ah, beginners mistake! Thanks :-)
I didn't think about it either until now; I should probably add a meaningful error message :-)
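Along the lines of that "meaningful error message", a pre-flight check could reject bad labels before any MATLAB step runs. This is a hypothetical helper, not something anTraX ships. It assumes the labels are available as a Python list of strings and encodes MATLAB's rule for struct field names: start with a letter, followed only by letters, digits and underscores, at most 63 characters (MATLAB's `namelengthmax`).

```python
import re

# MATLAB struct field names: a leading letter, then letters/digits/underscores,
# 63 characters at most (namelengthmax).
_MATLAB_FIELD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,62}")


def invalid_matlab_labels(labels):
    """Return the labels that cannot be used as MATLAB struct field names."""
    return [lab for lab in labels if not _MATLAB_FIELD_RE.fullmatch(lab)]


def check_labels(labels):
    """Raise a descriptive error up front instead of failing inside export_xy."""
    bad = invalid_matlab_labels(labels)
    if bad:
        raise ValueError(
            "These labels are not valid MATLAB field names "
            "(use only letters, digits and underscores, starting with a letter): "
            + ", ".join(repr(b) for b in bad)
        )
```

For example, `check_labels(["BG", "G-GB"])` raises a `ValueError` that names `'G-GB'`, rather than the later "Invalid field name" from `trgraph/export_xy`.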
For some reason I thought the labels were handled as strings, even though the first error was already complaining about fields. I replaced all - with x and am repeating all steps now; can't wait to see if it worked :-)
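The rename done by hand here (replacing - with x) can be sketched generically for any character MATLAB rejects. The function name and the collision check are assumptions of this sketch: a naive replacement can map two distinct labels to the same name (e.g. G-GB and an existing GxGB), which would silently merge two ants, so it refuses in that case. It does not fix labels that start with a digit or underscore, and applying the returned mapping to labels.csv (and presumably to any label-named example directories) is left to you.

```python
import re


def sanitize_labels(labels, replacement="x"):
    """Map each label to a MATLAB-safe name by replacing invalid characters.

    Returns {original: sanitized}. Raises ValueError if two different labels
    would end up with the same sanitized name.
    """
    mapping = {}
    seen = {}  # sanitized name -> original label that produced it
    for lab in labels:
        new = re.sub(r"[^A-Za-z0-9_]", replacement, lab)
        if new in seen and seen[new] != lab:
            raise ValueError(f"{lab!r} and {seen[new]!r} would both become {new!r}")
        seen[new] = lab
        mapping[lab] = new
    return mapping
```

For instance, `sanitize_labels(["G-GB", "BG"])` gives `{"G-GB": "GxGB", "BG": "BG"}`.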
So cool, it worked! I had my first CSV files generated, woohoo! :-)
Unrelated to this issue, I seem to have another problem: I can't retrain. I've tried a few times already and keep getting the same error: -E- Class list in example dir does not match classifier. Use --from-scratch to train a new model. If I train with the --from-scratch flag and then try to retrain again, I get the same error. Any ideas why?
The full output looks like this:
$ antrax train CN0307/cn0307_3/classifier/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops.py:97: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/tensorflow_core/python/ops/init_ops.py:97: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-03-09 20:38:51.147687: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-09 20:38:51.170503: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-09 20:38:51.170707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1650 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
2021-03-09 20:38:51.170844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-03-09 20:38:51.171541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-03-09 20:38:51.172135: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-03-09 20:38:51.172283: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-03-09 20:38:51.173079: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-03-09 20:38:51.173763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-03-09 20:38:51.173848: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu
2021-03-09 20:38:51.173876: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-09 20:38:51.174186: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-03-09 20:38:51.179125: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2021-03-09 20:38:51.179871: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5556dd5702a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-09 20:38:51.179883: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-09 20:38:51.237011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-09 20:38:51.237266: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5556dd5d3030 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-09 20:38:51.237280: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1650, Compute Capability 7.5
2021-03-09 20:38:51.237330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-09 20:38:51.237336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
-E- Class list in example dir does not match classifier. Use --from-scratch to train a new model
When a classifier is first trained, it gets the list of classes from the examples directory. When you retrain, it throws an error if the class list in the examples directory has changed since then.
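To check which classes changed before running `antrax train`, you can compare the subdirectories of the examples directory with the class list the classifier was trained on. This is a minimal sketch with hypothetical function names: it assumes one subdirectory per class (as described in this thread) and that you supply the trained class list yourself, since how anTraX stores it internally isn't shown here.

```python
import os


def example_classes(examples_dir):
    """Class names as seen by training: one subdirectory per class label."""
    with os.scandir(examples_dir) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def class_list_diff(examples_dir, classifier_classes):
    """Compare the examples directory against the classifier's class list.

    Returns (added, removed): classes that appeared in the examples directory
    since training, and classes the classifier knows that are now missing.
    """
    current = set(example_classes(examples_dir))
    trained = set(classifier_classes)
    return sorted(current - trained), sorted(trained - current)
```

A non-empty `added` list is the situation described above: a new label directory appeared after the first training.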
Did you maybe add/remove/rename classes after you first trained the classifier?
Hmmm, I am not absolutely sure I know what you're asking. During the retraining step I would only use the extract-trainset GUI where I would correct the incorrect label assignments, I wouldn't navigate into the session directory and change files manually.
I have a long list of labels, in rare cases I would click the wrong one, continue, realize I made a mistake, and then go back, change, and export again. Could that cause an issue?
The GUI is used to export cropped image examples. These are organized by class label in the classifier directory (by default session/classifier/). To create and train a classifier, antrax uses TensorFlow. TF looks at the trainset, not at the labels.csv file, to determine how many classes the classifier needs to identify. Therefore, it is important to have at least one example for each class/label in the experiment before the initial training.
If you have many labels and this is hard, you can manually create class subdirectories at session/classifier/examples/LABEL and just put a placeholder image in each. Once you have real examples for that label, delete the placeholder.
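With a long label list, the placeholder step can be scripted. This is a sketch, not an anTraX feature: `add_placeholders` is a hypothetical helper, the examples path follows the default mentioned above, and the `placeholder_` filename prefix is an assumption chosen only to make the placeholders easy to find and delete later. What the placeholder image should contain isn't specified in this thread; a file in the same format as your exported examples seems safest.

```python
import os
import shutil


def add_placeholders(examples_dir, labels, placeholder_image):
    """Ensure every label has a non-empty examples/LABEL directory.

    For each label whose directory is missing or empty, create it and copy
    placeholder_image into it. Returns the labels that received a placeholder.
    Delete the placeholder once real examples exist for that label.
    """
    created = []
    for label in labels:
        label_dir = os.path.join(examples_dir, label)
        if os.path.isdir(label_dir) and os.listdir(label_dir):
            continue  # already has at least one example
        os.makedirs(label_dir, exist_ok=True)
        dest = os.path.join(
            label_dir, "placeholder_" + os.path.basename(placeholder_image)
        )
        shutil.copyfile(placeholder_image, dest)
        created.append(label)
    return created
```

Running it once before the first `antrax train` means later corrections in the GUI should not introduce new class directories.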
Ahh! Thank you :) I ran a small experiment to make sure I understood what you said. Before the retraining step, I made a copy of the experimental directory. In the first case, I made corrections using only labels that already existed in the classifier/examples/ directory, and TF did not complain during the subsequent antrax train step. In the second case, I did the same but also added an extra label (which created a new directory in the examples folder, as you said), and that resulted in the error I was getting earlier.
Thank you again for explaining so clearly, this fixes the last technical issue I had so far. This morning I generated the first trajectories from our videos with 35+ individually labeled and untagged ants. With only one round of training the result was remarkably accurate. Now that I know what I did wrong in the re-training step, I can foresee some very productive clicking opportunities with even more accurate results :-)
Great, happy it's working for you!
Hi!
This is not really an issue, more of a suggestion (or a question?). The problem is that antrax quits with an error when labels.csv contains an undetected label, and proceeds further once that label is removed (in my case, until it runs into the next unused label). So I kept removing the "extra" labels from the csv file; the output in the terminal looked like this:
In this particular case I was trying out antrax on a set of short videos, and due to the open boundary condition not all ants could be in the frame throughout that time (hence the error). Provided that labels.csv contains undetected labels, would antrax validate load if an exception handler for the "Reference to non-existent field" error were added, or would it cause further problems?
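One way to picture the suggested handling: instead of failing on a label that has no data, fill it with an all-NaN track (the behaviour described earlier in the thread for undetected ants) and report which labels were filled. The "Reference to non-existent field" message comes from MATLAB, so this Python sketch only illustrates the idea and is not a patch for anTraX; the data layout (a dict mapping label to a list of (x, y) pairs) and the function name are hypothetical.

```python
import math


def xy_for_labels(labels, xy_by_label, n_frames):
    """Return (tracks, missing) with a track for every label in labels.

    Labels absent from xy_by_label get an all-NaN track of length n_frames
    instead of raising; their names are collected in `missing` so the caller
    can print a warning.
    """
    nan_track = [(math.nan, math.nan)] * n_frames
    tracks = {}
    missing = []
    for label in labels:
        track = xy_by_label.get(label)
        if track is None:
            missing.append(label)
            track = list(nan_track)
        tracks[label] = track
    return tracks, missing
```

Returning the `missing` list keeps the failure visible (as a warning) without stopping the run, which matters for short videos where some ants legitimately never enter the frame.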