YttriLab / A-SOID

An active learning platform for expert-guided, data efficient discovery of behavior.

Wrong Framerate during Data Import; ValueError while Extracting Features #36

Closed mitras1210 closed 11 months ago

mitras1210 commented 1 year ago

Hello! I am getting this error message when I am trying to extract features from my DLC dataset. I was wondering if you could help with this, please.

ValueError: Found input variables with inconsistent numbers of samples: [2533, 380]
Traceback:
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 332, in <module>
    main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 328, in main
    application_function()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\apps\B_extract_features.py", line 101, in main
    extractor.main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\extract_features.py", line 524, in main
    self.shuffle_data()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\extract_features.py", line 494, in shuffle_data
    X_train, X_test, y_train, y_test = train_test_split(self.scaled_features, self.targets_mode,
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\sklearn\model_selection\_split.py", line 2559, in train_test_split
    arrays = indexable(*arrays)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\sklearn\utils\validation.py", line 443, in indexable
    check_consistent_length(*result)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\sklearn\utils\validation.py", line 397, in check_consistent_length
    raise ValueError(

Also, during the preprocessing step, A-SOiD shows a framerate of 30. I believe the framerate is supposed to be 10. Would you be able to help me out with this, please?

Thank you so much!!

JensBlack commented 1 year ago

Hi, thank you for your patience. The framerate field in the preprocessing step defaults to 30. You can change this to the framerate of your pose estimation data.

The issue is most likely connected to the preprocessing step, in which the annotations are resampled to match the timescale of the pose estimation. It would be great if you could try this out with the correct framerate and report back to me. Also, to save us both some time, please add some additional info about your installation of asoid etc.
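To illustrate why the framerate matters here, the following is a minimal sketch of resampling annotations onto the pose estimation timescale. It is not A-SOiD's implementation; `resample_labels` and the example values are hypothetical. It only shows that a wrong framerate produces a label sequence whose length no longer matches the number of pose frames, which is the kind of mismatch `train_test_split` rejects.

```python
def resample_labels(labels, src_fps, dst_fps):
    """Nearest-neighbour resampling of a per-frame label sequence.

    Hypothetical helper for illustration only.
    """
    n_out = round(len(labels) * dst_fps / src_fps)
    last = len(labels) - 1
    # Each output frame takes the label of the source frame covering its time.
    return [labels[min(int(i * src_fps // dst_fps), last)] for i in range(n_out)]


# 0.4 s of annotations at 10 Hz (made-up values).
annotations_10hz = ["other", "reach", "reach", "other"]
n_pose_frames = 80  # the same 0.4 s recorded at 200 fps

correct = resample_labels(annotations_10hz, src_fps=10, dst_fps=200)
wrong = resample_labels(annotations_10hz, src_fps=10, dst_fps=30)

print(len(correct) == n_pose_frames)  # lengths agree with the right framerate
print(len(wrong) == n_pose_frames)    # lengths disagree with the default of 30
```

With the correct framerate, every pose frame has exactly one label. With the default of 30, the two sequences differ in length.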

mitras1210 commented 1 year ago

Hello, I tried the preprocessing step with the correct framerate for the pose estimations, which is 200. After this, the feature extraction step worked. However, I am now getting errors in the Active Learning step.

We have tried filtering and downsampling our pose estimation data to the same index size as the annotations file. From my understanding, the annotations file has a timescale of 0.1 seconds. After using our downsampled DLC dataset and the annotations file with the correct framerate, the following error appears during the active learning step:

AxisError: axis 1 is out of bounds for array of dimension 1
Traceback:
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 332, in <module>
    main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 328, in main
    application_function()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\apps\C_auto_active_learning.py", line 104, in main
    show_classifier_results(annotation_classes, all_f1_scores,
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 40, in show_classifier_results
    scores = np.vstack((np.hstack(np.mean(base_score, axis=0)), np.vstack(np.mean(learn_score, axis=1))))
File "<__array_function__ internals>", line 180, in mean
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\fromnumeric.py", line 3474, in mean
    return _methods._mean(a, axis=axis, dtype=dtype,
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\_methods.py", line 167, in _mean
    rcount = _count_reduce_items(arr, axis, keepdims=keepdims, where=where)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\_methods.py", line 76, in _count_reduce_items
    items *= arr.shape[mu.normalize_axis_index(ax, arr.ndim)]

Would you kindly be able to help with this?

Also, A-SOiD was installed on Windows using the Anaconda Command Prompt terminal. The instructions provided on the GitHub for installation were followed exactly.

JensBlack commented 1 year ago

Looks like a plotting error during base classification. Can you please post the content of your config.ini file and tell me more about the use case? Also, which parameters did you select during active learning?

mitras1210 commented 1 year ago

The contents of the config.ini file are the following:

[Project]
PROJECT_TYPE = DeepLabCut
PROJECT_NAME = Run_9
PROJECT_PATH = Z:\SrishtiWorkspace\ASOiD Data
FRAMERATE = 10
KEYPOINTS_CHOSEN = l_wrist, r_wrist, l_hand, r_hand, l_pink_knuck, l_pink_nail, l_ring_knuck, l_ring_nail, l_mid_knuck, l_mid_nail, l_point_kuck, l_point_nail, l_stump, r_pink_knuck, r_pink_nail, r_ring_knuck, r_ring_nail, r_mid_knuck, r_mid_nail, r_point_kuck, r_point_nail, r_stump
EXCLUDE_OTHER = False
FILE_TYPE = csv
INDIVIDUALS_CHOSEN = single animal
CLASSES = Bilateral Forelimb Reach, Bilateral Forelimb Retract, Forelimb Oppose, other
MULTI_ANIMAL = False

[Data]
DATA_INPUT_FILES = A61M1-4-awake_DLC_3D(new).csv
LABEL_INPUT_FILES = FrontLeft_0.1annotations.csv
ROOT_PATH = None

[Processing]
ITERATION = 0
N_SHUFFLED_SPLITS = 10
MIN_DURATION = 2.5
TRAIN_FRACTION = 0.01
MAX_ITER = 100
MAX_SAMPLES_ITER = 20
N_SHUFFLED_SPLIT = 10

The parameters used for active learning are:

initial sampling ratio: 0.01
max number of iterations: 100
samples per iteration: 20

These were the default parameters set by ASOID. I did not change them.

JensBlack commented 1 year ago

You are using a single file for this project, so let me ask you some questions concerning this:

  1. Could you tell me if each behavior is in the file? Behaviors that are represented as columns in the annotation file, but are not present, will still be shown in the import step and will cause problems when we divide into train and test sets.
  2. You set a minimum duration of 2.5 seconds. That seems unreasonably high. Is this on purpose? For most cases, I'd recommend using a feature window between 100 - 1000 ms (which is in line with most publications on subsecond components of behaviors), but you know your field best.
  3. Given that you are only using 1 file, the train fraction for iter0 seems quite low. Please make sure that there are at least a few samples for each class (increase the sample ratio e.g., to 0.1 -> 10%)

    [Screenshots at sampling ratios of 1% and 10%]

JensBlack commented 1 year ago

Once we figure this out, we should tackle the framerate.

mitras1210 commented 1 year ago

Thank you so much for your response!

To answer your questions:

  1. No, each behavior is not present in the file. I only included it in the annotation file. I was not sure what the structure of the annotation file should look like.
  2. Yes, the 2.5 seconds is on purpose. From my understanding, the minimum duration is set with respect to how long it takes for a feature to occur. In our dataset, all features were occurring within that window of time. I think my understanding of the minimum duration might be incorrect.
  3. Thank you! I will try increasing the sampling ratio and see what the results are.

Once again, I truly appreciate all the help!

mitras1210 commented 1 year ago

This is the error I get after using the same dataset but setting the sampling ratio to 0.1.

ValueError: need at least one array to concatenate
Traceback:
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 332, in <module>
    main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 328, in main
    application_function()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\apps\C_auto_active_learning.py", line 128, in main
    rf_classifier.main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 549, in main
    self.base_classification()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 323, in base_classification
    self.subsampled_classify()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 235, in subsampled_classify
    X_train = np.vstack(X)
File "<__array_function__ internals>", line 180, in vstack
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
File "<__array_function__ internals>", line 180, in concatenate

JensBlack commented 1 year ago


> 1. No, each behavior is not present in the file. I only included it in the annotation file. I was not sure what the structure of the annotation file should look like.

Okay, this is important. The data set needs to contain at least a few examples of each behavior. For your case that means that in the single annotation file you have, you need examples of Bilateral Forelimb Reach, Bilateral Forelimb Retract, and Forelimb Oppose.

If that is not the case, you need to remove them from your list of selected classes. You can do this by removing it from the config file or recreating the project and deselecting the behavior not present in the Data Import step. You don't need to alter the original annotation file.

Example: if Bilateral Forelimb Reach has no examples, the new list should look like this: CLASSES = Bilateral Forelimb Retract, Forelimb Oppose, other
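For the config file route, a small sketch using Python's standard `configparser` could look like the following. It assumes the `[Project]` section and `CLASSES` key shown in the config posted above; the in-memory `config_text` and the `missing` set are stand-ins for illustration, and a backup of the real config.ini is advisable before editing it.

```python
import configparser

# Stand-in for the relevant part of config.ini (illustration only).
config_text = """
[Project]
CLASSES = Bilateral Forelimb Reach, Bilateral Forelimb Retract, Forelimb Oppose, other
"""

config = configparser.ConfigParser()
config.optionxform = str  # keep key names upper-case instead of lower-casing them
config.read_string(config_text)

# Classes that have no examples in the annotation file (assumed for this example).
missing = {"Bilateral Forelimb Reach"}

classes = [name.strip() for name in config["Project"]["CLASSES"].split(",")]
kept = [name for name in classes if name not in missing]
config["Project"]["CLASSES"] = ", ".join(kept)

print(config["Project"]["CLASSES"])
```

For a real project, `config.read(path)` and `config.write(file_handle)` would replace the in-memory string.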

> 2. Yes, the 2.5 seconds is on purpose. From my understanding, the minimum duration is set with respect to how long it takes for a feature to occur. In our dataset, all features were occurring within that window of time. I think my understanding of the minimum duration might be incorrect.

To reiterate, the minimum duration is the resolution at which you want the features to be extracted. In our paper, we use the 10th percentile of the length of all behavior occurrences as a basis. I don't know if this is 2.5 sec in your case, but I still recommend going lower for now (e.g., 600 ms).

I assume that with a 2.5-second window size, resolving the transitions between behaviors will be hard. Also, if you have behavioral bouts that are considerably smaller, they will be smoothed over by the window.
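As a rough sketch of the percentile rule mentioned above, the snippet below measures bout lengths in a binary annotation column and takes their 10th percentile. The column values and the 10 Hz rate are made up for illustration, `bout_durations` is a hypothetical helper, and the exact percentile method used in the paper may differ.

```python
from itertools import groupby
from statistics import quantiles


def bout_durations(column, fps):
    """Durations (in seconds) of consecutive runs of 1s in a binary column."""
    return [
        sum(1 for _ in run) / fps
        for value, run in groupby(column)
        if value == 1
    ]


# Hypothetical 10 Hz annotation column for one behavior.
column = [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
durations = bout_durations(column, fps=10)

# First of the nine decile cut points is the 10th percentile.
p10 = quantiles(durations, n=10, method="inclusive")[0]
print(durations, p10)
```

In practice the bout durations of all behaviors across all annotation files would be pooled before taking the percentile.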

> This is the error I get after using the same dataset but setting the sampling ratio to 0.1.

I assume this is because you are still not reaching at least 1 sample per behavior (because one class doesn't have examples). Can you send me a screenshot of the Active Learning step where you set the sampling ratio? The blue info text underneath indicates how many samples, on average, will be available for the baseline.

mitras1210 commented 1 year ago

Thank you!

  1. Would it be possible to get an example of what the annotations file is supposed to look like? I think my annotations file does not follow the format that ASOiD requires, and thus I continue to get various errors.
  2. Here is a screenshot of the active learning step: Web capture_10-7-2023_9850_localhost
mitras1210 commented 1 year ago

This is what my current annotations file looks like. Is this the correct format for a DLC annotations file? The file is a binary file with time in 0.1 second intervals (10 Hz) and 0's indicate when the behavior is not occurring and 1's are when the behavior is occurring. This was derived using BORIS.

FrontLeft_0.1annotations.csv

JensBlack commented 1 year ago
> 1. Would it be possible to get an example of what the annotations file is supposed to look like? I think my annotations file does not follow the format that ASOiD requires, and thus I continue to get various errors.

The annotation file you posted should be fine. The format is correct.

To reiterate: behaviors should be exclusive, meaning only one column should be 1 at a time (all others 0). Any rows that are all "0" (no behavior present) will be turned into "other" during import. Additionally, all behaviors that you deselect during the data import step are turned into "other" as well.
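The rules above can be checked before import with a short script. This is a sketch under assumptions, not A-SOiD's import code: `rows_to_labels` is a hypothetical helper, and the rows stand in for the behavior columns of an annotation file (time column omitted).

```python
def rows_to_labels(rows, behaviors):
    """Convert binary annotation rows into one label per row.

    Raises ValueError if more than one behavior is active in a row.
    Rows with no active behavior become "other".
    """
    labels = []
    for index, row in enumerate(rows):
        active = [name for name, value in zip(behaviors, row) if value == 1]
        if len(active) > 1:
            raise ValueError(f"row {index}: overlapping behaviors {active}")
        labels.append(active[0] if active else "other")
    return labels


behaviors = ["Bilateral Forelimb Reach", "Bilateral Forelimb Retract", "Forelimb Oppose"]
rows = [
    [0, 0, 0],  # no behavior -> "other"
    [1, 0, 0],
    [0, 0, 1],
]
labels = rows_to_labels(rows, behaviors)
print(labels)
```

A row such as `[1, 1, 0]` would raise an error here, flagging an overlap to resolve in the annotation file.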

JensBlack commented 1 year ago
> 2. Here is a screenshot of the active learning step:
>    ![Web capture_10-7-2023_9850_localhost](https://user-images.githubusercontent.com/135264264/252367453-b02db332-5f7c-4f6b-ad64-23256e52cac3.jpeg)

Okay, I see the issue now. The blue info text gives you the average number of samples per cross-validation split (the number of splits is the parameter you selected during feature extraction). For each class, it states how many samples you get on average with the chosen ratio (0.1, i.e., 10%) for the baseline (iteration 0), which is the starting point of active learning.

Bilateral Forelimb Retract: 0.27 samples

This means that when we start active learning, which happens after you click "classify", the first iteration fails to get any samples for that class. In your case, this is happening for all classes (see blue info text).

What to do?

You can either further increase the ratio until you get more than 1 sample per class

or

decrease the window size during feature extraction. That way, you will automatically get more samples.
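To see how both options affect the numbers, here is a back-of-the-envelope sketch. It assumes that one sample corresponds to one feature window and that the initial sample count per class is roughly the number of windows times the ratio; how A-SOiD computes the blue info text exactly is not shown in this thread. The function name and the frame counts are hypothetical.

```python
def expected_initial_samples(frames_per_class, fps, window_s, ratio):
    """Rough expected number of initial training samples per class.

    Assumes one sample per feature window (an assumption, see above).
    """
    frames_per_window = max(1, round(fps * window_s))
    return {
        name: (n_frames // frames_per_window) * ratio
        for name, n_frames in frames_per_class.items()
    }


# Hypothetical frame counts for a single 200 fps recording.
frames_per_class = {"Forelimb Oppose": 1500, "other": 4000}

large_window = expected_initial_samples(frames_per_class, fps=200, window_s=2.5, ratio=0.1)
small_window = expected_initial_samples(frames_per_class, fps=200, window_s=0.1, ratio=0.15)

print(large_window)  # every class stays below 1 sample
print(small_window)  # every class gets at least 1 sample
```

With these made-up counts, the 2.5 s window leaves every class below one sample, while the 0.1 s window lifts every class well above one.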

Disclaimer: I assume this is a test case scenario where you quickly wanted to test A-SOiD with a single file. Therefore, I want to state that this is the result of a combination of little data (1 file), a large feature window size (2.5 sec!), and a small initial ratio for active learning (less than 1 sample per class). In most cases, this should not appear at all.

BUT! I am very grateful that you are testing the edges of our app, and I will include some additional warnings in a future update based on your experience. Thank you!

mitras1210 commented 1 year ago

Thank you so much for your response! I truly appreciate all your help. I just have one other quick question. I tried running the active learning step after adjusting the window size in feature extraction to 0.1 seconds and changing the sampling ratio to 0.15 so that all classes have at least 1 sample. This is the error I am getting:

ValueError: operands could not be broadcast together with shapes (4,) (3,)
Traceback:
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 332, in <module>
    main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 328, in main
    application_function()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\apps\C_auto_active_learning.py", line 128, in main
    rf_classifier.main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 549, in main
    self.base_classification()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 326, in base_classification
    self.show_subsampled_performance()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 281, in show_subsampled_performance
    mean_scores2beat = np.mean(np.mean(self.all_f1_scores, axis=0), axis=0)
File "<__array_function__ internals>", line 180, in mean
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\fromnumeric.py", line 3474, in mean
    return _methods._mean(a, axis=axis, dtype=dtype,
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\_methods.py", line 179, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)

Could you please help me out with this? I am also attaching the details of the config.ini file and a screenshot of the active learning step for reference. [screenshot]

PROJECT_TYPE = DeepLabCut
PROJECT_NAME = Run_11
PROJECT_PATH = Z:\SrishtiWorkspace\ASOiD Data
FRAMERATE = 200
KEYPOINTS_CHOSEN = l_wrist, r_wrist, l_hand, r_hand, l_pink_knuck, l_pink_nail, l_ring_knuck, l_ring_nail, l_mid_knuck, l_mid_nail, l_point_kuck, l_point_nail, l_stump, r_pink_knuck, r_pink_nail, r_ring_knuck, r_ring_nail, r_mid_knuck, r_mid_nail, r_point_kuck, r_point_nail, r_stump
EXCLUDE_OTHER = False
FILE_TYPE = csv
INDIVIDUALS_CHOSEN = single animal
CLASSES = Bilateral Forelimb Reach, Bilateral Forelimb Retract, Forelimb Oppose, other
MULTI_ANIMAL = False

[Data]
DATA_INPUT_FILES = A61M1-4-awake_DLC_3D(new).csv
LABEL_INPUT_FILES = FrontLeft_0.1annotations.csv
ROOT_PATH = None

[Processing]
ITERATION = 0
N_SHUFFLED_SPLITS = 10
MIN_DURATION = 0.10
TRAIN_FRACTION = 0.15000000000000002
MAX_ITER = 100
MAX_SAMPLES_ITER = 20
N_SHUFFLED_SPLIT = 10

Once again, thank you so much for all the help. I will try running the app with more data and see what the results are.

JensBlack commented 1 year ago

Can you update to the newest version of ASOiD? I recently caught a bug with newer versions of NumPy that might be part of the issue.

Then, please increase the initial ratio to 0.6 (to verify it works at all). After confirming that it works, you can lower the initial ratio again. But I want to verify that this is not a problem of low samples again. With an average of 1 sample across splits, you might have a split with 0 samples in there, which would explain the wrong shapes.
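To make the "split with 0 samples" scenario concrete, here is a small illustrative check. It is a sketch, not A-SOiD code: `splits_missing_classes` and the label lists are hypothetical. The idea is that if per-class scores are computed only for classes present in a split, a split missing one class yields 3 scores instead of 4, and such vectors cannot be averaged together.

```python
def splits_missing_classes(splits, classes):
    """Return {split_index: [classes absent from that split]}."""
    report = {}
    for index, labels in enumerate(splits):
        present = set(labels)
        absent = [name for name in classes if name not in present]
        if absent:
            report[index] = absent
    return report


classes = ["Bilateral Forelimb Reach", "Bilateral Forelimb Retract", "Forelimb Oppose", "other"]

# Hypothetical labels drawn into two cross-validation splits.
splits = [
    ["other", "Bilateral Forelimb Reach", "Bilateral Forelimb Retract", "Forelimb Oppose"],
    ["other", "other", "Bilateral Forelimb Reach", "Forelimb Oppose"],
]

report = splits_missing_classes(splits, classes)
print(report)  # split 1 has no "Bilateral Forelimb Retract" sample
```

Running a check like this on the labels of each split would show whether a higher initial ratio or more data is needed.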

mitras1210 commented 1 year ago

I have updated ASOiD.

Additionally, I set the sampling ratio to 0.6 to verify if it works and I got the same error.

ValueError: operands could not be broadcast together with shapes (4,) (3,)
Traceback:
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 332, in <module>
    main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\app.py", line 328, in main
    application_function()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\apps\C_auto_active_learning.py", line 128, in main
    rf_classifier.main()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 549, in main
    self.base_classification()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 326, in base_classification
    self.show_subsampled_performance()
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\asoid\utils\auto_active_learning.py", line 281, in show_subsampled_performance
    mean_scores2beat = np.mean(np.mean(self.all_f1_scores, axis=0), axis=0)
File "<__array_function__ internals>", line 180, in mean
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\fromnumeric.py", line 3474, in mean
    return _methods._mean(a, axis=axis, dtype=dtype,
File "C:\Users\User\anaconda3\envs\asoid\lib\site-packages\numpy\core\_methods.py", line 179, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)

JensBlack commented 1 year ago

Thank you for your patience. I still think this is an issue with your training data. We included some changes in the next update to inform users about the minimum data requirements.

For your case, I'd recommend adding more files (if available) to see whether the issue persists with more data. Let me know if this works for you!