YttriLab / A-SOID

An active learning platform for expert-guided, data efficient discovery of behavior.

Error during 'Discover'; missing data for behavior #72

Mijar007 closed 8 months ago

Mijar007 commented 9 months ago

Describe the bug

After generating the model, I tried the 'Discover' function. After preprocessing the files, I pressed the 'Embed and Cluster Targeted Behavior' button and received the error below.

File "C:\ProgramData\anaconda3\envs\asoid\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\app.py", line 328, in <module>
    main()
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\app.py", line 322, in main
    G_unsupervised_discovery.main(ri=ri, config=st.session_state['config'])
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\apps\G_unsupervised_discovery.py", line 394, in main
    pca_umap_hdbscan(target_behavior, annotation_classes, st.session_state['input_sav'],
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\apps\G_unsupervised_discovery.py", line 194, in pca_umap_hdbscan
    umap_embeddings[target_behav] = reducer.fit_transform(selected_feats_)
File "C:\ProgramData\anaconda3\envs\asoid\lib\site-packages\umap\umap_.py", line 2887, in fit_transform
    self.fit(X, y, force_all_finite)
File "C:\ProgramData\anaconda3\envs\asoid\lib\site-packages\umap\umap_.py", line 2354, in fit
    X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
File "C:\ProgramData\anaconda3\envs\asoid\lib\site-packages\sklearn\utils\validation.py", line 967, in check_array
    raise ValueError(

Based on the 'Predict' results, I think the model did not assign any data points to my 'Wiggle' group, which produces an empty array during the Discover step. If I exclude the 'Wiggle' group from the 'Embed and Cluster Targeted Behavior' step, the error is not raised. If I then reselect the 'Wiggle' group, a different error appears:

KeyError: 'Wiggle'
Traceback:
File "C:\ProgramData\anaconda3\envs\asoid\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\app.py", line 328, in <module>
    main()
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\app.py", line 322, in main
    G_unsupervised_discovery.main(ri=ri, config=st.session_state['config'])
File "C:\Users\Rabenstein\Python\A-SOID-main\asoid\apps\G_unsupervised_discovery.py", line 473, in main
    behav_groups[target_behav],
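
The KeyError is consistent with the results dictionary having no entry for a behavior that was skipped during embedding. A minimal sketch of a guarded lookup, assuming `behav_groups` is a plain dict keyed by behavior name; the helper `select_available` and the sample data are hypothetical and not taken from the A-SOID code:

```python
def select_available(behav_groups, selected):
    """Split the selected behaviors into those with results and those without."""
    available = [name for name in selected if name in behav_groups]
    missing = [name for name in selected if name not in behav_groups]
    return available, missing


# Hypothetical state after embedding only 'other'.
behav_groups = {"other": [0, 1, 1, 2]}
available, missing = select_available(behav_groups, ["Wiggle", "other"])

for name in missing:
    # Report the problem instead of indexing the dict and raising KeyError.
    print(f"No embedding found for '{name}'; run the embedding step for it first.")
```

Checking membership before indexing lets the app report the missing behavior instead of stopping with a traceback.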

I suggest adding a check that catches empty groups during or after preprocessing. The user could then be informed, and empty groups could be made unselectable for the embedding step.
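
As a sketch only, such a check could count the predicted frames per class and report the classes that have too few frames to embed. It assumes the predictions are available as a flat sequence of integer class indices in the same order as the CLASSES entry of the config; `split_by_frame_count` and `min_frames` are illustrative names, not part of A-SOID.

```python
from collections import Counter


def split_by_frame_count(predictions, class_names, min_frames=1):
    """Return (usable, empty) class names based on predicted frame counts."""
    counts = Counter(predictions)
    usable, empty = [], []
    for idx, name in enumerate(class_names):
        # A class with fewer than min_frames predicted frames cannot be embedded.
        if counts.get(idx, 0) >= min_frames:
            usable.append(name)
        else:
            empty.append(name)
    return usable, empty


# Hypothetical predictions in which the classifier never predicts 'Wiggle' (index 0).
predictions = [1, 1, 1, 1, 1]
usable, empty = split_by_frame_count(predictions, ["Wiggle", "other"])
print("Can embed:", usable)
print("No predicted frames:", empty)
```

The `empty` list could then be shown to the user as a warning and left out of the selectable behaviors.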

Screenshots

Prediction results image

with 'Wiggle' group image

Without 'Wiggle' group image

Project Config (please post the content of the corresponding config.ini file)

[Project]
PROJECT_TYPE = DeepLabCut
PROJECT_NAME = Feb-28-2024_test4
PROJECT_PATH = C:\Users\Rabenstein/Desktop/asoid_output
FRAMERATE = 30
KEYPOINTS_CHOSEN = Nose, Front_Right_1, Front_Right_2, Front_Left_1, Front_Left_2, Rear_Right_1, Rear_Right_2, Rear_Left_1, Rear_Left_2, Midline_Front, Midline_Center, Midline_Rear
EXCLUDE_OTHER = False
FILE_TYPE = csv
INDIVIDUALS_CHOSEN = single animal
CLASSES = Wiggle, other
MULTI_ANIMAL = False
IS_3D = False

[Data]
DATA_INPUT_FILES = GOPR0612DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0613DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0614DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0615DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0616DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0617DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0618DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0619DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0620DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv, GOPR0621DLC_resnet50_Wiggle_TestJan26shuffle1_100000.csv
LABEL_INPUT_FILES = GOPR0612.avi_No focal subject_mod.tsv, GOPR0613.avi_No focal subject_mod.tsv, GOPR0614.avi_No focal subject_mod.tsv, GOPR0615.avi_No focal subject_mod.tsv, GOPR0616.avi_No focal subject_mod.tsv, GOPR0617.avi_No focal subject_mod.tsv, GOPR0618.avi_No focal subject_mod.tsv, GOPR0619.avi_No focal subject_mod.tsv, GOPR0620.avi_No focal subject_mod.tsv, GOPR0621.avi_No focal subject_mod.tsv
ROOT_PATH = None

[Processing]
LLH_VALUE = 0.1
ITERATION = 0
MIN_DURATION = 0.1
TRAIN_FRACTION = 0.02
MAX_ITER = 100
MAX_SAMPLES_ITER = 20
CONF_THRESHOLD = 0.5
N_SHUFFLED_SPLIT = None

Best wishes, Michael

JensBlack commented 8 months ago

Thank you for reporting this. Indeed, because this step applies the trained classifier to new data, we can end up with no data for a behavior. Good catch!

I will add a notice for this case in the next update.