YttriLab / A-SOID

An active learning platform for expert-guided, data-efficient discovery of behavior.

Create New Dataset gives: local variable 'new_features' referenced before assignment #65

Closed kipkeller closed 4 months ago

kipkeller commented 5 months ago

Using WINDOWS, Firefox, ... worked with CalMS21 - as far as that goes (not very). Now trying with our own dataset. Since I really have no clue whether anything here is correct, I include my notes from the start:

Using our own data from Boris and DLC:

Make four directories within c:/users/.../A-SoiD:

- …/BorisData
- …/DLCposeData
- …/DLCvideos
- …/output

USING BORIS: After annotating an observation (video): Observations -> Export events -> as Behaviors Binary Table -> Select all -> Select all.

Time constant = 0.1 sec, as CSV. Check to be sure each file's column #1 is 0:0.1:end. Save these to c:/users/.../A-SOiD/BorisData. This will create a csv file for each chosen Boris annotation (each video).

Using DLC:

- Copy the associated video for each observation to c:/users/.../A-SOiD/DLCvideos (easiest if you change their names to look similar to the Boris files, so Windows File Explorer will alphabetize them equivalently).
- Copy the equivalent DLC csv files to c:/users/.../A-SoiD/DLCposeData. These files must have NO NaNs and so will likely have to be massaged versions of the DLC output. Note that the first four rows (for multianimal models) of the DLC output files are 'string' formatted (whereas thereafter the data are 'doubles'). A-SOiD uses the data in rows 2-4 to get keypoints and animal IDs. Once you have massaged these copied data to remove the NaNs, again give them filenames that will easily alphabetize with the Boris files.

Using A-SOiD: **Sure would help if there was any kind of documentation!**

- Open the Anaconda prompt as administrator.
- conda activate asoid
- asoid app - this opens A-SOiD in your default browser (about 1 minute).
- Go to 'Upload Data' in the 'Train' window and select 'DeepLabCut'. You will now want to drag and drop both 'pose' (DLC .csv) files and 'annotation' (Boris .csv) files here. The files need to be in the same order for each list, but note that this seems to only work reliably if you pick them one at a time, alternating between the 'pose' and 'annotation' file types (hence why you made them alphabetize similarly in the three directories). Note also that the file listings for each set appear twice in the app and are not always in the same order. Best to alternately choose each file, or at least one at a time. Not sure how many file pairs is a good number; I put in about 21 pairs as they are fairly short (1-5 minute) videos. Note that you could build (copy and rebuild) a config.ini file with a list of DATA_INPUT_FILES and LABEL_INPUT_FILES; this seems like the easiest way to go.
- In the 'Config' window: 200 frames/second for pose files; 10 sample rate for annotation files. Remove classes (annotations) you wish to exclude (especially rarely occurring ones, like 'jump'). Check 'Exclude other'. Check 'multianimal project'. Remove animals you wish to exclude (I only used 'mouse' and removed 'cricket'). Remove keypoints to exclude (e.g. those for the cricket).
- In the 'Save' window: Working directory: C:\Users\...\A-SOiD\output; Filename prefix: test1. Double check everything and click "Create Project/Preprocess".
- Go to 'Extract Features' (you must click this twice to get there). I just kept the defaults and clicked 'Extract Features' on this page.
- Go to 'Active Learning'. With too few videos or too rare annotation classes, this will throw an error 'not enough available labels' and suggest you go back to 'data upload' and de-select annotation classes with too few labels, but this doesn't seem possible, so start over (again, you could edit the config.ini file outside of A-SOiD and re-use it). I kept defaults and clicked 'Train Classifier' on this page.
- Go to 'Refine Behaviors'. Not sure what to do here, or what the module is trying to do, but... I added one (~4 min) video and its corresponding pose file, kept default parameters and clicked on 'Start frame extraction'. When this finished, the response said: 'Done. Type "R" to refresh', which I do not understand. I typed 'r' and it seemed to want another video and pose file pair.
So I added another and clicked 'Start frame extraction' again.

Moved on to 'Create New Dataset'. Not sure what this module is supposed to do. It lists both videos that were 'refined' and asks to 'Select Refinement' (I left this as is, listing both, since it is unclear if I should remove one of them). Then I clicked on 'Create ITERATION 1 training dataset'. This caused an error:

```
UnboundLocalError: local variable 'new_features' referenced before assignment
Traceback:
  File "C:\ProgramData\Anaconda3\envs\asoid\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\...\A-SOiD\asoid\app.py", line 317, in <module>
    main()
  File "C:\Users\...\A-SOiD\asoid\app.py", line 301, in main
    E_create_new_training.main(ri=ri, config=st.session_state['config'])
  File "c:\users\...\a-soid\asoid\apps\E_create_new_training.py", line 117, in main
    create_new_training_features_targets(project_dir, selected_iter, new_features, new_targets)
```

Tried again after removing one of the videos for refinement (leaving only one); this gave the same error. Didn't get beyond this.

Here is the config.ini:

```ini
[Project]
PROJECT_TYPE = DeepLabCut
PROJECT_NAME = test1
PROJECT_PATH = C:\Users\Kip\A-SOiD\output
FRAMERATE = 200
KEYPOINTS_CHOSEN = nose, Rear, Lear, headbase, spine, tailbase
EXCLUDE_OTHER = True
FILE_TYPE = csv
INDIVIDUALS_CHOSEN = mouse
CLASSES = circle, groom, pursuit, rear, search, wall, other
MULTI_ANIMAL = True

[Data]
DATA_INPUT_FILES = 2021-08-11_13-16-11_mouse-0180_DLC.csv, 2021-08-17_15-32-18_mouse-0180_DLC.csv, 2021-10-07_13-41-19_mouse-0601_DLC.csv, 2021-10-07_14-56-59_mouse-0528_DLC.csv, 2021-10-07_15-20-38_mouse-0602_DLC.csv, 2021-10-07_15-42-13_mouse-0600_DLC.csv, 2021-10-11_15-41-40_mouse-0602_DLC.csv, 2021-10-11_16-20-35_mouse-0599_DLC.csv, 2021-12-10_14-29-34_mouse-0599_DLC.csv, 2022-07-25_15-05-29_mouse-1099_DLC.csv, 2022-07-25_15-19-36_mouse-1099_DLC.csv, 2022-07-26_17-30-46_mouse-1128_DLC.csv, 2022-07-27_16-40-55_mouse-1128_DLC.csv, 2022-07-27_16-55-57_mouse-1128_DLC.csv, 2022-07-27_17-17-10_mouse-1127_DLC.csv, 2022-07-29_15-04-00_mouse-1127_DLC.csv, 2022-10-17_14-09-16_mouse-1148_DLC.csv, 2022-10-17_14-45-21_mouse-1144_DLC.csv, 2022-10-17_15-13-55_mouse-1141_DLC.csv, 2022-10-27_10-55-37_mouse-1144_DLC.csv, 2022-10-28_14-28-24_mouse-1148_DLC.csv
LABEL_INPUT_FILES = 2021-08-11_13-16-11_mouse-0180_mouse.csv, 2021-08-17_15-32-18_mouse-0180_mouse.csv, 2021-10-07_13-41-19_mouse-0601_mouse.csv, 2021-10-07_14-56-59_mouse-0528_mouse.csv, 2021-10-07_15-20-38_mouse-0602_mouse.csv, 2021-10-07_15-42-13_mouse-0600_mouse.csv, 2021-10-11_15-41-40_mouse-0602_mouse.csv, 2021-10-11_16-20-35_mouse-0599_mouse.csv, 2021-12-10_14-29-34_mouse-0599_mouse.csv, 2022-07-25_15-05-29_mouse-1099_mouse.csv, 2022-07-25_15-19-36_mouse-1099_mouse.csv, 2022-07-26_17-30-46_mouse-1128_mouse.csv, 2022-07-27_16-40-55_mouse-1128_mouse.csv, 2022-07-27_16-55-57_mouse-1128_mouse.csv, 2022-07-27_17-17-10_mouse-1127_mouse.csv, 2022-07-29_15-04-00_mouse-1127_mouse.csv, 2022-10-17_14-09-16_mouse-1148_mouse.csv, 2022-10-17_14-45-21_mouse-1144_mouse.csv, 2022-10-17_15-13-55_mouse-1141_mouse.csv, 2022-10-27_10-55-37_mouse-1144_mouse.csv, 2022-10-28_14-28-24_mouse-1148_mouse.csv
ROOT_PATH = None

[Processing]
LLH_VALUE = 0.1
ITERATION = 0
MIN_DURATION = 0.1
TRAIN_FRACTION = 0.03
MAX_ITER = 100
MAX_SAMPLES_ITER = 60
CONF_THRESHOLD = 0.5
N_SHUFFLED_SPLIT = None
```
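A quick sanity check on the prepared files can look like the sketch below (this is only a sketch, not part of A-SOiD; it assumes the directory layout described above, a single header row in each Boris CSV with time in the first column, and the '...' placeholder replaced with the real user path):

```python
# Sketch only (not part of A-SOiD): sanity-check the prepared input files.
from pathlib import Path
import pandas as pd

base = Path(r"C:\Users\...\A-SOiD")                # placeholder, replace "..."

# Boris binary tables: column #1 should run 0, 0.1, 0.2, ... to the end
for label in sorted((base / "BorisData").glob("*.csv")):
    t = pd.read_csv(label).iloc[:, 0]
    steps = t.diff().dropna().round(6).unique()
    if t.iloc[0] != 0 or set(steps) != {0.1}:
        print(f"WARNING: time column of {label.name} is not 0:0.1:end")

# Massaged pose files must contain no NaNs (the first 4 string rows are headers)
for pose in sorted((base / "DLCposeData").glob("*.csv")):
    pose_data = pd.read_csv(pose, header=[0, 1, 2, 3], index_col=0)
    if pose_data.isna().any().any():
        print(f"WARNING: {pose.name} still contains NaNs")
```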
JensBlack commented 5 months ago

Thank you for your in-depth documentation of your progress. Given that our workflow wasn't clear to you from the app alone, I will use this opportunity to increase the level of detail of the workflow.

Concerning your issue

For clarity, the "Refine Dataset" step is an optional step to increase your classifier's performance on new unlabeled data by giving you the ability to do "manual active learning". Depending on your classifier's performance it might not be necessary.

The step requires uploading a pose estimation file and the corresponding video. A-SOiD then uses the latest active learning classifier to predict your behavior classes and gives you a number of low-confidence samples to label.
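Conceptually, the low-confidence selection works like the sketch below (an illustration, not A-SOiD's actual code; the function name is hypothetical, and the defaults only loosely mirror the CONF_THRESHOLD and MAX_SAMPLES_ITER values in the config.ini above):

```python
# Illustration of "manual active learning" sample selection, not A-SOiD's code:
# predict class probabilities on the new data and queue the least confident
# frames for manual labeling.
import numpy as np

def low_confidence_samples(classifier, features, n_samples=60, threshold=0.5):
    """Return indices of the n_samples least confident predictions."""
    proba = classifier.predict_proba(features)        # (n_frames, n_classes)
    confidence = proba.max(axis=1)                    # top-class probability
    uncertain = np.flatnonzero(confidence < threshold)
    order = np.argsort(confidence[uncertain])         # least confident first
    return uncertain[order][:n_samples]
```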

The current way to select an uploaded video/pose pair is to use the drop-down menu and select the video name.

Because you unintentionally skipped doing that, the next step, "Create new Dataset," is not working.

Concerning Project setup:

If the label files have the same name or order as the pose files (and you have many), you can use the folder import to automatically sort them in the selection window.
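In other words, the pairing relies on matching sorted order, conceptually like this (a sketch; the folder names are just the ones used earlier in this thread, not anything A-SOiD requires):

```python
# Sketch: when pose and label files sort identically, zipping the two sorted
# lists pairs each pose file with its label file. Folder names are examples.
from pathlib import Path

pose_files = sorted(Path("DLCposeData").glob("*.csv"))
label_files = sorted(Path("BorisData").glob("*.csv"))

for pose, label in zip(pose_files, label_files):
    print(f"{pose.name}  <->  {label.name}")   # verify the pairing by eye
```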

You are right that the GUI component we use to do this can be tricky with long filenames. Unfortunately, it is a required step for the import of data from various sources given the current split pose/label files.

I would not recommend altering the configuration file. This can have some unforeseen side effects.

If you really want to do it that way, please also delete the files generated by the corresponding step, or redo the step in the GUI if possible. Otherwise, you might end up with data that was not properly updated.

I appreciate your insight and documentation; if there is any additional uncertainty, please report it.

kipkeller commented 4 months ago

Hi Jens, Thanks for looking into this. I am still lost. For 'Refine Behaviors' I did choose a (new) DLC-ed video and associated pose data, but now did it again (.mp4 and pose data). Then I clicked 'Start frame extraction' and it extracted frames as before, putting (about 8000) frames into a '...\videos\' directory and a 'refine_params.sav' file into a '...\iteration-0\' directory (as before). Am I supposed to do something with these frames and '.sav' file? If so, what? If I do nothing with them and continue on to 'Create New Dataset', it gives me the error I previously described.

Thanks again, kip



JensBlack commented 4 months ago

We extract the frames to create snippets of the full-length video that represent each bout that needs refinement. This is the first step in the refinement process; you should then be able to start refinement in the same tab. Maybe the GUI is not clear enough? Could you send some screenshots with your next post to pinpoint where exactly you are right now? Thanks!

kipkeller commented 4 months ago

I am appending some screenshots here for each step (1-4; 5 is probably a step too far) along the way. I believe it is the step I label as "step 4" ('Refine Behaviors') that you are suggesting I should do more with, but I am not sure what 'more' is. Thanks.

Screenshots: Asoid1_Feb5, Asoid1b_Feb5, Asoid2_Feb5, Asoid3_Feb5, Asoid3b_Feb5, Asoid4_Feb5, Asoid4b_Feb5, Asoid5_Feb5

JensBlack commented 4 months ago

Okay, I think I know what the issue is. Thank you for being so thorough here.

The "refine behaviors" step needs you to refresh manually by pressing the 'R' key on your keyboard while A-SOiD is the active window (i.e., you haven't clicked anywhere else prior). This forces the app to reload and then give you the next substep (labelling videos). I will try to recreate this on my side and see if I can tweak the app to do this automatically.

Let me know if this solves your issue!

kipkeller commented 4 months ago

Thanks (but little progress): pressing 'r' (but not 'R') does progress to an updated screen: Asoid4b_Feb7

But then pressing 'predict labels and create example videos' leads to the error below: Asoid4c_Feb7

JensBlack commented 4 months ago

I am on it. I appreciate your patience. For now, I'd say you can already increase performance by increasing the number of iterations during automatic active learning and/or the number of samples per iteration.

The manual refinement step is optional, and the predict and discover steps work once you have trained a model in the active learning step.

kipkeller commented 4 months ago

Hi Jens, This was helpful - and I will get to that in a moment. First, I increased the max iterations from 100 to 200 (but it stopped after about 115) and also increased the samples per iteration from 60 to 100 (no idea if these are reasonable values):

Asoid3_Feb13

Notice that the 'groom' barely moved. There may be only a few 'grooms' in the particular videos chosen. These annotations are really only my first shot at it to see if it's worth spending the time in Boris to do this better. Annotation is a learned sport.

I continued on as before, but then skipped (from refine and its error) to the 'predict' step (didn't realize I could do that). This gave me the following: Asoid6_Feb13

So this is helpful because it seems to show a problem with the pose CSV files. In the 'upload data' step, A-SOiD correctly found the 'animals to include' as mouse and cricket (I excluded cricket for now) and the keypoints ('nose', 'Lear', 'Rear', 'headbase', 'spine', 'tailbase' - I excluded 'anteriorC' and 'posteriorC' as those belonging to the cricket). Here, in the 'predict' tab, it is reading the CSV differently (or the CSV is wrong) and lists the keypoints as 'mouse' and 'cricket'. This could also explain the error in the earlier step ('refinement'), but I can only guess there. I will try to upload a truncated version of the pose CSV so you will see the header and first few lines of data: 2022-07-27_16-40-55_mouse-1128_DLC_DEMO.csv
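To illustrate what the header rows should contain, here is a sketch (mine, not A-SOiD code) that reads the attached demo file; in a multi-animal DLC CSV the four header rows should be scorer / individuals / bodyparts / coords, so if a row was shifted during the massaging, individuals could appear where keypoints are expected:

```python
# Sketch (not A-SOiD code): inspect the four header rows of the attached
# multi-animal DLC CSV. Expected row order: scorer / individuals / bodyparts /
# coords; a shifted row would put individuals where bodyparts are expected.
import pandas as pd

df = pd.read_csv("2022-07-27_16-40-55_mouse-1128_DLC_DEMO.csv",
                 header=[0, 1, 2, 3], index_col=0)

print("individuals:", list(df.columns.get_level_values(1).unique()))
print("bodyparts:  ", list(df.columns.get_level_values(2).unique()))
print("coords:     ", list(df.columns.get_level_values(3).unique()))
```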

Maybe I need to change this CSV file? I needed to smooth, interpolate and eliminate NaNs from the DLC output. I did this in MATLAB and maybe this caused a problem - although it looks ok to me (and asoid seemed to work fine with it in the initial steps). Thanks

JensBlack commented 4 months ago

The recent update should fix most of the issues you encountered and significantly increases the level of in-app documentation.

Concerning your latest issue:

Class is not increasing in performance:

One potential reason is that the class does not have enough examples. An indication of this is the equally poor performance of the one-shot classifier trained on all training data at once. However, your active learning parameters suggest that this class contains roughly the same number of samples as the others.
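If you want to double-check the raw counts, something like the sketch below works (this is not A-SOiD code; it assumes the Boris binary-table layout described earlier, i.e. a time column followed by one 0/1 column per behavior, and a placeholder path):

```python
# Sketch: tally annotated samples per behavior across the Boris binary tables.
# Assumes a time column followed by one 0/1 column per behavior.
from pathlib import Path
import pandas as pd

label_dir = Path(r"C:\Users\...\A-SOiD\BorisData")    # placeholder path
counts = None
for f in sorted(label_dir.glob("*.csv")):
    per_file = pd.read_csv(f).iloc[:, 1:].sum()       # sum of 1s per behavior
    counts = per_file if counts is None else counts.add(per_file, fill_value=0)

print(counts.sort_values())   # classes with few samples (e.g. groom) appear first
```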

If you have more labeled examples of groom available, your best option will be to add them to the initial data.

You could also use the Refinement step to add examples of groom more efficiently. Here you would have to identify bouts that are misclassified and correctly assign them to groom. This might take some time though, because your classifier seems to completely miss out on groom so far.

Alternatively, you could remove the class in a new project and include "other" this time, which will result in a classifier that catches grooming and other - unlabeled - behaviors collectively.


I have a remaining question:

> Copy the equivalent DLC csv files to c:/users/.../A-SoiD/DLCposeData. These files must have NO NaNs and so will likely have to be massaged versions of the DLC output. Note that the first four rows (for multianimal models) of the DLC output files are 'string' formatted (whereas thereafter the data are 'doubles'). A-SOiD uses the data in rows 2-4 to get keypoints and animal IDs. Once you have massaged these copied data to remove the NaNs, again give them filenames that will easily alphabetize with the Boris files.

AFAIK DLC outputs low confidence values and does not clean/remove data in its raw output. Maybe I missed a functionality; can you elaborate on this? We are already doing likelihood filtering (DLC) and NaN interpolation (SLEAP), so if this is common, I am happy to extend this to DLC data.
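For reference, the kind of likelihood filtering and interpolation meant here looks conceptually like the sketch below (not the exact A-SOiD pipeline; it assumes the standard single-animal DLC layout with scorer/bodyparts/coords header rows and x, y, likelihood per bodypart):

```python
# Sketch of DLC likelihood filtering + interpolation; not the exact A-SOiD
# pipeline. Coordinates with likelihood below the threshold are set to NaN
# and then linearly interpolated.
import numpy as np
import pandas as pd

def filter_and_interpolate(csv_path, llh_threshold=0.1):
    # header rows: scorer / bodyparts / coords (single-animal DLC layout)
    df = pd.read_csv(csv_path, header=[0, 1, 2], index_col=0)
    scorer = df.columns.get_level_values(0)[0]
    for bodypart in df.columns.get_level_values(1).unique():
        bad = df[(scorer, bodypart, "likelihood")] < llh_threshold
        df.loc[bad, [(scorer, bodypart, "x"), (scorer, bodypart, "y")]] = np.nan
    # fill the gaps created above (and any pre-existing NaNs)
    return df.interpolate(limit_direction="both")
```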