DunnBC22 / Vision_Audio_and_Multimodal_Projects

This repository includes all computer vision, audio, document AI, and multimodal projects.
audio-classification computer-vision document-ai multimodal-deep-learning object-detection optical-character-recognition transfer-learning transformers

Computer Vision, Audio, & Multimodal Projects

This repository houses projects that work with semi-structured and unstructured data. None of them were completed using Spark, and none are Natural Language Processing (NLP) projects.

Binary Image Classification (Computer Vision)

| Project Name | Accuracy | F1-Score | Precision | Recall |
| :----------: | :----------: | :----------: | :----------: | :----------: |
| [Bart vs Homer](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Bart%20vs%20Homer/Bart_vs_Homer_Image_clf_ViT.ipynb) | 0.9863 | 0.9841 | 0.9688 | 1.0 |
| [Brain Tumor MRI Images](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Brain%20Tumor%20MRI%20Images/brain_tumor_MRI_Images_ViT.ipynb) | 0.9216 | 0.9375 | 0.8824 | 1.0 |
| [COVID19 Lung CT Scans](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/COVID19%20Lung%20CT%20Scans/COVID19_Lung_CT_Scans_ViT.ipynb) | 0.94 | 0.9379 | 0.9855 | 0.8947 |
| [Car or Motorcycle](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Car%20or%20Motorcycle/Car_or_Motorcycle_ViT.ipynb) | 0.9938 | 0.9939 | 0.9951 | 0.9927 |
| [Dogs or Cats Image Classification](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Dogs%20or%20Cats%20Image%20Classification/Dog_v_Cat_ViT.ipynb) | 0.99 | 0.9897 | 0.9885 | 0.9909 |
| [Male or Female Eyes](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Male%20or%20Female%20Eyes/are_they_male_or_female_eyes_ViT.ipynb) | 0.9727 | 0.9741 | 0.9818 | 0.9666 |
| [Breast Histopathology Image Classification](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Breast%20Histopathology%20Images/Breast_Histopathology_Images_Using_ViT.ipynb) | 0.8202 | 0.8151 | 0.8141 | 0.8202 |

Multiclass & Multilabel Image Classification

Multiclass Image Classification

| Project Name | Accuracy | Macro F1-Score | Macro Precision | Macro Recall | Best Algorithm |
| :----------: | :----------: | :----------: | :----------: | :----------: | :----------: |
| [Brain Tumors Image Classification[^1]](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Brain%20Tumors%20Image%20Classification%20Comparison) | 0.8198 | 0.8054 | 0.8769 | 0.8149 | Vision Transformer (ViT) |
| [Diagnoses from Colonoscopy Images](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Diagnoses%20from%20Colonoscopy%20Images/diagnosis_from_colonoscopy_image_ViT.ipynb) | 0.9375 | 0.9365 | 0.9455 | 0.9375 | - |
| [Human Activity Recognition](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Human%20Activity%20Recognition/ViT-Human%20Action_Recogniton.ipynb) | 0.8381 | 0.8394 | 0.8424 | 0.839 | - |
| [Intel Image Classification](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Intel%20Image%20Classification/Intel_ViT.ipynb) | 0.9487 | 0.9497 | 0.9496 | 0.95 | - |
| [Landscape Recognition](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Landscape%20Recognition/Landscape_Recognition_ViT.ipynb) | 0.8687 | 0.8694 | 0.8714 | 0.8687 | - |
| [Lung & Colon Cancer](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Lung%20%26%20Colon%20Cancer/Lung_and_colon_cancer_ViT.ipynb) | 0.9994 | 0.9994 | 0.9994 | 0.9994 | - |
| [Mango Leaf Disease Dataset](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Mango%20Leaf%20Disease%20Dataset/Mango_Leaf_Disease_ViT.ipynb) | 1.0 | 1.0 | 1.0 | 1.0 | - |
| [Simpsons Family Images](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Simpsons%20Family%20Images/Simpsons_family_with_hf_ViT.ipynb) | 0.953 | 0.9521 | 0.9601 | 0.9531 | - |
| [Vegetable Image Classification](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Vegetable%20Image%20Classification/Vegetables_ViT.ipynb) | 1.0 | 1.0 | 1.0 | 1.0 | - |
| [Weather Images](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Weather%20Images/Weather_Images_ViT.ipynb) | 0.934 | 0.9372 | 0.9398 | 0.9354 | - |
| [Hyper Kvasir Labeled Image Classification](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multiclass%20Classification/Hyper%20Kvasir%20Labeled%20Images/Hyper_Kvasir_Labeled_Images_Using_ViT.ipynb) | 0.8756 | 0.5778 | 0.5823 | 0.5746 | - |
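
The macro-averaged columns above weight every class equally, so they can sit well below accuracy when some classes are rare. The sketch below shows one common way to compute these metrics in plain Python. It is an illustration only: the function name `macro_scores` and the toy labels are assumptions, and the notebooks may well use a library such as scikit-learn or Hugging Face `evaluate` instead.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro_precision, macro_recall, macro_f1).

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging: unweighted mean of the per-class scores.
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy, made-up labels: the rare class "b" is never predicted.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
acc, macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(acc, macro_p, macro_r, macro_f1)
```

In the toy example, accuracy stays high because the frequent class dominates, while the macro scores are pulled down by the class that is never predicted.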

Multilabel Image Classification

| Project Name | Subset Accuracy | F1 Score | ROC AUC |
| :----------: | :----------: | :----------: | :----------: |
| [Futurama - ML Image CLF](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multilabel%20Classification/Futurama%20Screenshots/Futurama%20-%20ML%20Image%20CLF.ipynb) | 0.9672 | 0.9818 | 0.9842 |

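
Subset accuracy, as the term is commonly used for multilabel problems, counts a sample as correct only when every label in its predicted set matches the ground truth. A minimal sketch under that assumption follows; the function name `subset_accuracy` and the toy label vectors are hypothetical and not taken from the notebook.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly.

    y_true and y_pred are sequences of equal-length 0/1 label vectors.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy, made-up example with three samples and three labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # second sample has one extra label
print(subset_accuracy(y_true, y_pred))
```

Because a single wrong label fails the whole sample, subset accuracy is usually the strictest of the three metrics in the table.
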
Object Detection (Computer Vision)

| Project Name | Avg. Precision[^3] | Avg. Recall[^4] |
| :----------: | :----------: | :----------: |
| [License Plate Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/License%20Plate%20Object%20Detection/License%20Plate%20Object%20Detection.ipynb) | 0.513 | 0.617 |
| [Pedestrian Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Pedestrian%20Object%20Detection/Pedestrian%20Detection-Object%20Detection%20-%205%20epochs.ipynb) | 0.560 | 0.745 |
| [ACL X-Rays](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Computer%20Vision/Object%20Detection/ACL%20X-Rays) | 0.09 | 0.308 |
| [Abdomen MRIs](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Abdomen%20MRIs%20Object%20Detection/Abdomen_MRI_Object_Detection_YOLOS.ipynb) | 0.453 | 0.715 |
| [Axial MRIs](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Axial%20MRIs/Axial_MRIs_Object_Detection_YOLOS.ipynb) | 0.284 | 0.566 |
| [Blood Cell Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Blood%20Cell%20Object%20Detection/Blood_Cell_Object_Detection_YOLOS.ipynb) | 0.344 | 0.448 |
| [Brain Tumors](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Brain%20Tumors/Brain_Tumor_m2pbp_Object_Detection_YOLOS.ipynb) | 0.185 | 0.407 |
| [Cell Tower Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Cell%20Tower%20Object%20Detection/Cell%20Tower%20Detection%20YOLOS.ipynb) | 0.287 | 0.492 |
| [Stomata Cells](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Stomata%20Cells/Stomata_Cells_Object_Detection_YOLOS.ipynb) | 0.340 | 0.547 |
| [Excavator Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Excavator%20Object%20Detection/Version%201%20(Better%20Results)/Excavator%20Detector%20-%20Object%20Detection.ipynb) | 0.386 | 0.748 |
| [Forklift Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Computer%20Vision/Object%20Detection/Forklift%20Object%20Detection) | 0.136 | 0.340 |
| [Hard Hat Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Hard%20Hat%20Detection/Hard_Hat_Object_Detection_YOLOS.ipynb) | 0.346 | 0.558 |
| [Liver Disease Object Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Object%20Detection/Liver%20Disease%20Object%20Detection/Liver_Disease_Detection_YOLOS.ipynb) | 0.254 | 0.552 |

* Other Object Detection projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but due to constraints, training them properly would take an unreasonably long time. As a result, their metrics are not the greatest.

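
The AP and AR columns above are reported at `IoU=0.50:0.95`, the COCO-style convention of averaging over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows how the IoU between two boxes is computed and how that threshold grid looks. It is an illustration only: the `(x_min, y_min, x_max, y_max)` box format, the function name `box_iou`, and the example boxes are assumptions, and a full AP/AR computation (score-ranked matching, precision-recall integration) is deliberately left out.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds behind "IoU=0.50:0.95".
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Made-up boxes: a prediction shifted slightly from the ground truth.
ground_truth = (10, 10, 110, 110)
prediction = (20, 20, 120, 120)
iou = box_iou(ground_truth, prediction)

# A detection counts as a match only at thresholds it meets or exceeds.
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
print(iou, matched_at)
```

Averaging over the stricter thresholds is one reason these AP values look low compared with the classification accuracies elsewhere in this README: a box that is only roughly aligned matches at 0.50 but not at 0.90.
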
Image Segmentation (Computer Vision)

| Project Name | Mean IoU | Mean Accuracy | Overall Accuracy | Use PEFT? |
| :----------: | :----------: | :----------: | :----------: | :----------: |
| [Carvana Image Modeling](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Segmentation/Carvana%20Image%20Masking/Carvana%20Image%20Masking%20-%20Image%20Segmentation%20with%20LoRA.ipynb) | 0.9917 | 0.9962 | 0.9972 | Yes |
| [Dominoes](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Segmentation/Dominoes/Fine-Tuning%20-%20Dominoes%20-%20Image%20Segmentation%20with%20LoRA.ipynb) | 0.9198 | 0.9515 | 0.9778 | Yes |
| [CMP Facade (V2)](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Segmentation/CMP%20Facade/Version%201%20(Better%20Results)/SegFormer%20-%20CMP%20Facade%20-%20Image%20Segmentation%20with%20LoRA%20V2.ipynb) | 0.3102 | 0.4144 | 0.6267 | Yes |

* Other Image Segmentation projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but due to constraints, training them properly would take an unreasonably long time. As a result, their metrics are not the greatest.

Document AI Projects

Multiclass Classification

| Project Name | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
| :---: | :---: | :---: | :---: | :---: |
| [Document Classification - Desafio_1](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Document%20Classification%20-%20Desafio%201/Document%20Classification%20-%20Desafio%201.ipynb) | 0.9865 | 0.9863 | 0.9870 | 0.9861 |
| [Document Classification RVL-CDIP](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Document%20Classification%20-%20RVL-CDIP/Document%20Classification%20-%20RVL-CDIP.ipynb) | 0.9767 | 0.9154 | 0.9314 | 0.9019 |
| [Real World Documents Collections](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Real%20World%20Documents%20Collections/Real%20World%20Documents%20Collections.ipynb) | 0.767 | 0.7704 | 0.7767 | 0.7707 |
| [Real World Documents Collections_v2](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Real%20World%20Documents%20Collections/Real%20World%20Documents%20Collections_v2.ipynb) | 0.826 | 0.8242 | 0.8293 | 0.8237 |
| [Tobacco-Related Documents](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Tobacco-Related%20Documents/Tobacco%20Dataset%20%26%20DiT%20Transformer%20Project.ipynb) | 0.7532 | 0.722 | - | - |
| [Tobacco-Related Documents_v2](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Tobacco-Related%20Documents/Tobacco%20Dataset%20%26%20DiT%20Transformer%20Project_v2.ipynb) | 0.8666 | 0.8308 | - | - |
| [Tobacco-Related Documents_v3](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Document%20AI/Multiclass%20Classification/Tobacco-Related%20Documents/Tobacco%20Dataset%20%26%20DiT%20Transformer%20Project_v3.ipynb) | 0.9419 | 0.9278 | - | - |

Audio Projects

| Project Name | Project Type |
| :---: | :---: |
| [Vinyl Scratched or Not](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Audio-Projects/Classification/Vinyl%20Scratched%20or%20Not.ipynb) | Binary Audio Classification |
| [Audio-Drum Kit Sounds](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Audio-Projects/Classification/Audio-Drum_Kit_Sounds.ipynb) | Multiclass Audio Classification |
| [Speech Emotion Detection](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Audio-Projects/Emotion%20Detection/Speech%20Emotion%20Detection/Speech%20Emotion%20Recognition-wav2vec2-base.ipynb) | Emotion Detection |
| [Toronto Emotional Speech Set (TESS)](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Audio-Projects/Emotion%20Detection/Toronto%20Emotional%20Speech%20Set%20(TESS)/Toronto%20Emotional%20Speech%20Set%20(TESS).ipynb) | Emotion Detection |
| [ASR Speech Recognition Dataset](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Audio-Projects/Automatic%20Speech%20Recognition/Speech%20Recognition%20Dataset/ASR_Speech_Recognition_Dataset.ipynb) | Automatic Speech Recognition |

Optical Character Recognition Projects

| Project Name | CER[^2] |
| :---: | :---: |
| [20,000 Synthetic Samples Dataset](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset) | 0.0029 |
| [Captcha](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Optical%20Character%20Recognition%20(OCR)/Captcha/OCR_captcha.ipynb) | 0.0075 |
| [Handwriting Recognition (v1)](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Optical%20Character%20Recognition%20(OCR)/Handwriting%20Recognition/OCR_handwriting-recognition.ipynb) | 0.0533 |
| [Handwriting Recognition (v2)](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Optical%20Character%20Recognition%20(OCR)/Handwriting%20Recognition/Handwriting%20Recognition_v2/Mini%20Handwriting%20OCR%20Project.ipynb) | 0.0360 |
| [OCR License Plate Text Recognition](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Optical%20Character%20Recognition%20(OCR)/OCR%20License%20Plates/OCR_license_plate_text_recognition.ipynb) | 0.0368 |
| [Tesseract E13B](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Optical%20Character%20Recognition%20(OCR)/Tesseract%20MICR%20(E15B%20Dataset)/TrOCR-e13b%20-%20tesseractMICR.ipynb) | 0.0036 |
| [Tesseract CMC7](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Optical%20Character%20Recognition%20(OCR)/Tesseract%20MICR%20(CMC7%20Dataset)/TrOCR_cmc7_tesseractMICR.ipynb) | 0.0050 |
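
Character Error Rate is conventionally the character-level edit distance between the predicted and reference strings, divided by the length of the reference, so lower is better. The following is a minimal pure-Python sketch of that definition. The function names and sample strings are hypothetical, and the notebooks may compute CER with a metrics library rather than code like this.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be a non-empty string")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up example: one wrong character in a six-character plate number.
print(character_error_rate("ABC123", "ABC128"))
```

Under this definition, a CER of 0.0036 corresponds to roughly one character error per 280 reference characters.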


Footnotes:

[^1]: This project is part of a transformer comparison.

[^2]: CER stands for Character Error Rate.

[^3]: Average Precision (AP) @[IoU=0.50:0.95 | area=all | maxDets=100]

[^4]: Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=100]