Maximax67 / LoRA-Dataset-Automaker

An advanced Jupyter Notebook for creating precise datasets tailored to Stable Diffusion LoRA training. Automate face detection, similarity analysis, and curation, with streamlined exporting, utilizing cutting-edge models and functions.
MIT License
23 stars · 1 fork

Find duplicates error #12

Open ryuji99 opened 5 days ago

ryuji99 commented 5 days ago

image

nrocka commented 1 day ago

@Maximax67 You might wanna have a look at this. The error occurs when the notebook tries to decode corrupted images: instead of skipping them, it throws this error. I made a cell that detects and deletes corrupt or invalid images. I couldn't check the code for issues or problems, as I'm not a dev at all and wrote it mostly with AI.

I pasted the cell over at pastebin, as I am unable to post the code in this comment with correct formatting (skill issue): https://pastebin.com/Edmx16vL

I'd be glad if you could implement this (or a similar fix) in your public notebook.
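
For reference, a check of this kind could look roughly like the sketch below. It is not the code from the pastebin link and not the notebook's actual fix. The function names, the extension list, and the choice to report files rather than delete them are all assumptions made for illustration. The default check relies on Pillow (`Image.verify()` followed by a full `load()`), which is assumed to be installed.

```python
from pathlib import Path

# Assumed set of extensions to inspect; adjust to the dataset at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def is_readable_image(path):
    """Return True if Pillow can fully decode the file (hypothetical helper)."""
    from PIL import Image  # imported here so a missing Pillow fails loudly

    try:
        # verify() checks file integrity without decoding pixel data.
        with Image.open(path) as img:
            img.verify()
        # After verify() the image must be reopened; load() forces a full
        # decode, which also catches truncated files.
        with Image.open(path) as img:
            img.load()
        return True
    except Exception:
        # Broad on purpose: broken files can raise several exception types.
        return False


def find_corrupt_images(folder, check=is_readable_image):
    """Return a sorted list of image paths under `folder` that fail `check`."""
    corrupt = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        if not check(path):
            corrupt.append(path)
    return corrupt
```

Calling `find_corrupt_images("dataset")` (with a folder name of your own) returns the suspect paths, so they can be reviewed first and then removed with `path.unlink()` if desired. Keeping detection separate from deletion avoids losing files to a false positive.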


Maximax67 commented 1 day ago

Thank you. I am fixing this issue now. There is another bug, related to Python dependencies: the compute_embeddings method throws errors in the console for all images. It looks like I need to roll back one of the dependencies to an older version. I'm trying to fix that now.

The problem is in this line too: embeddings = dataset.compute_embeddings(model, batch_size=batch_size)