feature: Community knowledge base

marwenbk commented 9 months ago

Suggestion: why should each client process every asset? We could gather a shared knowledge base for quick verification: when a client sees an image that isn't in it yet, that client's device does the classification and sends back the result. It could also be a DNS server, or a distributed ledger (blockchain) acting as the source of truth to avoid centralization. With this strategy, we could spread the load across clients and be able to classify content with high-accuracy models. What do you think?
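A minimal sketch of the lookup-then-classify flow suggested above, in TypeScript. Everything here is hypothetical and not part of HaramBlur: `communityCache` stands in for whatever shared store (server, DNS-style service, or ledger) would hold the results, `runLocalModel` stands in for the on-device detector, and 32-bit FNV-1a is used only as a simple content hash (a real shared cache would want a collision-resistant hash such as SHA-256).

```typescript
// Hypothetical verdict a client could publish for an image.
type Verdict = "blur" | "show";

// 32-bit FNV-1a over the raw image bytes. Stand-in content hash only;
// it is not collision-resistant and would not be safe for a shared cache.
function fnv1a(bytes: Uint8Array): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical shared store: content hash -> verdict from whichever client saw it first.
const communityCache = new Map<string, Verdict>();

// Returns the cached verdict if some client already classified these exact bytes;
// otherwise runs the (expensive) local model and publishes its result.
function classify(bytes: Uint8Array, runLocalModel: (b: Uint8Array) => Verdict): Verdict {
  const key = fnv1a(bytes);
  const cached = communityCache.get(key);
  if (cached !== undefined) return cached;
  const verdict = runLocalModel(bytes);
  communityCache.set(key, verdict);
  return verdict;
}

// Usage: a fake model that counts how often it is invoked.
let modelCalls = 0;
const fakeModel = (_b: Uint8Array): Verdict => {
  modelCalls++;
  return "blur";
};
const img = new Uint8Array([1, 2, 3]);
classify(img, fakeModel); // miss: runs the model
classify(img, fakeModel); // hit: served from the cache
console.log(modelCalls); // 1
```

Because the key is a hash of the exact bytes, a resized or re-encoded copy of the same picture produces a different key and misses the cache, so every variant still has to be classified by some client.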

man2machine commented 9 months ago

Salam!

I'm not the creator of this extension, but I am one of its users who is following its development from a detection/AI standpoint. So my understanding does not represent the views of the developer, @alganzory.

The sheer number of images on the internet is on the order of trillions. There is no way that such a large amount of image content can be stored to cache the detection results. New images appear every day, people use devices with different screen sizes, videos contain thousands of frames, and so on; many factors contribute to the huge number of images. Caching results per image is also not generally how machine-learning-based image recognition works.

Furthermore, running a server that stores the detection results from each client so that they can be reused later by other clients (or the same client) would require additional resources that would need to be paid for, beyond the extension that runs in each client's browser.

alganzory commented 9 months ago

@man2machine said it better than I would have. That being said, this made me think of enabling some sort of reporting for individual images that weren't detected correctly. This bank of failed detections could then be used to train or retrain the models to be more accurate. For example, images of sheikhs or other people wearing head coverings usually get misclassified as women. If at any stage we train or retrain our own models, I would definitely want these to go into the training data, as I doubt that any of the well-known models out there use such images for gender classification or NSFW detection training.
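To make the reporting idea concrete, here is a hypothetical TypeScript sketch of what a report could contain and how a bank of reports could surface systematic errors, such as head coverings being classified as women. The `DetectionReport` shape, the label names, and `confusionCounts` are illustrative assumptions, not part of HaramBlur.

```typescript
// Hypothetical labels; the project's real model outputs may differ.
type Label = "man" | "woman" | "nsfw" | "safe";

// One user report about a single image (hypothetical shape).
interface DetectionReport {
  imageUrl: string;     // where the reported image was seen
  predicted: Label;     // what the model said
  corrected: Label;     // what the reporting user says it should be
  modelVersion: string; // which model made the prediction
}

// Tallies "predicted->corrected" pairs so recurring confusions stand out from one-off reports.
function confusionCounts(reports: DetectionReport[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of reports) {
    if (r.predicted === r.corrected) continue; // model agreed with the user; nothing to learn
    const key = `${r.predicted}->${r.corrected}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Usage with made-up reports.
const reports: DetectionReport[] = [
  { imageUrl: "https://example.com/a.jpg", predicted: "woman", corrected: "man", modelVersion: "v1" },
  { imageUrl: "https://example.com/b.jpg", predicted: "woman", corrected: "man", modelVersion: "v1" },
  { imageUrl: "https://example.com/c.jpg", predicted: "safe", corrected: "nsfw", modelVersion: "v1" },
];
console.log(confusionCounts(reports).get("woman->man")); // 2
```

Grouping by the kind of mistake, rather than just storing images, would show which errors are frequent enough to prioritize when assembling retraining data.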

man2machine commented 9 months ago

Yes, as @alganzory said, crowdsourcing detection results is definitely possible from a technical standpoint. However, it may be hard to get this to work, since different people have different standards and there is potential for many labeling errors. But if such issues were overcome, then yes, it is definitely possible.
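One common way to cope with differing standards and labeling errors in crowdsourced labels is to accept a label only when enough people voted and most of them agree. The sketch below illustrates that idea in TypeScript; the `consensus` function, the `"blur" | "show"` labels, and the thresholds (5 votes, 80% agreement) are hypothetical choices, not anything the project has adopted.

```typescript
// Hypothetical per-image verdict that users could submit.
type Label = "blur" | "show";

// Accepts a crowd label only when enough users voted and a clear supermajority agrees;
// otherwise returns undefined so the client can fall back to its own model.
// minVotes and minAgreement are illustrative defaults, not tuned values.
function consensus(votes: Label[], minVotes = 5, minAgreement = 0.8): Label | undefined {
  if (votes.length < minVotes) return undefined; // too few opinions to trust

  // Count votes per label, tracking the most common one as we go.
  const counts = new Map<Label, number>();
  let best: Label | undefined;
  let bestCount = 0;
  for (const v of votes) {
    const n = (counts.get(v) ?? 0) + 1;
    counts.set(v, n);
    if (n > bestCount) {
      best = v;
      bestCount = n;
    }
  }
  return bestCount / votes.length >= minAgreement ? best : undefined;
}

// Usage
console.log(consensus(["blur", "blur", "blur", "blur", "show"])); // blur (4/5 agree)
console.log(consensus(["blur", "show", "blur", "show", "blur"])); // undefined (only 3/5 agree)
console.log(consensus(["blur", "blur"])); // undefined (too few votes)
```

Raising `minVotes` or `minAgreement` trades coverage for reliability: fewer images get a crowd label, but the ones that do are less likely to reflect one person's standard or mistake.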

Right now I think the priority is to find and potentially create better detection models, and improve the user interface/experience.

(Sent as an email reply to Mohamed Alganzory's comment of Wed, Nov 8, 2023, 4:53 AM.)

marwenbk commented 9 months ago

Well, I just wanted to share my thoughts. You guys know better, of course.

alganzory commented 9 months ago

@marwenbk thanks for your suggestion, please keep sharing your thoughts and support <3

alganzory / HaramBlur

feature: Community knowledge base #35