bahelms / chess_vision

Chessboard image recognition
MIT License

Using Axon for this problem #6

Open meanderingstream opened 10 months ago

meanderingstream commented 10 months ago

Barrett,

I attended your talk at ElixirConf. It was one of the talks that I kept thinking about afterwards. I think this is a good, achievable "business" problem that could be used to demonstrate doing ML in Elixir well. Board recognition is a web-capable problem: image-based and not real-time focused. I believe that a general chess board recognition problem can be solved well with a more Axon-focused approach, and I have some experience that can help.

I have a blog, AlongTheAxon.com, where I write about experiments with Axon capabilities. The blog is my attempt to "teach" ElixirML aspects to the community. I'm currently focused on the ElixirFashionML Challenge, and some of what I'm working toward with the challenge would be very useful for this problem. With the challenge, I'm bringing key aspects taught in the Fast.ai course to Elixir. Pairing with you would probably bring the community two modeling capabilities that are new to Elixir: single key point recognition and multi-class classification.

As for my experience, I had a failed side-hustle where we processed smart camera feeds in Elixir and ran them through a PyTorch image recognition process. The business goal was to recognize bed exits in a hospital setting. I only bring that up to indicate that I have some practical experience building Elixir ML products.

My hope is that by pairing with you, we could provide some useful lessons learned about doing an ML project for "real". Even the product trade-offs can help the Elixir community think about their dream projects and help our community focus on what is achievable. I recognize that you may not be interested, but I wanted to offer. I think it would be a fun learning experience for both of us and the community.

If interested, my email is alongtheaxon@gmail.com

meanderingstream commented 10 months ago

The basic idea is:

Train a multi-class classifier to verify whether the uploaded image is a chess board or not.

Train four key point models that recognize the centers of the lower-left, lower-right, upper-right, and upper-left squares. It is kind of a waste to only do one key point at a time, but it is a relatively easy translation of the key point model head from Fast.ai. The four corner centers give the square size in pixels: the centers of adjacent corner squares are seven squares apart, so dividing the x and y pixel differences by 7 gives the square width and height.
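To make the divide-by-7 step concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from the project: the `Point` type and the `square_size` function are hypothetical names, the board is assumed to be roughly axis-aligned in the image, and averaging the two opposite edges is my own assumption to smooth out small key point errors.

```python
from typing import NamedTuple, Tuple


class Point(NamedTuple):
    """Pixel coordinates of a predicted key point (hypothetical helper type)."""
    x: float
    y: float


def square_size(lower_left: Point, lower_right: Point,
                upper_left: Point, upper_right: Point) -> Tuple[float, float]:
    """Return (width, height) of one board square in pixels.

    The inputs are the centers of the four corner squares. Centers of
    adjacent corner squares are 7 squares apart, so each span is divided by 7.
    Assumes an axis-aligned board; image y grows downward.
    """
    # Average the bottom and top edges to get the horizontal span.
    x_span = ((lower_right.x - lower_left.x) + (upper_right.x - upper_left.x)) / 2
    # Average the left and right edges to get the vertical span.
    y_span = ((lower_left.y - upper_left.y) + (lower_right.y - upper_right.y)) / 2
    return x_span / 7, y_span / 7
```

For an 800x800 pixel board image with square centers at 50 and 750 pixels, this gives squares of 100x100 pixels. A rotated or perspective-distorted board would need a homography instead of this simple division.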

Gather a balanced set of images from several books. For some subset of the images, use an open-source image annotation tool to mark by hand the centers of the four corner squares and to record where the pieces are on each board.

Use the square pixel size to split the image into its individual squares. You may choose to overlap the square images somewhat; it shouldn't hurt the model. Now you have a set of images, each showing one square with or without a piece. The dataset should be balanced across all of the ways books present a chess board.
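The splitting step could be sketched roughly as follows, in plain Python. Everything here is an assumption for illustration: the `square_boxes` name, the `overlap` parameter (a fraction by which each crop is enlarged), and the choice to index squares by row and column starting from the upper-left square.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def square_boxes(upper_left_center: Tuple[float, float],
                 square_w: float, square_h: float,
                 overlap: float = 0.0) -> Dict[Tuple[int, int], Box]:
    """Return a crop box for each of the 64 squares, keyed by (row, col).

    Row 0, col 0 is the upper-left square. `overlap` enlarges every crop
    by that fraction so neighbouring crops share some pixels. Boxes on the
    board edge may extend past the image and should be clamped by the caller.
    """
    half_w = square_w * (1 + overlap) / 2
    half_h = square_h * (1 + overlap) / 2
    boxes = {}
    for row in range(8):
        for col in range(8):
            # Center of this square, stepping one square size per row/col.
            cx = upper_left_center[0] + col * square_w
            cy = upper_left_center[1] + row * square_h
            boxes[(row, col)] = (round(cx - half_w), round(cy - half_h),
                                 round(cx + half_w), round(cy + half_h))
    return boxes
```

Each box can then be passed to whatever image library does the cropping. Keeping the geometry separate from the cropping makes it easy to check the boxes by drawing them over a sample board before generating the dataset.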

Feed the split-up images into a training loop with a pre-trained ResNext18 or ResNext34 and fine-tune the model, à la Sean's book. Use the techniques that will be identified by the ElixirFashionML Challenge to prevent overfitting and to apply stronger training approaches than those in Sean's book. Starting from a pretrained model and fine-tuning it is more computationally efficient than training from scratch.

Above, I discussed hand-annotating a subset of the board images. The next technique is to use the model to make predictions on the unlabeled images and QC review the accuracy, for example by overlaying the prediction on the image as dots or as letters for the pieces. Use or build a tool that shows the images to a user for quick review, marking each as accurate or as needing hand annotation. Fix the annotations for the images that failed QC, add the images to the training set, and retrain. This technique is called Active Learning. Iterate until all images are annotated or QC is no longer rejecting any images.
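The predict, review, fix, retrain loop might look something like this sketch in plain Python. It is an outline under assumptions, not a real pipeline: `train`, `predict`, and `review` are hypothetical callables that you would supply (the model training step, the model inference step, and the human QC tool), and processing the unlabeled images in fixed-size batches is my own choice.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def assisted_labeling(labeled: Dict[Hashable, object],
                      unlabeled: List[Hashable],
                      train: Callable, predict: Callable, review: Callable,
                      batch_size: int = 100) -> Tuple[dict, List[int]]:
    """Grow the labeled set by reviewing model predictions batch by batch.

    train(labeled)         -> model fitted on the current labeled set
    predict(model, image)  -> predicted annotation for one image
    review(image, pred)    -> the correct annotation; returning `pred`
                              unchanged means the reviewer accepted it
    Returns the final labeled set and the number of rejections per round.
    """
    labeled = dict(labeled)   # do not mutate the caller's data
    pending = list(unlabeled)
    rejections = []
    while pending:
        model = train(labeled)                     # retrain on everything so far
        batch, pending = pending[:batch_size], pending[batch_size:]
        rejected = 0
        for image in batch:
            predicted = predict(model, image)
            final = review(image, predicted)       # human QC step
            if final != predicted:
                rejected += 1                      # reviewer had to fix it
            labeled[image] = final
        rejections.append(rejected)
    return labeled, rejections
```

Watching the per-round rejection counts fall is one way to judge when the model is good enough that reviewers are mostly confirming predictions instead of fixing them.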

In "production", running the models on the CPU with Nx.Serving would let the deployment run economically on a site like fly.io.