Help Wanted: Creating Phone App

I'm at the point where I have a model that is qualitatively good at detecting poop. I need to run it through quantitative metrics and start a leaderboard. The models will get better with time. At this point, I want to start the next phase of the project, but I've never built a phone app before. I could use help.

I envision the basic functionality of the app as such:

App has access to camera, and can take pictures / video
Runs captured images through prediction pipeline
Displays results to user as bounding boxes or heatmaps

To support distributed and offline environments, it would be nice if the model ran on the phone. However, for power reasons, it would also be nice if the app could connect to a backend service that made the predictions for it. That would likely save power at the cost of bandwidth. I imagine there is a monetization avenue in providing a backend service with a better model / service. I'm not currently planning on setting one up, but there is no reason others can't. Please provide proper credit / attribution, but ultimately resources in this repo and dataset are open and free.
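To make the two compute modes concrete, here is a minimal sketch in Python (used only because it's my glue language; the app itself would likely look different). It assumes a "predictor" is just a function from image bytes to a list of detections. The names `Predictor` and `predict_with_fallback` are made up for illustration and don't exist in the repo.

```python
from typing import Callable, Optional

# Hypothetical type: a predictor takes raw image bytes and returns a list of
# detections. Nothing here comes from the repo; it is only an illustration.
Predictor = Callable[[bytes], list]


def predict_with_fallback(image: bytes, local: Predictor,
                          remote: Optional[Predictor] = None) -> dict:
    """Use the remote backend when one is configured and reachable;
    otherwise run the on-device model."""
    if remote is not None:
        try:
            return {"source": "remote", "detections": remote(image)}
        except (ConnectionError, TimeoutError):
            # No network, or the backend is down: fall through to local.
            pass
    return {"source": "local", "detections": local(image)}
```

The on-device model is always required here and the backend is optional, which matches the goal of still working when no internet is available.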

I'm looking for volunteers (with credit and authorship on a peer-reviewed paper) to help build the app, or provide resources so I can build the app. I want the app to be open source and free to use.

I'm interested in contributing to the academic community. That being said, what's your timeline like?

It would likely be programmed in React-Native, and in my past experience, getting from inception to release can take some time. Also, I'm not 100% familiar with having the model perform inference on the device. That being said, on a server, I could probably have this up in no time.

So questions:

What's the paper about?
What's the timeline?

That's awesome @njho! I'd be thrilled to work together. Have you published before, i.e. have an ORCID?

I consider myself an exceptionally strong Python software engineer, but I know next to nothing about web or phone application programming. I do have full-stack experience, I've written Qt apps and Flask server-side code, but I shine as a backend programmer. I'm trying to learn JS and Vue, but I don't have a strong grasp on them yet.

React-Native looks appealing. I'm just reading about it now, but I like that it is device agnostic. If I understand correctly, it's basically writing a web app.

Looks like it has the ability to get camera information: https://github.com/mrousavy/react-native-vision-camera

I suppose as a proof of concept we can assume some server exists that is willing to return predictions based on an image payload. That is one of the compute-modalities I'm interested in supporting. Are there standards you know of for serving model input / predictions?
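To show what I mean by an image payload, here is a rough sketch of one possible request / response shape. This is not a standard, just an assumption for discussion: the field names (`image_b64`, `detections`, `box_xywh`) and the stub detector are invented, and the box values are made up.

```python
import base64
import json


def encode_request(image_bytes: bytes) -> str:
    """Client side: wrap raw image bytes in a JSON body (base64 keeps it text-safe)."""
    return json.dumps({"image_b64": base64.b64encode(image_bytes).decode("ascii")})


def handle_request(body: str, detector) -> str:
    """Server side: decode the image, run the detector, return detections as JSON."""
    image_bytes = base64.b64decode(json.loads(body)["image_b64"])
    detections = detector(image_bytes)
    return json.dumps({"detections": detections})


def stub_detector(image_bytes: bytes) -> list:
    """Placeholder standing in for the real model; returns one made-up box."""
    return [{"label": "poop", "score": 0.9, "box_xywh": [10, 20, 30, 40]}]
```

A real service would put `handle_request` behind an HTTP endpoint and swap the stub for the trained model. The part I'd like feedback on is the payload shape.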

I don't want to lose sight of in-situ processing. I care a lot about the case where no internet is available from an inclusiveness and forward-looking perspective; internet access is not universal or guaranteed. It looks like PlayTorch might be another way to go about putting a torch model on a phone: https://playtorch.dev/

Do you want to set up a prototype in this repo? Or do you want to make a separate repo for it? A prototype with instructions on how to run it would be amazing. If I can get to the point where I can edit the code and see changes happening, then I'll start picking things up and be able to help more.

I do strongly prefer Python as a glue language, but I don't know how much support for it there is with phone apps. Can you describe how you envision the app architecture?

What's the paper about

My thought is to target an applications focused conference like WACV. The contributions of the paper would be:

The dataset itself. I think the challenges posed by this problem are distinct enough from other existing benchmark datasets (there might be some Augmented Reality literature I have to read up on to ensure this is the case).
A quantitative comparison of different off-the-shelf detection methods using consumer-level hardware (not particularly novel, but validating benchmark results on a new dataset is valuable to further meta-analysis).
The application, if that converges in time, although I think it's not necessary depending on the venue. For WACV it would help a lot to have a prototype.
I would also like to include discussion of distributed scientific datasets with this particular use case using IPFS. I think there are experiments that can be done to compare how fast it is to gain access to the dataset via a distributed versus centralized mechanism, the cost of hosting the data, etc.

The above is not a hard requirement, but I've been thinking this would make a good paper for a while, and the above are a few reasons why. The README is effectively my first draft of the paper. I've also been thinking about how to combine this with the TACO dataset, as I think the domains (stuff on the ground) overlap. There is a lot of unannotated trash in my images.

There are deeper experiments I want to do on continual learning, which this dataset might be useful for, but I'm focused on that for my real job, so I don't expect to work that angle here, although if we did we could probably pass review for a top-tier conference like CVPR.

One thing that could bump this hypothetical paper up a tier is a meta-discussion of scientific reproducibility and accessibility. I've been very interested in web3 lately. Things like content-addressed data and the ability to programmatically prove that you attested to a statement, with programmatic consequences, can be a boon to scientific rigor. Observations about how science can be done in a distributed and public fashion like this, where we are strangers on the internet collaborating to accomplish a shared goal, could be of great academic value. However, a paper like that is a tad ambitious, and a simple WACV paper about training and deploying an open source poop-detecting app for phones is an achievable medium-term goal that would support this longer-term goal.

What's the timeline

WACV 2025 has its paper deadline in August, but I'm in no hurry to publish. I'm doing all this work on my own time, so I can only devote so many hours a week to this. I want to write a quality paper first, and then figure out where to publish it rather than scrambling to meet a deadline. As such the first place it lands will likely be arxiv and nodes.desci. (The latter of which may have something resembling a peer review mechanism soon, which I'm very excited about).

Nope, I don't have an ORCID. Would I need to get one?

React-Native looks appealing. I'm just reading about it now, but I like that it is device agnostic. If I understand correctly, it's basically writing a web app.

Correct! Where it struggles is that many libraries are community supported and can get fragmented, and building for all edge cases can sometimes be difficult, particularly on Android; iOS tends to be more consistent in layout etc. The other option for cross-platform is Flutter, which I haven't used but have heard great things about. I've built about 3 apps on RN that are no longer maintained.

I suppose as a proof of concept we can assume some server exists that is willing to return predictions based on an image payload. That is one of the compute-modalities I'm interested in supporting. Are there standards you know of for serving model input / predictions?

Yeah, so I actually finished training and confirmed that it worked. The model came out to ~60 MB and can run on a Raspberry Pi. Depending on the model size, there are a couple of options. I think this one could be hosted on Cloud Run quite easily without any issue. I've done this before, but there are costs associated depending on your throughput.

For larger applications, say something that takes longer than 5-10 seconds to run inference, or large models, the options are hosting on Replicate, Modal, or BentoML, or rolling your own hardware.

I don't want to lose sight of in-situ processing.

Thinking about it more, and getting it to run on my RPi, is giving me more confidence about this. I think we shouldn't have any issue. Once again, though, I don't know if there will be any issues associated with bundling a .onnx file or something similar into a Google Play/App Store environment, but it seems like there are packages offering ONNX Runtime support, e.g. rn-onnx. Obviously smaller models are probably better. Maybe the user would have to download the model to a tempdir of sorts after installation. I'm not sure...

WACV 2025 has its paper deadline in August ... I can only devote so much time

That's fine. Even if you did want this published in August, it wouldn't be any trouble to have a prototype built out. It should be relatively fast.

Structure

Authentication Required? Easier if not, but then endpoints can't be protected. I don't think that should be an issue if it's public, as I don't see people having much reason to game it or be a bad actor. We can monitor and add this afterwards if it becomes an issue. If in-situ, this doesn't even matter.
App opens to Camera Application
Takes photo and runs annotation?

I think the prototype would be best in a different repo. I tend to prefer rapid firing commits and stuff. I think when it's overall completed then we could import it in here!

Would I need to get one?

Not necessarily. It helps disambiguate researchers, and it would help if you wanted authorship on the desci version of the paper.

On React-Native vs Flutter

Then let's go with what you know to get a proof of concept. I'd like to get an app online and available for download, even if it's not perfect. It will always be possible to upgrade.

On Cloud Run

The free tier should work for a proof of concept, but it won't scale.

Raspberry Pi / onnx / app-stores

If it works on a Pi, I think a lightweight model should run fine on a phone. We definitely don't want to ship the model in the app itself. I think we host the models on IPFS, and then give the app a mapping from "user-friendly names" to CIDs. When we want to release a new model, we can update the IPFS endpoints. This should scale too, because if the data is small and we use a gateway, the gateway will start caching the model if enough people try to grab it, which will make it much faster to access.
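A rough sketch of that name-to-CID mapping, in Python for illustration only. The model names and CIDs below are placeholders, not real published models, and `model_url` is a made-up helper. The only thing I'm relying on is that HTTP gateways serve content under the `/ipfs/<CID>` path.

```python
# Hypothetical registry: names and CIDs are placeholders, not real models.
MODEL_REGISTRY = {
    "detector-small": "<CID-of-small-model>",
    "detector-large": "<CID-of-large-model>",
}

# A public HTTP gateway; any gateway exposing the /ipfs/<CID> path should work.
DEFAULT_GATEWAY = "https://ipfs.io"


def model_url(name: str, registry: dict = MODEL_REGISTRY,
              gateway: str = DEFAULT_GATEWAY) -> str:
    """Resolve a user-friendly model name to a gateway download URL."""
    try:
        cid = registry[name]
    except KeyError:
        raise KeyError(f"unknown model name: {name!r}") from None
    return f"{gateway.rstrip('/')}/ipfs/{cid}"
```

Releasing a new model would then mean publishing it to IPFS and pointing the name at the new CID. The app only ever asks for the name.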

Does iOS/Android have a concept of XDG directories? Or perhaps each app can request permissions for some cache storage?

Publishing in August.

I'll make an effort to write up an initial draft in the next month or so. I think I have a pretty good idea of intro / related work / methods, but I'll need to think about a plan for the experiments. I've got 2x 3090s and 2x 1080 Tis, so maybe we come at it from a "training on consumer hardware" angle.

Authentication Required?

Ideally no. I can't think of anything sensitive. We should start with a permissionless design.

Prototype repo

Sounds good. Start one on your account, and I'll add it as a submodule in this repo. That will give both of us control over the state, and it will make it easy for me to hack on it as well.

I tend to prefer rapid firing commits and stuff.

I'm something of an alias gcwip='git commit -am "wip" && git push', myself. You may be interested in the git-squash-streaks command installed by my git-well package, which "Squashes consecutive commits that meet a specified criterion".

@njho Does this have any updates, or has it stalled?

Hey we've stalled because I never got started!

I've sent an email to the address on your GH account with my personal email and contact info. From there we can coordinate; let's start a repo and get going. I'll be able to discuss what I can commit to in private :)

I'm something of an alias gcwip='git commit -am "wip" && git push', myself. You may be interested in the git-squash-streaks command installed by my [git-well](https://github.com/Erotemic/git_well) package, which "Squashes consecutive commits that meet a specified criterion".

Wow this is awesome ^^
