I had a quick read of the possibilities from ChatGPT. I don't know much about WebAssembly or how to get OpenCV to compile there (seems doable: https://chatgpt.com/share/536a6695-4698-482a-8a39-3d82a804b6a6). I will be able to spend time on this next week. I look forward to being on the project again (and not in classes).
You don't need to know about WebAssembly; OpenCV is already available for download as a WebAssembly build.
Let's not conflate proposals, please. Decoding the frames in the client and tracking on the client are two very, very different things. This Issue is about decoding frames on the client, yes?
They are related:
My plan at the moment:
You may think that there would be a huge amount of wasted work if we do this, but in fact there isn't. A lot of the code will stay the same, and we will end up with a much better app than we started with.
To refresh the motivation for this discussion, so we all have a common recollection of the context necessary to make decisions (please correct me if I've got any of this wrong):

1) The reason there is a custom video viewer for this app is that the video viewers built into browsers, and those reasonably available as open-source JavaScript, don't give frame-by-frame movie navigation controls, and PlantTracer needs that precise control in easy-on-the-user ways.

2) The primary reason we're considering client-side frame decoding is because frames are arriving out of order from the server side? (If that's the motivation, then why not just encode frame numbers in the passed server->client data and buffer on the client side for a certain amount of time, with a timeout to skip missing frames and discard late-arriving ones? See the sketch after this list.)

3) Or is the reason to do client-side frame decoding to offload that task from the server, and maybe to optimize network traffic by sending fewer, larger sets of frames (that is, a whole movie at a time) rather than lots of smaller requests (one per frame)?
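To make point 2 concrete, here is a minimal sketch of that buffering idea, assuming each server message carries a frame number; the class and field names are illustrative, not the actual PlantTracer protocol:

```javascript
// Hypothetical reorder buffer: deliver frames in order, skip gaps after a
// timeout, and discard late arrivals. Names are illustrative only.
class FrameReorderBuffer {
    constructor(onFrame, timeoutMs = 250) {
        this.next = 0;              // next frame number to deliver
        this.pending = new Map();   // frameNumber -> frame data
        this.onFrame = onFrame;
        this.timeoutMs = timeoutMs;
        this.timer = null;
    }
    receive(frameNumber, data) {
        if (frameNumber < this.next) return;   // late arrival: discard
        this.pending.set(frameNumber, data);
        this.flush();
    }
    flush() {
        while (this.pending.has(this.next)) {  // deliver any consecutive run
            this.onFrame(this.next, this.pending.get(this.next));
            this.pending.delete(this.next++);
        }
        clearTimeout(this.timer);
        if (this.pending.size > 0) {           // a gap: skip it after the timeout
            this.timer = setTimeout(() => { this.next++; this.flush(); }, this.timeoutMs);
        }
    }
}
```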
Let's sort through the motivations here so we can make intelligent and future-looking choices.
Even before doing that in writing, some thoughts on the presented alternatives from the ChatGPT thread:
Client-side frame decoding alternatives:
We crossed comments in time there, a bit.
I agree that prototyping the approaches isn't a ton of work.
Of course pretty much everything in the universe is related in some way! From your response, I infer that you are already leaning towards tracking on the client? If that's true, kindly remind us why that is? I don't disagree; I just can't remember. (Though of course, (pedantically? proper project management-wise?), decoding and tracking typically would be separate GitHub issues!)
The custom video viewer provides:
We can add annotation with any video viewer that plays to an off-screen canvas and lets me have a callback after each frame draw. I then tell the off-screen canvas to draw to the on-screen canvas (this is double-buffering, a bitblt operation).
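For concreteness, a minimal sketch of that double-buffering, assuming an on-screen `<canvas id="canvas">`; the IDs and the per-frame callback are illustrative:

```javascript
// Draw each frame plus annotations to an off-screen canvas, then blit the
// finished image to the visible canvas in a single operation.
const onscreen = document.getElementById('canvas');
const onscreenCtx = onscreen.getContext('2d');
const offscreen = new OffscreenCanvas(onscreen.width, onscreen.height);
const offscreenCtx = offscreen.getContext('2d');

// Hypothetical callback invoked after each frame is drawn off-screen.
function onFrameDrawn(frame) {
    offscreenCtx.drawImage(frame, 0, 0);
    offscreenCtx.fillStyle = 'yellow';
    offscreenCtx.fillText('annotation', 25, 200);
    onscreenCtx.drawImage(offscreen, 0, 0);   // the bitblt step
}
```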
Advantages of playing the MPEG directly on the client:
Advantages of using client-side frame decode but using our current player:
Advantages of using server-side frame decode:
Re: WebCodecs API support
Re: FFmpeg WASM:
Looks like OpenCV is fast enough. (I can also use OpenCV to rip out the frames, rather than ffmpeg as recommended by GPT-4. That's what we are doing now.)
> Of course pretty much everything in the universe is related in some way! From your response, I infer that you are already leaning towards tracking on the client? If that's true, kindly remind us why that is?
I'm uncommitted about tracking on the server vs. tracking on the client. However, moving the tracking code to the client will generate better performance for the end-user.
We didn't go with tracking on the client originally because we didn't understand the JavaScript code and I'm a better Python programmer. Also, we still don't have a good way of unit testing on the client.
Just for my own edification and possibly others' as well: for WebCodecs -- what it means to have Video Only support on Safari is that it doesn't support audio, which is a non-issue for PlantTracer (so far! 😀).
Correct.
> However, moving the tracking code to the client will generate better performance for the end-user.
Hmm, yes, the performance of tracking via execution on AWS Lambda left a bit to be desired.
Actually, it's zippy. The thing that's slow is telling the UI that it's done.
@simsong You asked me elsewhere for a recommendation. Given all the above, so long as we are happy not supporting Firefox in the near term (a year or two? maybe less?), my instinct is that WebCodecs sounds the most promising.
Though of course if you do prototype all three, you'll no longer want my recommendation, you'll already know!
> Actually, it's zippy. The thing that's slow is telling the UI that it's done.
Zippy schmippy. From a user perspective (mine!), the perception is: SLOW. The particular locus of the delay within the processing sequence is uninteresting from the point of view of the user experience.
The performance risk in moving tracking to the client is probably more that some client machines/devices will be under-powered for the task.
I think that I could set the poll time to 0.1 seconds and it would seem fast.
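For concreteness, a sketch of what that polling loop might look like on the client, assuming a hypothetical /api/track-status endpoint that returns {done: ...}; the endpoint name and response shape are made up for illustration:

```javascript
const POLL_MS = 100;   // the 0.1-second poll time suggested above

// Poll the (hypothetical) status endpoint until the server reports done.
async function waitForTracking(movieId) {
    while (true) {
        const resp = await fetch(`/api/track-status?movie_id=${movieId}`);
        const status = await resp.json();
        if (status.done) return status;
        await new Promise((resolve) => setTimeout(resolve, POLL_MS));
    }
}
```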
Option #1: the frame count using the MediaStreamTrackProcessor is the same as what I get with ffmpeg, and the annotation program properly annotates every frame. This is very sweet. See 1b0f4a1265d49c72fa23a1915e3745f12da0cfa0. Code:
```javascript
async function decodeVideo(file) {
    console.log("decodeVideo file=", file);
    const video = document.createElement('video');
    const canvas = document.getElementById('canvas');
    const ctx = canvas.getContext('2d');
    video.src = URL.createObjectURL(file);
    await video.play();
    // Capture the playing <video> as a MediaStream and read its video track
    // frame by frame. Per the spec, the constructor takes an init dictionary.
    const videoTrack = video.captureStream().getVideoTracks()[0];
    const videoProcessor = new MediaStreamTrackProcessor({ track: videoTrack });
    const reader = videoProcessor.readable.getReader();
    let count = 0;
    while (true) {
        const { done, value } = await reader.read();
        console.log("done=", done, "value=", value, "count=", count);
        if (done) break;
        const bitmap = await createImageBitmap(value);
        value.close();      // VideoFrames must be closed promptly or decoding stalls
        ctx.drawImage(bitmap, 0, 0);
        bitmap.close();
        ctx.font = '24px sans-serif';   // was 'sanserif', an invalid family name
        ctx.fillStyle = 'yellow';
        ctx.fillText(`frame ${count}`, 25, 200);
        count++;
        // Annotate frame here
    }
}

document.getElementById('file-input').addEventListener('change', (event) => {
    const file = event.target.files[0];
    decodeVideo(file);
});
```
Option #2 with ffmpeg.js doesn't work:
```html
<script src="https://unpkg.com/@ffmpeg/ffmpeg@0.8.3/dist/ffmpeg.min.js"></script>
```
Hm. ... From ChatGPT: "The error ReferenceError: SharedArrayBuffer is not defined occurs because SharedArrayBuffer is not available in all browsing contexts, due to security restrictions related to the Spectre and Meltdown vulnerabilities. To use SharedArrayBuffer, you need to enable cross-origin isolation by setting appropriate headers. Here's how you can do it:"
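ChatGPT's snippet didn't survive the copy-paste, but for reference, here is a minimal sketch of setting those headers, assuming a Node/Express server; the framework choice is illustrative, while the two header values are the standard ones browsers require before exposing SharedArrayBuffer:

```javascript
const express = require('express');
const app = express();

// Enable cross-origin isolation so the page gets SharedArrayBuffer.
app.use((req, res, next) => {
    res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
    res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
    next();
});

app.use(express.static('public'));   // the page and the ffmpeg.wasm assets
app.listen(8080);
```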
This is a mess. I looked at https://unpkg.com/@ffmpeg/ffmpeg@0.8.3/dist/ffmpeg.min.js and it's not code that I want to use.
Option #3 did not get the correct number of frames. Code:
```html
<body>
  <input type="file" id="file-input" />
  <p></p>
  <video id="video" controls></video>
  <canvas id="canvas"></canvas>
  <script>
    const video = document.getElementById('video');
    const canvas = document.getElementById('canvas');
    const ctx = canvas.getContext('2d');

    document.getElementById('file-input').addEventListener('change', (event) => {
      const file = event.target.files[0];
      video.src = URL.createObjectURL(file);
    });

    let count = 0;
    video.addEventListener('play', () => {
      const drawFrame = () => {
        if (!video.paused && !video.ended) {
          ctx.drawImage(video, 0, 0);
          // Annotate frame here
          console.log(`annotate ${count}`);
          count++;
          // Note: requestAnimationFrame fires at the display refresh rate,
          // not once per video frame, which is why the frame count is wrong.
          requestAnimationFrame(drawFrame);
        }
      };
      drawFrame();
    });
  </script>
</body>
```
Looks like it is option #1. It's easy to move forward; to move backward, we just keep all of the frames in memory. What do you think, @sbarber2?
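For illustration, a sketch of that keep-everything approach, building on the option #1 decode loop above (names are illustrative):

```javascript
const frames = [];                  // one ImageBitmap per frame, in decode order
let current = 0;

// Call this from the decode loop instead of drawing immediately.
function onDecodedFrame(bitmap) {
    frames.push(bitmap);
}

// Stepping is then just index arithmetic: current + 1 or current - 1.
function showFrame(ctx, index) {
    current = Math.max(0, Math.min(index, frames.length - 1));
    ctx.drawImage(frames[current], 0, 0);
}
```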
I think you could set it to 0.2 seconds and it would still seem fast.
Yes. Agreed. #1.
We have to disavow Firefox support, though, until Firefox catches up to this API. (Which seems to be in the works, so maybe only for a year? Just guessing.)
I was wondering about RAM requirements on phones and tablets, but I looked it up, and we should be good with, say, iPhone 11 or newer and iPads from 2015 on, which should be fine these days. That is, 4GB+ of RAM.
So after several go-arounds with ChatGPT and reading the docs, I've learned:
However, after an hour with ChatGPT, I learned that there is an API called requestVideoFrameCallback(). That's right — you can get a callback every time each video frame is loaded. Here are the docs and a demo program:
Next for me:
So I took the demo at https://requestvideoframecallback.glitch.me/ and made a copy at https://simson.net/rvfc/ and added a video.pause() as the first line of the callback. And it works! I'm single-framing through the movie. So now I can go forward. Going backwards is easy: just go to the beginning and single-frame forward (for now).
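For anyone following along, a minimal sketch of that pause-in-the-callback trick, assuming a `<video id="video">` element:

```javascript
const video = document.getElementById('video');

// Advance exactly one frame: register a one-shot frame callback, then play;
// the callback pauses as soon as the next frame is presented.
function stepForward() {
    video.requestVideoFrameCallback((now, metadata) => {
        video.pause();
        console.log('presented frame', metadata.presentedFrames,
                    'at media time', metadata.mediaTime);
    });
    video.play();
}
```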
Oh, yeah, hmm, 5GB is not going to cut it as video buffer space in the client machine.
Also, in conversation with Simson yesterday, it sounds like for now we are going to declare mobile devices (iOS, Android) out of scope for the target platforms of the current webapp releases. This is consistent with targeting the primary user persona of "Bio 101" undergraduates (that is, they do have a laptop somewhere); it limits our UX design to exclude phone-sized screens, dramatically shrinks our user/browser testing space, and lets us assume that a webapp client machine will have 8GB RAM minimum overall.
I am not able to reliably get the second frame of the video. My playback system, which is now reliable and caching, frequently misses frame #1. This may be because of the random timing jitter browsers introduce to defeat fingerprinting.
I have spent hours building a reliable player that can single-frame forward and backward using the callback, but the callback API specifically says that it can lose frames. And to make things worse, you're supposed to look at the time differences and, if they are bigger than 12-16 msec, infer specific kinds of frame-drop behavior. This is silly.
Here is the ffmpeg command to split a movie into jpegs:
```bash
ffmpeg -i movie1.mov -qscale:v 2 frame_%04d.jpg
```
This movie:

```
-rw-r--r--@ 1 simsong staff 1667716 Apr 21 09:05 movie1.mov
```

ZIPs to:

```
-rw-r--r--@ 1 simsong staff 1941951 Jun  5 22:12 frames.zip
```
Which isn't that much bigger, so I'm going to go back to the old approach of just downloading a zip file of all the frames. Ugh. This has been a huge waste of time.
https://github.com/Plant-Tracer/webapp/assets/1594284/27583556-19f2-41c8-933e-22c082ff14ef
No, the ZIP file is much bigger:
```
(env) simsong@Simsons-MacBook-Pro demo % ls -l tracked*
-rw-r--r--@ 1 simsong staff  228919 May 29 18:54 tracked.mp4
-rw-r--r--@ 1 simsong staff 3757799 Jun  7 07:36 tracked.zip
```
Well, this is promising:
Instead of #412, we could decode the frames on the client directly. (We could even go back to tracking on the client if we get OpenCV in WebAssembly.)
Please review this: https://chatgpt.com/share/e1e43fbe-1e69-4226-a3ad-fb5f0d3334ec
Let's discuss.