I had a quick read of the possibilities from ChatGPT. I don't know much about WebAssembly or how to get OpenCV to compile there (seems doable: https://chatgpt.com/share/536a6695-4698-482a-8a39-3d82a804b6a6). I will be able to spend time on this next week. I look forward to being on the project again (and not in classes).
You don't need to know about WebAssembly; OpenCV is already available for download as a WebAssembly build.
Let's not conflate proposals, please. Decoding the frames in the client and tracking on the client are two very, very different things. This Issue is about decoding frames on the client, yes?
They are related:
My plan at the moment:
You may think that there would be a huge amount of wasted work if we do this, but in fact there isn't. A lot of the code will stay the same, and we will end up with a much better app than we started with.
To refresh the motivation for this discussion, so we all have a common recollection of the context necessary to make decisions (please correct me if I've got any of this wrong):

1) The reason there is a custom video viewer for this app is that the video viewers built into browsers, and those reasonably available as open-source JavaScript, don't give frame-by-frame movie navigation controls, and PlantTracer needs that precise control in easy-on-the-user ways.

2) The primary reason we're considering client-side frame decoding is because frames are arriving out of order from the server side? (If that's the motivation, then why not just encode frame numbers in the passed server->client data and buffer on the client side for a certain amount of time, with a timeout to skip missing frames and discard late-arriving ones? See the sketch after this list.)

3) Or is the reason to do client-side frame decoding to offload that task from the server, and maybe to optimize network traffic by sending fewer, larger sets of frames (that is, a whole movie at a time) rather than lots of smaller requests (one per frame)?
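To make point 2 concrete, here is a minimal sketch of that buffering idea, assuming each server message carries a frame number; the class and field names are illustrative, not the actual PlantTracer protocol:

```javascript
// Hypothetical reorder buffer: deliver frames in order, skip gaps after a
// timeout, and discard late arrivals. Names are illustrative only.
class FrameReorderBuffer {
    constructor(onFrame, timeoutMs = 250) {
        this.next = 0;              // next frame number to deliver
        this.pending = new Map();   // frameNumber -> frame data
        this.onFrame = onFrame;
        this.timeoutMs = timeoutMs;
        this.timer = null;
    }
    receive(frameNumber, data) {
        if (frameNumber < this.next) return;   // late arrival: discard
        this.pending.set(frameNumber, data);
        this.flush();
    }
    flush() {
        while (this.pending.has(this.next)) {  // deliver any consecutive run
            this.onFrame(this.next, this.pending.get(this.next));
            this.pending.delete(this.next++);
        }
        clearTimeout(this.timer);
        if (this.pending.size > 0) {           // a gap: skip it after the timeout
            this.timer = setTimeout(() => { this.next++; this.flush(); }, this.timeoutMs);
        }
    }
}
```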
Let's sort through the motivations here so we can make intelligent and future-looking choices.
Even before doing that in writing, some thoughts on the presented alternatives from the ChatGPT thread:
Client-side frame decoding alternatives:
We crossed comments in time there, a bit.
I agree that prototyping the approaches isn't a ton of work.
Of course pretty much everything in the universe is related in some way! From your response, I infer that you are already leaning towards tracking on the client? If that's true, kindly remind us why that is? I don't disagree; I just can't remember. (Though of course, (pedantically? proper project management-wise?), decoding and tracking typically would be separate GitHub issues!)
The custom video viewer provides:
We can add annotation with any video viewer that plays to an off-screen canvas and lets me have a callback after each frame draw. I then tell the off-screen canvas to draw to the on-screen canvas (this is double-buffering, a bitblt operation).
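For concreteness, a minimal sketch of that double-buffering, assuming an on-screen `<canvas id="canvas">`; the IDs and the per-frame callback are illustrative:

```javascript
// Draw each frame plus annotations to an off-screen canvas, then blit the
// finished image to the visible canvas in a single operation.
const onscreen = document.getElementById('canvas');
const onscreenCtx = onscreen.getContext('2d');
const offscreen = new OffscreenCanvas(onscreen.width, onscreen.height);
const offscreenCtx = offscreen.getContext('2d');

// Hypothetical callback invoked after each frame is drawn off-screen.
function onFrameDrawn(frame) {
    offscreenCtx.drawImage(frame, 0, 0);
    offscreenCtx.fillStyle = 'yellow';
    offscreenCtx.fillText('annotation', 25, 200);
    onscreenCtx.drawImage(offscreen, 0, 0);   // the bitblt step
}
```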
Advantages of playing the MPEG directly on the client:
Advantages of using client-side frame decode but using our current player:
Advantages of using server-side frame decode:
Re: WebCodecs API support
Re: FFmpeg WASM:
Looks like OpenCV is fast enough. (I can also use OpenCV to rip out the frames, rather than ffmpeg as recommended by GPT-4. That's what we are doing now.)
> Of course pretty much everything in the universe is related in some way! From your response, I infer that you are already leaning towards tracking on the client? If that's true, kindly remind us why that is?
I'm uncommitted about tracking on the server vs. tracking on the client. However, moving the tracking code to the client will generate better performance for the end-user.
We didn't go with tracking on the client originally because we didn't understand the JavaScript code and I'm a better Python programmer. Also, we still don't have a good way of unit testing on the client.
Just for my own edification and possibly others' as well: for WebCodecs -- what it means to have Video Only support on Safari is that it doesn't support audio, which is a non-issue for PlantTracer (so far! 😀).
Correct.
> However, moving the tracking code to the client will generate better performance for the end-user.
Hmm, yes, the performance of tracking via execution on AWS Lambda left a bit to be desired.
Actually, it's zippy. The thing that's slow is telling the UI that it's done.
@simsong You asked me elsewhere for a recommendation. Given all the above, so long as we are happy not supporting Firefox in the near term (a year or two? maybe less?), my instinct is that WebCodecs sounds the most promising.
Though of course if you do prototype all three, you'll no longer want my recommendation, you'll already know!
> Actually, it's zippy. The thing that's slow is telling the UI that it's done.
Zippy schmippy. From a user perspective (mine!), the perception is: SLOW. The particular locus of the delay within the processing sequence is uninteresting from the point of view of the user experience.
The performance risk in moving tracking to the client is probably more that some client machines/devices will be under-powered for the task.
I think that I could set the poll time to 0.1 seconds and it would seem fast.
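For concreteness, a sketch of what that polling loop might look like on the client, assuming a hypothetical /api/track-status endpoint that returns {done: ...}; the endpoint name and response shape are made up for illustration:

```javascript
const POLL_MS = 100;   // the 0.1-second poll time suggested above

// Poll the (hypothetical) status endpoint until the server reports done.
async function waitForTracking(movieId) {
    while (true) {
        const resp = await fetch(`/api/track-status?movie_id=${movieId}`);
        const status = await resp.json();
        if (status.done) return status;
        await new Promise((resolve) => setTimeout(resolve, POLL_MS));
    }
}
```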
Option #1: the frame count using the MediaStreamTrackProcessor is the same as what I get with ffmpeg, and the annotation program properly annotates every frame. This is very sweet. See 1b0f4a1265d49c72fa23a1915e3745f12da0cfa0. Code:
```javascript
async function decodeVideo(file) {
    console.log("decodeVideo file=", file);
    const video = document.createElement('video');
    const canvas = document.getElementById('canvas');
    const ctx = canvas.getContext('2d');
    video.src = URL.createObjectURL(file);
    await video.play();
    // Capture the playing <video> as a MediaStream and read its video track
    // frame by frame. Per the spec, the constructor takes an init dictionary.
    const videoTrack = video.captureStream().getVideoTracks()[0];
    const videoProcessor = new MediaStreamTrackProcessor({ track: videoTrack });
    const reader = videoProcessor.readable.getReader();
    let count = 0;
    while (true) {
        const { done, value } = await reader.read();
        console.log("done=", done, "value=", value, "count=", count);
        if (done) break;
        const bitmap = await createImageBitmap(value);
        value.close();      // VideoFrames must be closed promptly or decoding stalls
        ctx.drawImage(bitmap, 0, 0);
        bitmap.close();
        ctx.font = '24px sans-serif';   // was 'sanserif', an invalid family name
        ctx.fillStyle = 'yellow';
        ctx.fillText(`frame ${count}`, 25, 200);
        count++;
        // Annotate frame here
    }
}

document.getElementById('file-input').addEventListener('change', (event) => {
    const file = event.target.files[0];
    decodeVideo(file);
});
```
Option #2 with ffmpeg.js doesn't work:
```html
<script src="https://unpkg.com/@ffmpeg/ffmpeg@0.8.3/dist/ffmpeg.min.js"></script>
```
Hm. ... From ChatGPT: "The error ReferenceError: SharedArrayBuffer is not defined occurs because SharedArrayBuffer is not available in all browsing contexts, due to security restrictions related to the Spectre and Meltdown vulnerabilities. To use SharedArrayBuffer, you need to enable cross-origin isolation by setting appropriate headers. Here's how you can do it:"
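ChatGPT's snippet didn't survive the copy-paste, but for reference, here is a minimal sketch of setting those headers, assuming a Node/Express server; the framework choice is illustrative, while the two header values are the standard ones browsers require before exposing SharedArrayBuffer:

```javascript
const express = require('express');
const app = express();

// Enable cross-origin isolation so the page gets SharedArrayBuffer.
app.use((req, res, next) => {
    res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
    res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
    next();
});

app.use(express.static('public'));   // the page and the ffmpeg.wasm assets
app.listen(8080);
```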
This is a mess. I looked at https://unpkg.com/@ffmpeg/ffmpeg@0.8.3/dist/ffmpeg.min.js and it's not code that I want to use.
Option #3 did not get the correct number of frames. Code:
```html
<body>
  <input type="file" id="file-input" />
  <p></p>
  <video id="video" controls></video>
  <canvas id="canvas"></canvas>
  <script>
    const video = document.getElementById('video');
    const canvas = document.getElementById('canvas');
    const ctx = canvas.getContext('2d');

    document.getElementById('file-input').addEventListener('change', (event) => {
      const file = event.target.files[0];
      video.src = URL.createObjectURL(file);
    });

    let count = 0;
    video.addEventListener('play', () => {
      const drawFrame = () => {
        if (!video.paused && !video.ended) {
          ctx.drawImage(video, 0, 0);
          // Annotate frame here
          console.log(`annotate ${count}`);
          count++;
          // Note: requestAnimationFrame fires at the display refresh rate,
          // not once per video frame, which is why the frame count is wrong.
          requestAnimationFrame(drawFrame);
        }
      };
      drawFrame();
    });
  </script>
</body>
```
Looks like it is option #1. It's easy to move forward; to move backward, we just keep all of the frames in memory. What do you think, @sbarber2?
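For illustration, a sketch of that keep-everything approach, building on the option #1 decode loop above (names are illustrative):

```javascript
const frames = [];                  // one ImageBitmap per frame, in decode order
let current = 0;

// Call this from the decode loop instead of drawing immediately.
function onDecodedFrame(bitmap) {
    frames.push(bitmap);
}

// Stepping is then just index arithmetic: current + 1 or current - 1.
function showFrame(ctx, index) {
    current = Math.max(0, Math.min(index, frames.length - 1));
    ctx.drawImage(frames[current], 0, 0);
}
```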
I think you could set it to 0.2 seconds and it would still seem fast.
Yes. Agreed. #1.
We have to disavow Firefox support, though, until Firefox catches up to this API. (Which seems to be in the works, so maybe only for a year? Just guessing.)
I was wondering about RAM requirements on phones and tablets, but I looked it up, and we should be good with, say, iPhone 11 or newer and iPads from 2015 on, which should be fine these days. That is, 4GB+ of RAM.
So after several go-arounds with ChatGPT and reading the docs, I've learned:
However, after an hour with ChatGPT, I learned that there is an API called requestVideoFrameCallback(). That's right — you can get a callback every time each video frame is loaded. Here are the docs and a demo program:
Next for me:
So I took the demo at https://requestvideoframecallback.glitch.me/ and made a copy at https://simson.net/rvfc/ and added a video.pause() as the first line of the callback. And it works! I'm single-framing through the movie. So now I can go forward. Going backwards is easy: just go to the beginning and single-frame forward (for now).
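For anyone following along, a minimal sketch of that pause-in-the-callback trick, assuming a `<video id="video">` element:

```javascript
const video = document.getElementById('video');

// Advance exactly one frame: register a one-shot frame callback, then play;
// the callback pauses as soon as the next frame is presented.
function stepForward() {
    video.requestVideoFrameCallback((now, metadata) => {
        video.pause();
        console.log('presented frame', metadata.presentedFrames,
                    'at media time', metadata.mediaTime);
    });
    video.play();
}
```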
Oh, yeah, hmm, 5GB is not going to cut it as video buffer space in the client machine.
Also, in conversation with Simson yesterday, it sounds like for now we are going to declare mobile devices (iOS, Android) out of scope for the target platforms of the current webapp releases. This is consistent with targeting the primary user persona of "Bio 101" undergraduates (that is, they do have a laptop somewhere); it limits our UX design to exclude phone-sized screens, dramatically shrinks our user/browser testing space, and lets us assume that a webapp client machine will have 8GB RAM minimum overall.
I am not able to reliably get the second frame of the video. My playback system, which is now reliable and caching, frequently misses frame #1. This may be because of the random timing jitter browsers introduce to defeat fingerprinting.
I have spent hours building a reliable player that can single-frame forward and backward using the callback, but the callback API specifically says that it can lose frames. And to make things worse, you're supposed to look at the time differences and, if they are bigger than 12-16 msec, infer specific kinds of frame-drop behavior. This is silly.
Here is the ffmpeg command to split a movie into jpegs:
```bash
ffmpeg -i movie1.mov -qscale:v 2 frame_%04d.jpg
```
This movie:

```
-rw-r--r--@ 1 simsong staff 1667716 Apr 21 09:05 movie1.mov
```

ZIPs to:

```
-rw-r--r--@ 1 simsong staff 1941951 Jun  5 22:12 frames.zip
```
Which isn't that much bigger, so I'm going to go back to the old approach of just downloading a zip file of all the frames. Ugh. This has been a huge waste of time.
https://github.com/Plant-Tracer/webapp/assets/1594284/27583556-19f2-41c8-933e-22c082ff14ef
No, the ZIP file is much bigger:
```
(env) simsong@Simsons-MacBook-Pro demo % ls -l tracked*
-rw-r--r--@ 1 simsong staff  228919 May 29 18:54 tracked.mp4
-rw-r--r--@ 1 simsong staff 3757799 Jun  7 07:36 tracked.zip
```
Well, this is promising:
Instead of #412, we could decode the frames on the client directly. (We could even go back to tracking on the client if we get OpenCV in WebAssembly.)
Please review this: https://chatgpt.com/share/e1e43fbe-1e69-4226-a3ad-fb5f0d3334ec
Let's discuss.