facebookresearch / labgraph

LabGraph is a Python framework for rapidly prototyping experimental systems for real-time streaming applications. It is particularly well-suited to real-time neuroscience, physiology and psychology experiments.
MIT License

Pose visualization #95

Closed · Dasfaust closed this 1 year ago

Dasfaust commented 2 years ago

Description

PoseVis

PoseVis is a LabGraph extension that streams any number of video sources and generates pose landmark data from MediaPipe for each stream independently. MediaPipe Hands, Face Mesh, Pose, and Holistic solutions are supported. PoseVis supports data logging and replaying via the HDF5 format. See Using PoseVis for details.

Usage preview

PoseVis can also support other image processing tasks through its extension system. Take a look at the hands extension for an example.

Installation

PoseVis uses OpenCV to handle video streams. Out of the box, PoseVis streams camera, video file, and image directory sources from the MSMF backend on Windows and the V4L2 backend on Linux, using the MJPEG format (see OpenCV backends here). This configuration should be supported by most UVC devices. Further source stream customization can be achieved by installing GStreamer; steps are detailed below.
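For reference, the default capture configuration described above can be reproduced directly with OpenCV. This is an illustrative sketch, not PoseVis's internal code; the `fourcc` helper is equivalent to `cv2.VideoWriter_fourcc` and requires no dependencies, while `open_camera` assumes OpenCV (`opencv-python`) is installed:

```python
import sys

def fourcc(code: str) -> int:
    """Pack a four-character code (e.g. 'MJPG') into the integer form
    OpenCV expects, equivalent to cv2.VideoWriter_fourcc(*code)."""
    return sum(ord(c) << (8 * i) for i, c in enumerate(code))

def open_camera(index: int = 0):
    """Open a camera with the platform backend described above:
    MSMF on Windows, V4L2 on Linux, requesting MJPEG frames."""
    import cv2  # imported here so the fourcc helper stays dependency-free
    backend = cv2.CAP_MSMF if sys.platform.startswith("win") else cv2.CAP_V4L2
    cap = cv2.VideoCapture(index, backend)
    cap.set(cv2.CAP_PROP_FOURCC, fourcc("MJPG"))
    return cap
```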

PoseVis General Setup

Requires Python 3.8 or later. Run setup.py to install the required packages from PyPI:

python setup.py install

See Using PoseVis for usage details.

GStreamer Support (Optional)

GStreamer is a multimedia framework that lets you create your own media pipelines from a simple pipeline description string. If you need more flexibility than a plain MJPEG stream, you can install GStreamer using the steps below.

Example GStreamer Configurations

PoseVis expects frames from GStreamer to be in the BGR color space, and OpenCV requires the pipeline to terminate in an appsink element.

Creating a test source: this configuration creates the videotestsrc element and configures a 720p @ 30Hz stream in BGR.

python -m pose_vis.pose_vis --sources "videotestsrc ! video/x-raw, width=1280, height=720, framerate=30/1, format=BGR ! appsink"
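The same test-source pipeline can also be assembled and opened from Python. This is an illustrative sketch, not part of PoseVis; `cv2.VideoCapture` accepts a GStreamer pipeline string when OpenCV is built with GStreamer support, and the pipeline must end in `appsink` as noted above:

```python
def build_test_pipeline(width: int = 1280, height: int = 720, fps: int = 30) -> str:
    """Assemble the videotestsrc pipeline shown above: raw BGR frames
    at the requested resolution and frame rate, terminated in appsink."""
    return (
        f"videotestsrc ! video/x-raw, width={width}, height={height}, "
        f"framerate={fps}/1, format=BGR ! appsink"
    )

# Opening the pipeline directly (requires OpenCV built with GStreamer):
#   import cv2
#   cap = cv2.VideoCapture(build_test_pipeline(), cv2.CAP_GSTREAMER)
#   ok, frame = cap.read()  # frame is an HxWx3 BGR array when ok is True
```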

Creating a device source in Linux: this configuration captures an MJPEG stream at 720p @ 30Hz from a V4L2 device and converts the image format into raw BGR.

python -m pose_vis.pose_vis --sources "v4l2src device=/dev/video0 ! image/jpeg, width=1280, height=720, framerate=30/1 ! jpegparse ! jpegdec ! videoconvert ! video/x-raw, format=BGR ! appsink"

You can also specify per-camera configurations:

... --sources "v4l2src device=/dev/video0 extra-controls='c, exposure_auto=1' ...

Windows GStreamer Support

Follow the Windows GStreamer guide.

Linux GStreamer Support

Follow the Linux GStreamer guide.

Performance

Performance is crucial for real-time applications. Check the benchmark notebook example for performance metrics, including details of the system used for benchmarking. You can also run the notebook on your own system to get an idea of how PoseVis will perform.

Using PoseVis

Test PoseVis via Command Line

Check usage details:

python -m pose_vis.pose_vis --help

Using PoseVis in Your Project

Check the usage guide for an in-depth overview of the concepts used in PoseVis and how to hook into its LabGraph topics.

PoseVis Usage Examples

GestureVis

GestureVis uses data from the MediaPipe hand and body pose extensions to guess the current gesture based on a list of known gestures and draws the appropriate annotations onto the video stream, both online and offline. Check out the hands version here and body pose version here.

Logging Example

The logging example notebook shows a simple way to use HDF5 logging with PoseVis.


Feature/Issue validation/testing

Please describe the tests [UT/IT] that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced. Please also list any relevant details for your test configuration.

python -m unittest pose_vis/test/tests.py

Runs the graph on a sample image and checks each extension's output. Example result:

INFO:pose_vis.runner: building graph
INFO:pose_vis.runner: enabling extension: HandsExtension
INFO:pose_vis.runner: logging directory is C:\Users\das\Desktop\labgraph\devices\webcam\logs
INFO:pose_vis.runner: running graph
WARNING:labgraph.graphs.graph:PoseVis has unused topics:
        - TERM_HANDLER/INPUT_EXIT_USER has no publishers
This could mean that there are publishers and/or subscribers of Cthulhu streams that Labgraph doesn't know about, and/or that data in some topics is being discarded.
INFO:pose_vis.streams.utils.capture_worker: opening directory: C:\Users\das\Desktop\labgraph\devices\webcam\images
INFO:pose_vis.streams.utils.capture_worker: found 1 image(s)
INFO:pose_vis.streams.utils.capture_worker: worker 0: setting up extension HandsExtension
INFO:pose_vis.streams.utils.capture_worker: worker 0: started
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO:pose_vis.streams.source_stream: all captures have finished
INFO:pose_vis.test.tests: HandsExtension reported data is OK
INFO:labgraph.runners.local_runner:TerminationHandler (xed46PLBM4ZNaHCw):shutting down normally
INFO:pose_vis.streams.utils.capture_worker: worker 0: shutting down
.
----------------------------------------------------------------------
Ran 1 test in 6.074s

OK


facebook-github-bot commented 2 years ago

Hi @Dasfaust!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot commented 2 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


jfResearchEng commented 2 years ago

To clarify: is devices/webcam/test/thumbs_up.png copyright-free? And can you please add the source of this file?

Dasfaust commented 2 years ago

To clarify: is devices/webcam/test/thumbs_up.png copyright-free? And can you please add the source of this file?

Looks like it's for personal use only. I got it from here. I'll remove it in the next commit.

jfResearchEng commented 2 years ago

If you can include a demo video link or a gif image, that would be useful for users to understand the functionality.

jfResearchEng commented 2 years ago

Please also add github action support, reference: https://github.com/facebookresearch/labgraph/actions/workflows/main.yml

Dasfaust commented 2 years ago

We have committed quite a few changes, notable additions are multiple camera support, multiple MediaPipe solution support, and integration with LabGraph's HDF5 logger. The PR's description has been updated with a better overview, a GIF preview, a Jupyter Notebook example, and other usage details/thoughts.

One issue I've noticed, though, is replaying large (500 MB+) log files using HDF5Reader: the replay process crashes with just [Cthulhu][ERROR]. Tested on Windows with Python 3.9.13 and LabGraph 2.0.0. If we're not doing anything wrong, could this be a memory constraint? If so, one solution might be an incremental reader instead of pulling the entire log into memory at once.
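One way to sketch the incremental approach (hypothetical, not LabGraph's HDF5Reader API): h5py datasets support `len()` and slicing, so a generator can yield bounded chunks instead of materializing the whole log. The same generator works on any sliceable sequence; the dataset name and `replay` call below are illustrative.

```python
from typing import Iterator, Sequence

def iter_chunks(dataset: Sequence, chunk_size: int = 1024) -> Iterator[Sequence]:
    """Yield successive slices of `dataset` so that at most `chunk_size`
    records are resident at once. h5py datasets support len() and
    slicing, so this works on an open HDF5 dataset as well as a list."""
    for start in range(0, len(dataset), chunk_size):
        yield dataset[start:start + chunk_size]

# Hypothetical usage against an HDF5 log (dataset name is illustrative):
#   import h5py
#   with h5py.File("logging_example.h5", "r") as f:
#       for chunk in iter_chunks(f["pose_stream"], chunk_size=256):
#           replay(chunk)  # process one bounded slice at a time
```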

Dasfaust commented 2 years ago

Please also add github action support, reference: https://github.com/facebookresearch/labgraph/actions/workflows/main.yml

What kind of actions should be added? Should we use the ATI force touch entries as a template?

jfResearchEng commented 2 years ago

For the example demo image attached in this PR, how many cameras are used? When supporting 4 video streams, do you know how time synchronization works across the 4 streams?

jfResearchEng commented 2 years ago

One issue I've noticed, though, is replaying large (500 MB+) log files using HDF5Reader: the replay process crashes with just [Cthulhu][ERROR]. Tested on Windows with Python 3.9.13 and LabGraph 2.0.0. If we're not doing anything wrong, could this be a memory constraint? If so, one solution might be an incremental reader instead of pulling the entire log into memory at once.

The replay function is relatively new to LabGraph; an incremental reader may be needed in this case.

jfResearchEng commented 2 years ago

Please also add github action support, reference: https://github.com/facebookresearch/labgraph/actions/workflows/main.yml

What kind of actions should be added? Should we use the ATI force touch entries as a template?

You can refer to LabGraph Monitor as an example, link.

jfResearchEng commented 2 years ago

To control the size of the extension, can you share a link rather than uploading the h5 file to the repo: devices/webcam/logs/logging_example.h5

Dasfaust commented 2 years ago

For the example demo image attached in this PR, how many cameras are used? When supporting 4 video streams, do you know how time synchronization works across the 4 streams?

Just one camera in the demo.

There is no coupling; each stream runs independently in its own process. Results are logged/displayed as soon as they become available.

The replay function is relatively new to LabGraph; an incremental reader may be needed in this case.

Excellent, we'll see what we can do.

You can refer to LabGraph Monitor as an example, link.

Okay, thanks. Will do once tests are re-introduced, working on that now.

To control the size of the extension, can you share a link instead rather than upload the h5 file to the repo: devices/webcam/logs/logging_example.h5

I have moved logging_example.h5 and preview.gif (it was also big) out of the repository.

jfResearchEng commented 2 years ago

Please see my comment below in bold:

For the example demo image attached in this PR, how many cameras are used? When supporting 4 video streams, do you know how time synchronization works across the 4 streams?

Just one camera in the demo.

There is no coupling; each stream runs independently in its own process. Results are logged/displayed as soon as they become available.

In this case, strictly speaking, multi-camera synchronization is not yet supported, since each stream runs independently. Please mention this limitation in the README.

Dasfaust commented 2 years ago

@jfResearchEng

The annotation source from the meeting has been uploaded here.

jfResearchEng commented 2 years ago

Hi Dasfaust,

Thanks for the update. Do you also happen to have a link to the annotated video based on the video I shared earlier? And have you also shared the code generating the offline annotated video somewhere?

Thanks, jfResearchEng
