deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

Yolov5 multiple objects detection in a video #2754

Open VojtenkoRN opened 1 year ago

VojtenkoRN commented 1 year ago

Description

In my Quarkus project I'm trying to use DJL with a YOLOv5x model for multi-object detection in a video. I passed all MS COCO class names to the synset and then wanted to filter the result, but far too many objects were detected and all of them had wrong bounding boxes.

Expected Behavior

Correct detection with correct bounding boxes

Error image example

result

How to Reproduce?

Example

Steps to reproduce

Just run the app as described in the README.

What have you tried to solve it?

I've tried passing a single class name to the synset (80 times), but I need several classes, and then every object is detected as that one class.
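The workaround described above (one class name repeated 80 times, so the indices line up with YOLOv5's 80 COCO outputs) can be sketched with stdlib Java; the class name used here is illustrative. It shows why every detection then carries the same label:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the workaround described above: an 80-entry synset holding a
// single repeated class name. Index lookup always yields the same label,
// which matches the observed behavior.
public class RepeatedSynset {
    public static void main(String[] args) {
        List<String> synset = Collections.nCopies(80, "person");
        System.out.println(synset.size());  // 80
        System.out.println(synset.get(42)); // person
    }
}
```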

Environment Info

Engine.debugEnvironment() result:

----------- System Properties -----------
java.specification.version: 17
sun.jnu.encoding: UTF-8
java.vm.vendor: GraalVM Community
sun.arch.data.model: 64
java.vendor.url: https://www.graalvm.org/
logging.initial-configurator.min-level: 500
java.vm.specification.version: 17
os.name: Linux
sun.java.launcher: SUN_STANDARD
sun.boot.library.path: ~/.jdks/graalvm-ce-17/lib
jdk.debug: release
sun.cpu.endian: little
jboss.log-version: false
java.specification.vendor: Oracle Corporation
java.version.date: 2023-04-18
java.home: ~/.jdks/graalvm-ce-17
file.separator: /
java.vm.compressedOopsMode: Zero based
jdk.internal.vm.ci.enabled: true
line.separator: 

java.vm.specification.vendor: Oracle Corporation
java.specification.name: Java Platform API Specification
sun.management.compiler: HotSpot 64-Bit Tiered Compilers
java.runtime.version: 17.0.7+7-jvmci-22.3-b18
path.separator: :
os.version: 5.15.0-79-generic
java.runtime.name: OpenJDK Runtime Environment
file.encoding: UTF-8
java.vm.name: OpenJDK 64-Bit Server VM
java.vendor.version: GraalVM CE 22.3.2
java.vendor.url.bug: https://github.com/oracle/graal/issues
java.io.tmpdir: /tmp
java.version: 17.0.7
java.util.concurrent.ForkJoinPool.common.threadFactory: io.quarkus.bootstrap.forkjoin.QuarkusForkJoinWorkerThreadFactory
user.dir: ~/some-path/video-detection-example
os.arch: amd64
java.vm.specification.name: Java Virtual Machine Specification
native.encoding: UTF-8
java.util.logging.manager: org.jboss.logmanager.LogManager
java.library.path: /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
java.vm.info: mixed mode, sharing
java.vendor: GraalVM Community
java.vm.version: 17.0.7+7-jvmci-22.3-b18
sun.io.unicode.encoding: UnicodeLittle
java.class.version: 61.0

--------- Environment Variables ---------
PATH: ~/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-17-oracle/bin:/usr/lib/jvm/java-17-oracle/db/bin
XAUTHORITY: ~/.Xauthority
GDMSESSION: cinnamon
XDG_DATA_DIRS: /usr/share/cinnamon:/usr/share/gnome:~/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share:/var/lib/snapd/desktop
JAVA_HOME: /usr/lib/jvm/java-17-oracle
XDG_CONFIG_DIRS: /etc/xdg/xdg-cinnamon:/etc/xdg
XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0
DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus
XDG_SESSION_TYPE: x11
XDG_SESSION_ID: c2
XDG_CURRENT_DESKTOP: X-Cinnamon
DISPLAY: :0
SESSION_MANAGER: local/host-Linux:@/tmp/.ICE-unix/3463,unix/host-Linux:/tmp/.ICE-unix/3463
CINNAMON_VERSION: 5.6.8
PWD: ~/some-path/video-detection-example
DERBY_HOME: /usr/lib/jvm/java-17-oracle/db
XDG_SESSION_CLASS: user
GJS_DEBUG_TOPICS: JS ERROR;JS LOG
SHELL: /bin/bash
GTK3_MODULES: xapp-gtk3-module
GIO_LAUNCHED_DESKTOP_FILE: ~/.local/share/applications/jetbrains-idea.desktop
XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/username
J2SDKDIR: /usr/lib/jvm/java-17-oracle
DESKTOP_SESSION: cinnamon
GPG_AGENT_INFO: /run/user/1000/gnupg/S.gpg-agent:0:1
GIO_LAUNCHED_DESKTOP_FILE_PID: 5011
QT_ACCESSIBILITY: 1
GNOME_DESKTOP_SESSION_ID: this-is-deprecated
GJS_DEBUG_OUTPUT: stderr
XDG_SEAT: seat0
J2REDIR: /usr/lib/jvm/java-17-oracle
GTK_MODULES: gail:atk-bridge
SSH_AUTH_SOCK: /run/user/1000/keyring/ssh
GTK_OVERLAY_SCROLLING: 1
XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0
QT_QPA_PLATFORMTHEME: qt5ct
XDG_RUNTIME_DIR: /run/user/1000
XDG_SESSION_DESKTOP: cinnamon
XDG_VTNR: 7
SHLVL: 0
HOME: ~

-------------- Directories --------------
temp directory: /tmp
DJL cache directory: ~/.djl.ai
Engine cache directory: ~/.djl.ai

------------------ CUDA -----------------
GPU Count: 1
CUDA: 122
ARCH: 86
GPU(0) memory used: 1167327232 bytes

----------------- Engines ---------------
DJL version: 0.23.0

----------------- Startup logs ---------------
Default Engine: PyTorch:1.13.1, capabilities: [
    CUDA,
    CUDNN,
    OPENMP,
    MKL,
    MKLDNN,
]
PyTorch Library: ~/.djl.ai/pytorch/1.13.1-SNAPSHOT-cu117-linux-x86_64
Default Device: gpu(0)
PyTorch: 2
KexinFeng commented 1 year ago

To debug this, the first observation from the output picture is that the bounding boxes are not in the right positions. So in the DetectedObjects result, check the x, y, w, h of the bounding boxes to see if the numbers are actually wrong, and also check the classNames in the same DetectedObjects. After this you will know whether it is purely a plotting problem (the bounding-box numbers should be compatible with the image size) or a postprocessing problem.

You can take a look at this example: djl/examples/src/main/java/ai/djl/examples/inference/MaskDetection.java. It is similar to your use case. Check how the images are rescaled, if needed. The relevant PR is #2452, which contains a detailed .md file.

Also, a first check might be on the ONNX model itself: load it in Python, run inference, and see what the correct bounding boxes and class names should be.
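The rescaling check suggested above can be sketched in plain Java. YOLOv5 reports boxes in the resized model-input space, so they must be scaled back to the original frame before plotting; wrong or missing scaling produces misplaced boxes like those in the screenshot. This is a minimal sketch assuming a 640x640 input and a plain resize with no letterbox padding (real YOLOv5 preprocessing often letterboxes, which adds an offset):

```java
// Maps a box (x, y, w, h) from model-input space back to the original
// frame by the per-axis resize ratio. Assumes plain resize, no padding.
public class BoxRescale {
    static double[] toFrame(double x, double y, double w, double h,
                            int inputSize, int frameW, int frameH) {
        double sx = (double) frameW / inputSize;
        double sy = (double) frameH / inputSize;
        return new double[] {x * sx, y * sy, w * sx, h * sy};
    }

    public static void main(String[] args) {
        // A box centered at (320, 320) in a 640x640 input maps to the
        // center of a 1920x1080 frame.
        double[] b = toFrame(320, 320, 64, 64, 640, 1920, 1080);
        System.out.println(b[0] + "," + b[1] + "," + b[2] + "," + b[3]);
        // 960.0,540.0,192.0,108.0
    }
}
```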

VojtenkoRN commented 1 year ago

1) I've added .optArgument("optApplyRatio", true) and .optArgument("rescale", true), but it doesn't change anything.

debug0

2) As I said earlier, too many objects were detected and all of them had wrong bounding boxes (result for the video frame above).

debug1 debug2

3) I changed the TorchScript model to ONNX and installed TensorRT. I also rewrote the code like your example (branch onnx-model). It throws File not found: ~/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/51b9a350d1dd717c7e5d1c1c133e6afb8d7c12f0/synset.txt. I haven't found in the docs how to specify a synset for YoloV5TranslatorFactory. If I put synset.txt into that path in ~/.djl.ai/cache (which is rather inconvenient), it then throws UnsupportedOperationException: This NDArray implementation does not currently support this operation when trying to predict. Synset.txt: debug3

frankfliu commented 1 year ago

A few obvious issues:

First of all, .optArgument() only affects a TranslatorFactory; you are passing a Translator directly. In your case, you need to configure your Translator directly:

Translator<Image, DetectedObjects> translator = YoloV5Translator
        .builder()
        .setPipeline(pipeline)
        .optSynset(Synset.asNameList())
        .optThreshold(THRESHOLD)
        .optRescaleSize(IMAGE_SIZE, IMAGE_SIZE)
        .optApplyRatio(true)
        .optOutputType(YoloV5Translator.YoloOutputType.AUTO)
        .build();

Another small issue: you have the following code:

              .optDevice(Device.gpu())
              .optDevice(Device.cpu())
  1. DJL detects GPU vs CPU automatically; in general you don't need to specify a device.
  2. The last call overrides the previous one, so in your code only the CPU will be used.
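The "last call wins" point above is plain builder semantics: each call overwrites the same field, so only the final value takes effect. A minimal sketch with a hypothetical builder (not DJL's actual Criteria.Builder):

```java
// Hypothetical builder illustrating why chaining
// .optDevice(gpu).optDevice(cpu) leaves only "cpu" in effect:
// each call overwrites the same field, nothing accumulates.
public class DeviceBuilder {
    private String device = "auto"; // default: let the framework decide

    DeviceBuilder optDevice(String device) {
        this.device = device; // overwrite
        return this;
    }

    String build() {
        return device;
    }

    public static void main(String[] args) {
        String d = new DeviceBuilder().optDevice("gpu").optDevice("cpu").build();
        System.out.println(d); // cpu
    }
}
```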
VojtenkoRN commented 1 year ago

Thanks for the advice! I rewrote the Translator as you said and changed the model to run it on GPU (my bad). The problems still remain :(

Detected: debug0 debug1

On screen: debug3

UPD: All changes committed

frankfliu commented 1 year ago

I tried your model and it seems to work fine:

Path imageFile = Paths.get("src/test/resources/dog_bike_car.jpg");
Image img = ImageFactory.getInstance().fromFile(imageFile);

Criteria<Image, DetectedObjects> criteria =
        Criteria.builder()
                .optApplication(Application.CV.OBJECT_DETECTION)
                .setTypes(Image.class, DetectedObjects.class)
                .optModelPath(Paths.get("video-detection-example/model/yolov5x.torchscript"))
                .optEngine("PyTorch")
                .optArgument("width", "640")
                .optArgument("height", "640")
                .optArgument("resize", "true")
                .optArgument("rescale", "true")
                .optArgument("optApplyRatio", "true")
                .optArgument("threshold", "0.4")
                .optArgument("synsetUrl", "https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/object_detection/ai/djl/pytorch/classes_coco.txt")
                .optTranslatorFactory(new YoloV5TranslatorFactory())
                .optProgress(new ProgressBar())
                .build();

try (ZooModel<Image, DetectedObjects> model = criteria.loadModel();
    Predictor<Image, DetectedObjects> predictor = model.newPredictor()) {
    DetectedObjects detection = predictor.predict(img);
    saveBoundingBoxImage(img, detection);
    return detection;
}
VojtenkoRN commented 1 year ago

OK, thanks a lot! I'll check it as best I can. But is there any other way to pass the synset list besides .optArgument("synsetUrl", "...") when using .optTranslatorFactory(new YoloV5TranslatorFactory())? Putting synset.txt in ~/.djl.ai/cache/.../ is rather inconvenient :(

frankfliu commented 1 year ago

@VojtenkoRN

Yes, you can use .optArgument("synset", "dog,cat,car,..."), but we don't do any CSV-like comma escaping, so it has limitations.
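The limitation mentioned above can be seen with a plain String.split sketch (an illustration of unescaped-comma splitting, not DJL's actual parser): the value is split on every comma, so a class name that itself contains a comma breaks into two bogus entries.

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates the unescaped-comma limitation of a comma-joined synset
// string: splitting is blind to commas inside a class name.
public class SynsetSplit {
    static List<String> parse(String synset) {
        return Arrays.asList(synset.split(","));
    }

    public static void main(String[] args) {
        System.out.println(parse("dog,cat,car").size());         // 3
        // One class named "sofa, couch" becomes two entries:
        System.out.println(parse("dog,sofa, couch,car").size()); // 4, not 3
    }
}
```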

VojtenkoRN commented 1 year ago

Thank you all!

Option 1 (before):

Pipeline pipeline = new Pipeline()
        .add(new Resize(IMAGE_SIZE))
        .add(new ToTensor());

Translator<Image, DetectedObjects> translator = YoloV5Translator
        .builder()
        .setPipeline(pipeline)
        .optSynset(Synset.asNameList())
        .optThreshold(THRESHOLD)
        .optRescaleSize(IMAGE_SIZE, IMAGE_SIZE)
        .optApplyRatio(true)
        .optOutputType(YoloV5Translator.YoloOutputType.AUTO)
        .build();

        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
              .setTypes(Image.class, DetectedObjects.class)
              .optModelUrls(modelPath)
              .optModelName(modelName)
              .optDevice(Device.gpu())
              .optApplication(Application.CV.OBJECT_DETECTION)
              .optTranslator(translator)
              .optEngine(Engine.getDefaultEngineName())
              .optProgress(new ProgressBar())
              .build();

Option 2 (after; synset passed from code, master branch):

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
              .setTypes(Image.class, DetectedObjects.class)
              .optModelUrls(modelPath)
              .optModelName(modelName)
              .optDevice(Device.gpu())
              .optApplication(Application.CV.OBJECT_DETECTION)
              .optEngine(Engine.getDefaultEngineName())
              .optArgument("width", IMAGE_SIZE)
              .optArgument("height", IMAGE_SIZE)
              .optArgument("resize", "true")
              .optArgument("rescale", "true")
              .optArgument("optApplyRatio", "true")
              .optArgument("threshold", THRESHOLD)
              .optArgument("synset", Synset.asString())
              .optTranslatorFactory(new YoloV5TranslatorFactory())
              .optProgress(new ProgressBar())
              .build();

Result: debug0 debug1 debug2

Option 3 (after; synset passed from a txt file, synset-in-txt branch):

final var synsetUrl = Path.of(synsetPath).toUri().toString();

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelUrls(modelPath)
        .optModelName(modelName)
        .optDevice(Device.gpu())
        .optApplication(Application.CV.OBJECT_DETECTION)
        .optEngine(Engine.getDefaultEngineName())
        .optArgument("width", IMAGE_SIZE)
        .optArgument("height", IMAGE_SIZE)
        .optArgument("resize", "true")
        .optArgument("rescale", "true")
        .optArgument("optApplyRatio", "true")
        .optArgument("threshold", THRESHOLD)
        .optArgument("synsetUrl", synsetUrl)
        .optTranslatorFactory(new YoloV5TranslatorFactory())
        .optProgress(new ProgressBar())
        .build();

Result (this workaround seems to be working): fine
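The option 3 workaround hinges on Path.toUri() producing a file: URL, which the synsetUrl argument can then load like any remote URL. A small stdlib sketch of that conversion (the temp file stands in for the real synset.txt path):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: turn a local synset.txt path into a URL string, as option 3
// does with Path.of(synsetPath).toUri().toString().
public class SynsetUrl {
    public static void main(String[] args) throws Exception {
        Path synset = Files.createTempFile("synset", ".txt");
        Files.write(synset, List.of("person", "bicycle", "car"));
        String synsetUrl = synset.toUri().toString();
        System.out.println(synsetUrl.startsWith("file:")); // true
    }
}
```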

But it seems a bit odd to me that options 1 and 2 don't work while option 3 does, although outwardly there is not much difference between them, especially considering that all classes seem to be defined correctly in option 2: debug3

Thanks for the help again! If you think it isn't a bug, you can close this issue :)

frankfliu commented 1 year ago

I don't see any difference between the options. I just tested with the following code, and it works fine:

        String url = "https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/object_detection/ai/djl/pytorch/classes_coco.txt";
        List<String> list;
        try (InputStream is = new URL(url).openStream()) {
            list = Utils.readLines(is);
        }
        String synset = String.join(",", list);

and:

        String url = "https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/object_detection/ai/djl/pytorch/classes_coco.txt";
        List<String> list;
        try (InputStream is = new URL(url).openStream()) {
            list = Utils.readLines(is);
        }

        Pipeline pipeline = new Pipeline()
                .add(new Resize(640))
                .add(new ToTensor());

        Translator<Image, DetectedObjects> translator = YoloV5Translator
                .builder()
                .setPipeline(pipeline)
                .optSynset(list)
                .optThreshold(0.4f)
                .optRescaleSize(640, 640)
                .optApplyRatio(true)
                .optOutputType(YoloV5Translator.YoloOutputType.AUTO)
                .build();