VisualComputingInstitute / TrackR-CNN

TrackR-CNN baseline method for Multi-Object Tracking and Segmentation (MOTS)
MIT License

PNG annotation switch #103

Closed bat3a closed 3 years ago

bat3a commented 3 years ago

Hi,

The code currently requires a dataset with PNG masks as annotations. Is there a way to switch to the text-format annotations? I checked the configuration file and didn't find a way to do it.

ahnonay commented 3 years ago

Hey,

Unfortunately, we did not implement that. The .txt files are mainly used for evaluation in mots_tools. You can probably write a small script that takes your .txt annotations and converts them to .png. Check out the description of the .png format on https://www.vision.rwth-aachen.de/page/mots. Also, https://github.com/VisualComputingInstitute/mots_tools/blob/master/mots_common/io.py may be helpful for writing the script.
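A minimal sketch of such a conversion script, not something shipped with TrackR-CNN. It assumes the .txt layout from the MOTS page (`time_frame id class_id img_height img_width rle`, one object per line) and the .png encoding described there (a single 16-bit channel where each pixel holds the object ID and 0 is background). The function names `parse_mots_txt` and `write_png_frames` and the six-digit output file names are hypothetical; adjust them to whatever your data loader expects. `pycocotools`, `numpy` and `Pillow` are only needed for the writing step.

```python
from collections import defaultdict
from pathlib import Path


def parse_mots_txt(lines):
    """Group MOTS .txt annotation lines by frame number.

    Assumed line layout: time_frame id class_id img_height img_width rle
    """
    frames = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        frame, obj_id, class_id, height, width = (int(v) for v in fields[:5])
        frames[frame].append({
            "id": obj_id,
            "class_id": class_id,
            "size": [height, width],
            "counts": fields[5],  # COCO-style compressed RLE string
        })
    return dict(frames)


def write_png_frames(frames, out_dir):
    """Write one 16-bit PNG per frame; each pixel stores the object ID."""
    # Imported here so that parsing works without these packages installed.
    import numpy as np
    from PIL import Image
    from pycocotools import mask as rletools

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for frame, objects in sorted(frames.items()):
        height, width = objects[0]["size"]
        image = np.zeros((height, width), dtype=np.uint16)  # 0 = background
        for obj in objects:
            rle = {"size": obj["size"], "counts": obj["counts"].encode("utf-8")}
            binary_mask = rletools.decode(rle)  # (height, width) array of 0/1
            image[binary_mask > 0] = obj["id"]
        # Hypothetical naming scheme: 000000.png, 000001.png, ...
        Image.fromarray(image).save(out_dir / f"{frame:06d}.png")
```

Usage would look like `write_png_frames(parse_mots_txt(open("0002.txt")), "instances/0002")`, where both paths are placeholders. Frames without any annotated object produce no file in this sketch, so you may need to add empty images for them.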

rcluan commented 3 years ago

Hey @ahnonay. I'm doing as you suggested and following the format described here. However, the output I get in the txt file is a bit different from the one described on the website. I'm getting this:

0 558.31946 195.86006 677.483 301.47534 0.9999932 1 370 1226 PkY6Y2Y90000000O1O1O1O1N2O1O100O1O1O1O1G08N200010O1O1O001L40000000000000000000000000000000000000000000000000000000000000000000000000000000001O0000000000000000000000000000000000000000000000000004L1O1O1O0010000O3O2N0O00000003MO1O6J6J5K1O00000000[TV6 0.118669435 -0.61507374 0.2891215 -0.33546203 -0.6016488 -0.21526036 -0.15051363 0.3291249 0.30240032 0.47418585 -0.36721733 -0.18391787 0.4216933 0.01718544 0.39470667 0.90973204 0.37792426 -0.46535897 -0.4390211 0.30799013 -0.14245921 -0.025735103 -0.097483434 0.19497302 0.62773395 -0.39707923 -0.34288993 -1.4866549 0.9546847 0.4633885 -1.0617515 -0.09642815 -0.047259353 0.5810729 0.14741234 -1.2716718 -0.33576217 0.106442526 -0.39653045 0.27586588 -0.5601489 0.0238905 0.799424 -0.36105844 -0.23008516 0.17858939 -0.32880503 -1.3416905 0.023993794 -0.6533784 0.448367 -0.31353286 -0.104037255 -0.21043772 -1.0387528 -0.41396764 0.1815714 -0.75293815 0.23897424 -0.0372747 0.36785004 0.16856545 -0.4453486 -0.506657 -0.75055194 0.5673235 0.11616806 -0.16810815 -0.9602316 -0.44271952 -0.008757354 -0.70901984 -0.7628863 -1.0193002 0.4912163 -0.29878622 0.92913646 -0.5191376 -0.10828586 0.21010937 0.6459578 -0.06947151 0.51483685 0.4591744 0.27479383 -0.014261955 0.70596856 0.21813771 0.43948525 -1.0382681 -0.7083508 -0.19855294 -0.63157976 -0.2288772 -0.41278908 -0.15812272 -0.63427585 0.2419031 -0.49120593 -0.5622424 0.2668778 0.3223468 -0.58184296 0.21222095 -0.4803191 0.17560856 0.533303 0.8797096 -0.43620965 0.2795726 0.4719348 0.4989669 -0.48367968 0.8558247 0.016128203 -0.39784795 0.395054 0.4069285 0.6018618 -0.28695956 0.2659238 -0.19691044 -0.09037606 -1.4016734 0.59852856 0.36671048 -0.38314664 -0.27809626

I can see some of the attributes listed on the website (time_frame, rle, width, height, class id), but the id is not clear to me. Also, the other float numbers seem a bit off. Do you have any clue what might be happening? I'm running the forwarding command described in README.md.

ahnonay commented 3 years ago

Hey, the output you are showing here is a detection result, not a tracking result. The format is time_frame bbox class_id img_height img_width rle association_embedding, so one line corresponds to one detected object. You still need to run tracking (as described in the README), and then you will get the output described on the page you linked (including the track IDs).
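To make the detection layout concrete, here is a rough sketch of splitting one such line into named fields. The `Detection` class and `parse_detection_line` function are hypothetical helpers, not part of TrackR-CNN. Note that the sample line posted above has one more value between the four box numbers and the class ID (0.9999932) than the field list names; this sketch treats it as a detection confidence score, which is an assumption. The four box values are kept in the order they appear, without interpreting them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    frame: int
    bbox: Tuple[float, ...]  # four box values, in file order
    score: float             # assumption: detection confidence
    class_id: int
    img_height: int
    img_width: int
    rle: str                 # compressed RLE mask string
    embedding: List[float]   # association embedding (remaining values)


def parse_detection_line(line: str) -> Detection:
    """Split one whitespace-separated detection line into named fields."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 fields, got %d" % len(fields))
    return Detection(
        frame=int(fields[0]),
        bbox=tuple(float(v) for v in fields[1:5]),
        score=float(fields[5]),
        class_id=int(fields[6]),
        img_height=int(fields[7]),
        img_width=int(fields[8]),
        rle=fields[9],
        embedding=[float(v) for v in fields[10:]],
    )
```

None of these fields is a track ID; those only appear in the tracking output.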

ahnonay commented 3 years ago

Closing for now; please reopen if needed.