Closed: suvam-bag closed this issue 4 years ago
Hello @suvam-bag! At the moment we don't have such a function. However, I think it could be quite useful! On the client side, it could be implemented from the data we already have: the client receives the camera pose, FoV, and the positions of all the objects present in the scene. With this, one could write a function that projects those positions into the camera view and determines which objects are visible.
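As a rough sketch of that projection (this is not CARLA API code; the pinhole intrinsics are derived from the image size and FoV, and `world_to_camera` stands for a hypothetical 4x4 inverse of the camera pose, with x right, y down, z forward assumed), something like the following could decide whether a world point falls inside the view:

```python
import numpy as np

def build_intrinsics(width, height, fov_deg):
    # Pinhole camera: focal length derived from the horizontal field of view.
    f = width / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])

def project_point(world_point, world_to_camera, K):
    """Project a 3D world point to pixel coordinates (u, v).

    world_to_camera -- 4x4 inverse of the camera pose (assumed camera
    convention: x right, y down, z forward).
    Returns None when the point is behind the camera.
    """
    p = world_to_camera @ np.append(world_point, 1.0)
    if p[2] <= 0.0:
        return None  # behind the camera, cannot be visible
    uv = K @ p[:3]
    return uv[0] / uv[2], uv[1] / uv[2]

def in_view(world_point, world_to_camera, K, width, height):
    # An object position is "in the FoV" if it projects inside the image.
    uv = project_point(world_point, world_to_camera, K)
    return uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height
```

With a 90-degree FoV and an 800x600 image, a point straight ahead of the camera projects to the image center (400, 300). Note this only tests the frustum; occlusion would additionally need the depth map or a server-side visibility query.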
This could also be done on the server side. The rendering engine already needs to determine which objects are visible, so it should be low cost. One could investigate how to do that in Unreal.
Currently we are working on other things. Help from the community would be useful!
Thanks @felipecode for the detailed answer! I will try to create a mapping function between the image pixels and the position of the ego-vehicle on the client side.
@suvam-bag Were you able to create the mapping? If you did, could you help me with it?
@felipecode Am I correct in understanding that the code that generates the semantic segmentation images was written by the CARLA team? If so, is there a simple tweak to that algorithm that would produce instance segmentation?
I basically want to test a Faster R-CNN for object detection in CARLA. To get the ground-truth bounding boxes, I see two approaches:

1. Use the given camera data and the 3D position, orientation, and size of each object to project its bounding box onto my camera view. This requires some knowledge of perspective mapping and other CV topics, and it still won't work well, since the Z-axis extent of the bounding box is not correct in the given data.
2. Use the semantic segmentation to create a bounding box for each class in the scene. This is really easy to implement (topX = minX, topY = minY, W = maxX - minX, H = maxY - minY), but it only works when there is a single instance of the class in the scene. It would work directly if we were given instance-segmented images instead.
In the second approach, even with the SemSeg images, it would still be possible to create ground-truth bounding boxes for well-separated objects by checking whether two blobs are connected (again, this requires some CV knowledge). But it would completely fail if the objects overlap in the SemSeg image.
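To illustrate the connected-blob idea above (a plain NumPy/BFS sketch, not CARLA code; `class_id` is whatever label value your SemSeg image uses for the class in question), one could flood-fill each 4-connected blob and take its pixel extents:

```python
import numpy as np
from collections import deque

def blob_bounding_boxes(mask, class_id):
    """Return one (topX, topY, W, H) box per 4-connected blob of
    `class_id` in a 2D semantic segmentation mask."""
    target = (mask == class_id)
    visited = np.zeros_like(target, dtype=bool)
    boxes = []
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if not target[y, x] or visited[y, x]:
                continue
            # BFS flood fill to collect the extent of one blob.
            min_x = max_x = x
            min_y = max_y = y
            queue = deque([(y, x)])
            visited[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and target[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

As noted in the thread, this still merges two instances whose pixels touch; true instance segmentation would be needed to separate them.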
Anyway, if you could suggest a way to create the bounding boxes of the objects in the camera FOV, it would be really helpful.
@nsubiron In #242 I think I didn't frame my question well. What I wanted was not just filtering out the objects in the scene, but getting their information with respect to the scene, e.g. the topX, topY, W, H of the bounding box of each object, in pixels.
@nsubiron @felipecode , are there any plans to enable instance segmentation?
I was able to get the bounding boxes as long as the pixels of the instances of a class do not intersect. But I still can't get the bounding boxes correctly when the pixels do intersect.
Hello @YashBansod, did you manage to get the bounding box information only for the objects present in the scene? I am trying to do the same task as you, using an object detector to detect the objects, so I need the ground-truth info of the bounding boxes.
I really tried to dig through this a lot, but I can't manage to do it. If you have any hints on how to do it, or a piece of code/function that could be added to the client example to log this info, it would really be appreciated.
@nsubiron @felipecode any help here would be nice
Hi @mhusseinsh,
I am trying to solve the same problem. Did you find a solution to it? Please let me know.
> The client receives the camera pose, FoV and the positions of all the objects present in the scene. With this, one could compute a function that projects the camera view on the 3D space and get all the visible objects.

Hi @felipecode, I'm deeply interested in what you describe above. Could you please elaborate on how this is done? Thank you!
Problem solved in current versions of CARLA.
@germanros1987 where do you see that this was solved?
Can you please elaborate on how this was solved?
Is there a way to get the ground truth of the objects only within the FoV of the camera?