egemenkopuz / temporal-bevfusion

Master's thesis research on 3D object detection using LiDAR and Camera data for infrastructure and railway domains, emphasizing inference optimization and utilization of temporal information for distant and occluded objects.

Single view input #13

Open lacie-life opened 1 month ago

lacie-life commented 1 month ago

Thank you very much for making this code available.

I am testing your code and would like to ask whether the model can run with a single image input.

Thank you very much!

egemenkopuz commented 1 month ago

Yes, monocular camera + LiDAR should work.

FYI, if you are trying the TUMTraf dataset, some of the preprocessing files need to be refactored because the official dataset's layout has changed. I just haven't found the time to look at it.

lacie-life commented 1 month ago

Thank you so much for your quick reply. I will try.

By the way, I have a question about how the data is split. I checked the GitHub page of the TUMTraf dataset to see whether it has an official train/val image set. The train/val counts are 1920/240, the same as in your thesis. So, are the results in your thesis based on this split or on another ratio?

Thank you!

egemenkopuz commented 1 month ago

The official splits and mine differ because I was experimenting solely with temporal splits. As a result, my generated splits consist of consecutive sequences 10 to 25 frames long. I see that I forgot to push my split details, sorry for that; I need to find them :). If the official splits include such consecutive frames, then I would say they are compatible with this codebase.
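
For readers who want to reproduce something similar, here is a minimal sketch of how consecutive sequences of 10 to 25 frames could be cut from an ordered frame list and assigned to train/val as whole sequences. This is not the split script used for the thesis (that one was never pushed); the function name `make_temporal_splits`, its parameters, the random choice of sequence length, and the `val_ratio` default are all assumptions made for illustration.

```python
import random


def make_temporal_splits(frame_ids, min_len=10, max_len=25, val_ratio=0.1, seed=0):
    """Hypothetical helper: cut an ordered frame list into consecutive
    sequences and assign whole sequences to train or val."""
    rng = random.Random(seed)

    # Walk through the ordered frames and cut consecutive chunks whose
    # length is drawn uniformly from [min_len, max_len] (an assumption).
    sequences = []
    start = 0
    while start < len(frame_ids):
        length = rng.randint(min_len, max_len)
        chunk = list(frame_ids[start:start + length])
        start += length
        # A trailing chunk shorter than min_len is dropped so that every
        # sequence stays within the requested length range.
        if len(chunk) >= min_len:
            sequences.append(chunk)

    # Assign whole sequences (never single frames) to val until roughly
    # val_ratio of the kept frames is covered; the rest goes to train.
    order = list(range(len(sequences)))
    rng.shuffle(order)
    total = sum(len(seq) for seq in sequences)
    val_target = val_ratio * total

    splits = {"train": [], "val": []}
    val_count = 0
    for idx in order:
        if val_count < val_target:
            splits["val"].append(sequences[idx])
            val_count += len(sequences[idx])
        else:
            splits["train"].append(sequences[idx])
    return splits
```

Calling `make_temporal_splits(sorted_frame_ids)` returns a dict with `"train"` and `"val"` lists of sequences. Keeping each sequence intact within one split is what preserves the temporal context; the frames must be sorted by timestamp beforehand.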

lacie-life commented 1 month ago

Sorry to bother you again. I am running your code with the TUMTraf-I dataset and have some questions: