Armandpl closed this issue 10 months ago
Is using ControlNet actually useful? Couldn't we just train on canny edges from Blender-generated images? If we do that, we won't get perturbations from the environment, such as other cars, bridges, etc., unless we add them in Blender, which requires hard-coding "scenarios" — whereas with ControlNet we can add those easily and "organically". If we only train on edges from Blender we probably won't generalize very well. Though maybe we could view adding bridges, cars, etc. as adding noise, and maybe there is a simpler, faster way to add noise to the images. It would be a different type of noise, I guess.
Doesn't look like I can add cars and other stuff; it closely follows the canny edges (which come from the segmented lane lines). Takes 11 seconds to generate 1 image, that's ~30 hours for 10k images at 1152x832. The vanishing point seems to remain consistent across images.
4 steps using turbo = 2 s per image; quality seems okay, but a bit worse?
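For reference, the throughput numbers above work out as follows (a quick sanity check, using only the figures quoted):

```python
# Back-of-envelope generation time for a 10k-image dataset.
n_images = 10_000
seconds_per_image = {"base": 11.0, "turbo": 2.0}

hours = {k: n_images * s / 3600 for k, s in seconds_per_image.items()}
# base ≈ 30.6 h, turbo ≈ 5.6 h
```

So turbo cuts a ~30-hour run down to under 6 hours, at the cost of some quality.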
Make the blender scene:
Which focal length to use in Blender:
910 × 6.521 / 1471.64262 ≈ 4 mm focal length
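That arithmetic looks like the standard pixels-to-millimetres conversion for a pinhole camera, f_mm = f_px · sensor_width_mm / image_width_px. The roles of the three constants below are my guess from context (focal length in pixels, sensor width in mm, image width in pixels) — treat them as assumptions:

```python
def focal_px_to_mm(f_px, sensor_width_mm, image_width_px):
    # Convert a focal length expressed in pixels to millimetres,
    # e.g. to set Blender's camera focal length.
    return f_px * sensor_width_mm / image_width_px

f_mm = focal_px_to_mm(910.0, 6.521, 1471.64262)  # ≈ 4.03 mm
```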
Actually, do we really need Blender? We could just use a perspective projection.
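A minimal sketch of that idea — rasterizing straight lane lines through a pinhole projection instead of rendering in Blender. The geometry (lane half-width, camera height, intrinsics) is made up for illustration:

```python
import numpy as np

def render_lane_edges(h=832, w=1152, f=910.0, n_samples=2000):
    # Rasterize two straight lane lines via a pinhole projection.
    # Camera at the origin looking down +z, 1.5 m above the road (assumed).
    img = np.zeros((h, w), dtype=np.uint8)
    cx, cy = w / 2.0, h / 2.0
    for x_off in (-1.8, 1.8):                 # hypothetical lane half-width in metres
        z = np.linspace(2.0, 200.0, n_samples)  # depths along the lane
        u = f * x_off / z + cx                  # projected column
        v = f * 1.5 / z + cy                    # projected row (road is below the camera)
        keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        img[v[keep].astype(int), u[keep].astype(int)] = 255
    return img

edges = render_lane_edges()
```

The same loop generalizes to dotted/double lines by masking ranges of `z`, and the output can feed ControlNet directly in place of Blender-derived canny maps.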
Looks like it could fly. My main concern now is that there are no cars in those images, and I'm not sure how to add them. The main point of using Stable Diffusion was to try to distill what it knows about the world. Though maybe we could make it work with random erasing, to make the model robust to big perturbations? Random black rectangles are a bit different from cars, though. Maybe automatic inpainting? But that would make generating the images even longer.
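A minimal random-erasing sketch in NumPy (box counts and sizes are made up; `torchvision.transforms.RandomErasing` does essentially this for tensors):

```python
import numpy as np

def random_erase(img, rng, n_boxes=3, max_frac=0.3):
    # Occlude random black rectangles -- a crude stand-in for cars/bridges
    # hiding parts of the lane lines.
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(n_boxes):
        bh = rng.integers(1, int(h * max_frac) + 1)
        bw = rng.integers(1, int(w * max_frac) + 1)
        y = rng.integers(0, h - bh + 1)
        x = rng.integers(0, w - bw + 1)
        out[y:y + bh, x:x + bw] = 0
    return out

rng = np.random.default_rng(0)
img = np.full((64, 64), 255, dtype=np.uint8)
aug = random_erase(img, rng)
```

Filling with noise or mean pixel values instead of black would get slightly closer to "car-like" occluders without the cost of inpainting.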
We also need to add dotted lines, and maybe double lines. Maybe add curved lines later? Though I'm unsure what the vanishing-point definition is in that case; let's not worry about it.
Save the generation script config somewhere for reproducibility.
Could we go from a VP dataset to a direction/delta-pose dataset this way? If we move the camera forward, does the vanishing point change? (It shouldn't: the vanishing point of parallel lines depends only on the line direction and the camera rotation/intrinsics, not on the camera's position.) What I'm trying to figure out is: is there enough info in the lines to predict the direction of travel? Look at it from above and think about it.
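This is easy to check numerically: project two parallel 3D lane lines with a pinhole camera, intersect their 2D images to get the vanishing point, then translate the camera and do it again. The intrinsics and lane geometry below are made up for the demo:

```python
import numpy as np

def project(p, f=910.0, c=(576.0, 416.0)):
    # Pinhole projection, camera at the origin looking down +z (assumed intrinsics).
    return np.array([f * p[0] / p[2] + c[0], f * p[1] / p[2] + c[1]])

def vanishing_point(lines, cam_t):
    # Project two points on each 3D line, then intersect the resulting 2D lines.
    pts2d = []
    for origin, direction in lines:
        a = project(origin + 5.0 * direction - cam_t)
        b = project(origin + 50.0 * direction - cam_t)
        pts2d.append((a, b))
    (a1, b1), (a2, b2) = pts2d
    # Solve a1 + s*(b1 - a1) = a2 + u*(b2 - a2) for the intersection.
    A = np.column_stack([b1 - a1, -(b2 - a2)])
    s, _ = np.linalg.solve(A, a2 - a1)
    return a1 + s * (b1 - a1)

# Two lane lines parallel to the direction of travel (+z), 1.5 m below the camera.
lines = [(np.array([-1.8, 1.5, 0.0]), np.array([0.0, 0.0, 1.0])),
         (np.array([ 1.8, 1.5, 0.0]), np.array([0.0, 0.0, 1.0]))]

vp0 = vanishing_point(lines, cam_t=np.zeros(3))
vp1 = vanishing_point(lines, cam_t=np.array([0.0, 0.0, 3.0]))  # move 3 m forward
```

`vp0` and `vp1` come out identical: translation leaves the VP fixed, so the VP alone encodes heading (rotation relative to the lanes), not forward motion — a delta-pose dataset would need more than the VP label.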
Lightly annoyed that the VPGNet dataset is (or seems) imbalanced, and that I don't have easy access to the DeepVP-1M dataset. Can I make my own dataset?