PRBonn / semantic-kitti-api

SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
http://semantic-kitti.org
MIT License

generate_sequential output not correct #37

Closed · munib94 closed 4 years ago

munib94 commented 4 years ago

Hi,

I used generate_sequential.py to aggregate each scan with the previous 4 scans, and the output has the same number of files as the original dataset. I assumed there would be fewer files since the scans are aggregated, or is my assumption wrong?

jbehley commented 4 years ago

We aggregate each scan with its previous scans, so for N scans we get N aggregated scans. For the first M scans, where M is the number of scans aggregated, we combine only 1, 2, ..., M-1, M scans, since there are not yet M scans of history available.
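A minimal sketch of that sliding window (not the actual generate_sequential.py code; the real script additionally transforms the previous scans into the current scan's coordinate frame using the poses, so treat this as an illustration of the counting only):

```python
import numpy as np

def aggregate(scans, M):
    """Illustrative only: for each scan i, concatenate it with up to M-1 previous
    scans (current scan first, most recent history next), so len(scans) inputs
    always yield len(scans) aggregated clouds."""
    aggregated = []
    for i in range(len(scans)):
        window = [scans[i]] + [scans[j] for j in range(i - 1, max(-1, i - M), -1)]
        aggregated.append(np.concatenate(window, axis=0))
    return aggregated
```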

Hope the explanation helps.

munib94 commented 4 years ago

So when I train the network using the aggregated scans, how does the network recognize that the scans are aggregated when, as you say, we get N aggregated scans for N scans? I understand how it would work if the output for N scans were N/M scans, where M is the number of scans aggregated, but I don't understand how it works in the former case.

jbehley commented 4 years ago

Sorry for the delay, but there was some other more urgent stuff. I hope I can catch up.

Let's say you take scan 1245 and the 4 scans before it (1245, 1244, 1243, 1242, 1241). You predict a label for each point, and we want the labels for 1245 as the result. The network should not care about the number of points, since we always want labels for all points. Thus, you infer on the aggregated point cloud of scans (1245, 1244, 1243, 1242, 1241), throw away the predictions for 1244, 1243, 1242, 1241, and report only the predictions for 1245.
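In code, that trimming is just a slice, since the current scan's points come first in the aggregated cloud. A minimal sketch (the function name is hypothetical; it assumes predictions is the per-point label array the network produced for the aggregated cloud):

```python
import numpy as np

def labels_for_current_scan(predictions, current_scan_path):
    """Keep only the labels belonging to the current scan, whose points sit at
    the front of the aggregated cloud; the history predictions are discarded."""
    # a SemanticKITTI .bin scan is a flat float32 array of (x, y, z, remission)
    num_current = np.fromfile(current_scan_path, dtype=np.float32).reshape(-1, 4).shape[0]
    return predictions[:num_current]
```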

hope that helps.

munib94 commented 4 years ago

That makes sense, but how does the network know to throw away the predictions for 1244, 1243, 1242, and 1241? I assume that as long as I use the aggregated scans, the network does the rest? There does not seem to be an argument that the user can invoke when running training.

jbehley commented 4 years ago

It does not know this. Let's say the scans have 3, 4, 2, 5, 2 points. The aggregated point cloud is just the concatenation of these points, i.e., it has 16 points. For these points, the network produces labels in the same order. Thus, you keep only the first 3 labels and have the desired result (predictions for the first scan).
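As a quick numeric check of the above (a toy sketch with arbitrary stand-in labels):

```python
import numpy as np

counts = [3, 4, 2, 5, 2]                    # points per scan, as in the example
labels = np.arange(sum(counts))             # stand-in for the 16 predicted labels
first_scan_labels = labels[:counts[0]]      # the first 3 labels belong to the first scan
print(len(labels), len(first_scan_labels))  # 16 3
```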

munib94 commented 4 years ago

If I throw away the predictions for the previous 4 scans, do I also ignore the scan files and label files for the previous 4 scans when I do evaluation? Otherwise I will get an assertion error when I try to evaluate the IoU scores, because len(label_names) will not equal len(scan_names), and similarly len(label_names) will not equal len(pred_names).

jbehley commented 4 years ago

Your predictions for the multiple-scan case must have exactly as many labels as the corresponding single scan has points.

Okay, hope this very explicit example helps:

scan 1: a b c
scan 2: d e f g
scan 3: h i

Aggregated scans with N = 2 (i.e., one scan of history):

1: a b c
2: d e f g a b c
3: h i d e f g

Now you predict a label x' for each point x, i.e.,

1: a' b' c'
2: d' e' f' g' a' b' c'
3: h' i' d' e' f' g'

To get the final result we want, you throw away the labels of the concatenated history points, i.e., you want to get:

1: a' b' c'
2: d' e' f' g'
3: h' i'

as predictions.

No scans, points, or labels of the original scan are removed; only the added history parts are discarded.

Does this help?

munib94 commented 4 years ago

This helps. Thanks for the prompt reply. So this means I would need to open each predicted label file and manually remove the added parts?

jbehley commented 4 years ago

Yes, or just don't save the added parts in the first place. Whatever works for you.
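For example, when writing the .label prediction files for the evaluation script, one could trim before saving. A minimal sketch (function name hypothetical), assuming pred_labels already holds the final per-point label ids for the aggregated cloud:

```python
import numpy as np

def save_trimmed_prediction(pred_labels, num_current_points, out_path):
    """Write only the current scan's labels (the first num_current_points entries),
    so each .label file matches the point count of its single scan and the
    length checks during evaluation pass."""
    np.asarray(pred_labels[:num_current_points], dtype=np.uint32).tofile(out_path)
```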

munib94 commented 4 years ago

Thank you for your help!