Closed nikky4D closed 9 months ago
Dear Nikky,
Yes, with the above setup the memory locations would overlap and cause corruption: both the location and classification outputs must stay intact for later NMS processing.
The 64x80 output alone fills an entire data memory instance (data memory instance size of 81920 = 0x14000): 64x80x4x4 => 0x00000 - 0x14000, meaning you will overwrite it if you use the same quadrant for any other layer's processing.
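As a quick sanity check of that arithmetic, here is a small sketch; the 64x80x4x4 factors are taken from the figure above (reading them as 4 values per pixel times 4 bytes each is an assumption):

```python
# Sketch: verify that the 64x80 output alone fills one data memory instance.
# The 4 * 4 factor mirrors the 64x80x4x4 figure quoted above.
H, W = 64, 80
bytes_needed = H * W * 4 * 4          # 81920 bytes
instance_size = 0x14000               # one data memory instance = 81920 bytes

print(hex(bytes_needed))              # 0x14000
print(bytes_needed == instance_size)  # True
```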
Since you have 2 classes, you may consider moving the 64x80-resolution layer output to another data memory instance that can be used in parallel (for the 64x80 resolution: output_processors: 0x000000000fff0000; for the others: output_processors: 0x0000000000000fff).
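For illustration, a hedged sketch of how that split could appear in the YAML (the layer names here are hypothetical; only the two processor masks come from the suggestion above):

```yaml
# Hypothetical sketch: route the 64x80 output to a separate data memory
# instance via a disjoint output_processors mask, so it can be read in
# parallel with the other resolutions.
- name: FPN_out_64_80                     # hypothetical layer name
  output_processors: 0x000000000fff0000   # 64x80 output -> separate instance
# ...
- name: FPN_out_32_40                     # hypothetical layer name
  output_processors: 0x0000000000000fff   # other resolutions -> first instance
```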
For a proper modification, you need to:
Modify the NMS code accordingly
Details: Note that I have not gone over the whole YAML file and have not cross-checked all potentially problematic layers for overlap (fitting the whole layers may still not be guaranteed). As an alternative starting point, roughly speaking, the updated memory map could then look like:
# Class predictions:
However, the above changes alone are not sufficient: for this kind of change, the FPN detector layers' feature outputs (and their prior dependents: the enc and skip layers) also have to be analyzed, as they must not be overwritten before the classification and regression layers consume them. (Please also go through layers such as 16, 20, 24, 28, 31, 32, 34, 37, 40, and 43, especially for managing the high-resolution intermediate outputs.)
After the YAML and memory mapping are verified, the NMS code must also be modified to read the class and location outputs from the correct memory locations. The location-output reading function only needs a memory-address update, but you should go over the class-prediction reading code, as it assumes all quadrants are active and that the classification outputs reside in the same quadrant.
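A minimal sketch of the kind of address bookkeeping the NMS reader would then need. All names, base addresses, and offsets here are hypothetical placeholders, not taken from the actual NMS code or memory map:

```python
# Sketch only: per-resolution read addresses once one output moves to a
# different data memory instance. QUADRANT_BASE and the offsets are
# illustrative placeholders.
QUADRANT_BASE = {0: 0x00000, 1: 0x14000, 2: 0x28000, 3: 0x3C000}

# (quadrant, offset) per resolution; the 64x80 output now lives in a
# different instance, so its base differs from the rest.
loc_outputs = {
    "64x80": (1, 0x0000),   # moved to the parallel instance
    "32x40": (0, 0x2800),
    "16x20": (0, 0x3200),
}

def read_address(res: str) -> int:
    """Resolve the absolute read address for a resolution's output."""
    quadrant, offset = loc_outputs[res]
    return QUADRANT_BASE[quadrant] + offset

print(hex(read_address("64x80")))  # 0x14000
print(hex(read_address("32x40")))  # 0x2800
```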
Thank you very much for this response. It is very helpful.
Could you expand more on this parallel usage here:
As you have 2 classes, you may consider moving the 64x80-resolution layer output to another data memory instance that can be used in parallel (for the 64x80 resolution: output_processors: 0x000000000fff0000; for the others: output_processors: 0x0000000000000fff)
For example: if I move the FPN output for the 64x80 resolution to another set of output processors, how would I specify its location to the classification/regression layer that needs it? In other words, what processor masks should be set, e.g.:
- in_sequences: FPN_out_64_80
  in_offset: 0xF0A0
  out_offset: 0x10AE0
  processors: 0xffffffffffffffff  # <-- Does this change?
  output_processors: 0xffffffffffffffff
  operation: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  write_gap: 1
  name: loc_64_80_res0_preprocess
  weight_source: loc_64_80_res0_preprocess
Hi, I am working on the FPN detector example. If I modify my FPN to use the 64x80 output and drop the 4x5 output, what is the best way to set up the classification and regression memory locations for this higher-resolution filter set?
For example, the comments show a mapping of the memory:
I am reworking this for my own scenario: dropping the 4x5 output but using the 64x80, with only 2 classes and the same filter shapes, like this:
I set the out_offset of the largest classification output to 0x0000, then the out_offset of the 32x40 output to 0x2800, and so on.
While this setup synthesizes, would it cause memory overwrites/corruption of the 4 outputs since the memory locations overlap?
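The overlap concern can be checked mechanically. A sketch, assuming 2 classes at one byte per value for the classification outputs and the 64x80x4x4 feature size from the discussion above (the per-value byte counts are assumptions, not read from the YAML):

```python
# Sketch: detect overlapping address ranges within one data memory quadrant.
# Sizes are illustrative: the 64x80 feature output fills the whole 0x14000
# instance, so anything else placed in the same quadrant collides with it.
regions = [
    ("fpn_feat_64x80", 0x0000, 64 * 80 * 4 * 4),  # 0x0000 - 0x14000
    ("cls_64x80",      0x0000, 64 * 80 * 2),      # 0x0000 - 0x2800 (2 classes)
    ("cls_32x40",      0x2800, 32 * 40 * 2),      # 0x2800 - 0x3200
]

def overlaps(a, b):
    """True if the two (name, offset, size) regions share any addresses."""
    _, off_a, size_a = a
    _, off_b, size_b = b
    return off_a < off_b + size_b and off_b < off_a + size_a

for i, a in enumerate(regions):
    for b in regions[i + 1:]:
        if overlaps(a, b):
            print(f"{a[0]} overlaps {b[0]}")
```

With these numbers, the two classification outputs do not collide with each other, but both collide with the 64x80 feature output, which matches the concern about sharing one quadrant.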