ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

Nonzero->transpose->gather/gatherND interaction #1886

Open TedThemistokleous opened 1 year ago

TedThemistokleous commented 1 year ago

The recurring pattern of Nonzero->transpose->gather/gatherND seems to be popping up in a few different networks.

This combination of operators is prefixed by some sort of boolean or shape operation in order to gather indices that meet a condition (equal, greater, less).

An issue arises in the static shape case because the output of nonzero is currently padded with zeros to the full shape size: we assume the largest possible output shape based on the input so that compute_shape() can be called correctly. Gather then interprets the padded zeros as valid indices and, as a result, gathers on the corresponding data axes.
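As a rough illustration of the problem, here is a NumPy sketch (not MIGraphX code; the input values and shapes are made up) showing how the zero padding makes gather pull in element 0 even though it never satisfied the condition:

```python
import numpy as np

data = np.array([0.0, 3.0, 0.0, 5.0])      # example input
mask = data > 0.0                           # boolean condition feeding NonZero

# Dynamic (reference) behaviour: NonZero returns only the matching indices.
true_indices = np.nonzero(mask)[0]          # -> [1, 3]

# Static-shape behaviour: pad the index tensor out to the largest possible
# size (the full input length), filling the tail with zeros.
padded = np.zeros(data.shape[0], dtype=np.int64)
padded[: true_indices.size] = true_indices  # -> [1, 3, 0, 0]

# Gather treats the padded zeros as real indices, so element 0 is gathered
# even though it never met the condition.
print(np.take(data, true_indices))          # [3. 5.]        expected
print(np.take(data, padded))                # [3. 5. 0. 0.]  extra entries from padding
```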

The networks where this has currently been seen are retinanet and PointPillars.

In the retinanet case, the resulting gathers are used to feed a topk and are then concatenated, affecting accuracy:

Image

For PointPillars we see a similar structure here:

Image

TedThemistokleous commented 1 year ago

@umangyadav I do have a branch that can "track" how much data is in the nonzero output by modifying how we do the padding, but that would require performing the downstream operations dynamically at runtime.

@CharlieL7 is this something we should be looking at dynamic shapes solely to fix, then?
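For reference, a minimal NumPy sketch of what the "tracking" idea above might look like; this is a guess at the approach rather than the actual branch, and it shows why the downstream gather ends up depending on a runtime value:

```python
import numpy as np

def nonzero_padded_with_count(x):
    # Hypothetical helper: pad NonZero's output as before, but also return
    # how many of the entries are real (a data-dependent, runtime value).
    idx = np.nonzero(x)[0]
    padded = np.zeros(x.size, dtype=np.int64)
    padded[: idx.size] = idx
    return padded, idx.size

data = np.array([0.0, 3.0, 0.0, 5.0])
padded, count = nonzero_padded_with_count(data)

# Gather only over the tracked prefix -- the slice length is only known at
# runtime, which is what pushes this toward dynamic shape support.
print(np.take(data, padded[:count]))   # [3. 5.]
```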

CharlieL7 commented 1 year ago

This usage is a subset of dynamic shapes as a whole. NonZero and TopK are operators with data-dependent shape functions (the output shape changes depending on the data supplied to the operator). If zero-padding the output of these operators with data-dependent shape functions is not valid for the rest of the model, the only reasonable way to make all the models work is by extending dynamic shape support. We're currently only working on dynamic batch support. Supporting these models should probably be the next step.
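For anyone unfamiliar with the term, a quick NumPy illustration of a data-dependent shape (not tied to any particular model): two inputs with the same static shape produce NonZero outputs of different sizes, so a static compute_shape() cannot know which size to report.

```python
import numpy as np

a = np.array([1, 0, 0, 0])
b = np.array([1, 2, 3, 0])

print(np.nonzero(a)[0].shape)   # (1,) -- one nonzero element
print(np.nonzero(b)[0].shape)   # (3,) -- three nonzero elements
```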

TedThemistokleous commented 1 year ago

The branch with the nonzero changes from earlier has been pushed up, in case we have any use for them:

nonzero_track_data_position

umangyadav commented 1 year ago

The GPT-J model also showed up with an empty literal. It errored on CONCAT.

https://github.com/huggingface/transformers/blob/v4.30.0/src/transformers/models/gptj/modeling_gptj.py#L831-L845