Punchwes opened 3 years ago
Hi @Punchwes Yes, you would need to modify the whole pipeline so that your input feature can be used.
If you have a small (discrete) number of features, you can add them as text to your input.
E.g. you have the features:
- [guest] vs [user]
- [male] vs [female] vs [unknown]
- [america] vs [europe] vs [asia]

Then your input can look like:
`[guest] [male] [america] I love this song!`
`[user] [unknown] [asia] Me too, this song is great`
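A minimal sketch of this idea, assuming three discrete features as above (the helper name `add_feature_tokens` and the `FEATURE_VALUES` table are hypothetical, not part of this library):

```python
# Hypothetical helper: prepend discrete feature tokens to the raw text,
# so the model can attend to them like any other tokens.
FEATURE_VALUES = {
    "role": ["[guest]", "[user]"],
    "gender": ["[male]", "[female]", "[unknown]"],
    "region": ["[america]", "[europe]", "[asia]"],
}

def add_feature_tokens(text, role, gender, region):
    """Build the augmented input string for one utterance."""
    for name, value in (("role", role), ("gender", gender), ("region", region)):
        if value not in FEATURE_VALUES[name]:
            raise ValueError(f"unknown {name} value: {value}")
    return f"{role} {gender} {region} {text}"

print(add_feature_tokens("I love this song!", "[guest]", "[male]", "[america]"))
# → [guest] [male] [america] I love this song!
```

If you go this route with a Hugging Face tokenizer, you would likely also want to register the bracketed strings as special tokens (e.g. via `tokenizer.add_tokens([...])` and `model.resize_token_embeddings(len(tokenizer))`) so each feature maps to a single token rather than being split into word pieces.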
Hi, thanks for sharing this library. My current scenario is that besides the traditional BERT inputs (input_ids, attention_mask, token_type_ids), I also have another feature dict that looks like: {'new_feature': [1,1,1,1,2,3,4,5]}. So the input in my case will look like: {'input_ids': [x,x,x,x,x], 'attention_mask': [x,x,x,x,x,x], 'token_type_id': [x,x,x,x,x,x], 'new_feature': [x,x,x,x,x,x]}
The most straightforward way I can think of is to modify the dataloader, but it seems that the whole model pipeline only accepts texts, so I would need to modify the pipeline's input as well, which seems quite complex. The other potential way is to further process the input text to extract these features in the tokenize() function and update its output accordingly, but that might make the whole process very slow. Is there a simple/straightforward way to achieve this, from your perspective?

Best
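For what it's worth, one common way to feed a per-token integer feature like this into a transformer, independent of this library's pipeline, is to embed it and add it to the token embeddings before the encoder. A minimal PyTorch sketch (the class name, dimensions, and vocabulary size are all assumptions for illustration):

```python
import torch
import torch.nn as nn

class FeatureAugmentedEmbeddings(nn.Module):
    """Hypothetical sketch: embed the extra per-token feature and add it
    to the token embeddings before they enter the transformer encoder."""

    def __init__(self, encoder_dim=768, num_feature_values=16):
        super().__init__()
        # One learned vector per possible value of 'new_feature'.
        self.feature_embedding = nn.Embedding(num_feature_values, encoder_dim)

    def forward(self, token_embeddings, new_feature):
        # token_embeddings: (batch, seq_len, encoder_dim), e.g. from BERT's embedding layer
        # new_feature:      (batch, seq_len) integer ids, e.g. [1,1,1,1,2,3,4,5]
        return token_embeddings + self.feature_embedding(new_feature)

module = FeatureAugmentedEmbeddings()
tokens = torch.zeros(1, 8, 768)                # dummy token embeddings
feats = torch.tensor([[1, 1, 1, 1, 2, 3, 4, 5]])
out = module(tokens, feats)
print(out.shape)  # torch.Size([1, 8, 768])
```

This still requires routing `new_feature` through the dataloader and model forward pass, which is exactly the pipeline modification discussed above; the sketch only shows how the feature could be consumed once it gets there.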