fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
https://gitee.com/fastnlp/fastNLP
Apache License 2.0

Append instances to DataBundle. #336

Closed: ChawlaAvi closed this issue 3 years ago

ChawlaAvi commented 3 years ago

Hello

I am working on the CoNLL dataset. To load it, I use the Conll2003Loader class and then apply some further processing to get the data into the following shape:

+------------------------+------------------------+------------------------+---------+
| raw_words              | target                 | words                  | seq_len |
+------------------------+------------------------+------------------------+---------+
| ['What', 'kind', 'o... | [0, 0, 0, 0, 0]        | [207, 241, 5, 2775,... | 5       |
| ['We', 'respectfull... | [0, 0, 0, 0, 0, 0, ... | [114, 9870, 4654, 1... | 13      |
| ['WW', 'II', 'Landm... | [35, 23, 23, 23, 23... | [12946, 2359, 27072... | 15      |
| ['Standing', 'tall'... | [0, 0, 0, 32, 33, 0... | [4132, 4807, 17, 11... | 14      |
| ['It', 'is', 'compo... | [0, 0, 0, 0, 0, 0, ... | [68, 12, 6155, 5, 8... | 28      |
| ['A', 'primary', 's... | [0, 0, 0, 0, 12, 0,... | [134, 1541, 16869, ... | 13      |
| ['The', 'Hundred', ... | [45, 29, 29, 46, 0,... | [21, 4438, 4808, 46... | 25      |

The "target" column holds the NER labels, and the "words" column holds the word indices (the word2idx mapping) obtained using the _indexize function. The data object here is an instance of the DataBundle class.

Now assume that I have a second DataBundle with exactly the same columns as above but different rows. I wish to combine the two into a single DataBundle. How can I do that?

ChawlaAvi commented 3 years ago

Closing the issue, as I have found the solution: I just came across the append function in the code.
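For anyone landing here later, the append-based approach can be sketched roughly as follows. This is a minimal illustration and not fastNLP's own code: MiniDataSet and merge_bundles are hypothetical stand-ins, a bundle is modelled as a plain dict from split name to dataset, and the rows are toy values. With the real library, the analogous loop would presumably call DataSet.append on each Instance of the matching dataset in the second bundle. It also assumes both bundles were indexed with the same vocabularies, since otherwise the ids in "words" and "target" would not be comparable after merging.

```python
from typing import Any, Dict, Iterator, List


class MiniDataSet:
    """Hypothetical stand-in for a column-oriented dataset (NOT fastNLP's class)."""

    def __init__(self, fields: Dict[str, List[Any]]) -> None:
        lengths = {len(col) for col in fields.values()}
        if len(lengths) > 1:
            raise ValueError("all fields must have the same number of rows")
        # Copy the columns so the caller's lists are not mutated later.
        self.fields = {name: list(col) for name, col in fields.items()}

    def __len__(self) -> int:
        return len(next(iter(self.fields.values()), []))

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # Yield one row ("instance") at a time as a field -> value dict.
        names = list(self.fields)
        for i in range(len(self)):
            yield {name: self.fields[name][i] for name in names}

    def append(self, instance: Dict[str, Any]) -> None:
        # Refuse rows whose columns differ from the dataset's columns.
        if set(instance) != set(self.fields):
            raise KeyError("instance fields do not match dataset fields")
        for name, value in instance.items():
            self.fields[name].append(value)


def merge_bundles(
    target: Dict[str, MiniDataSet], source: Dict[str, MiniDataSet]
) -> Dict[str, MiniDataSet]:
    """Append every instance of each split in `source` to `target`, in place."""
    for split, src in source.items():
        if split not in target:
            # Split missing in the target: start it with the same columns.
            target[split] = MiniDataSet({name: [] for name in src.fields})
        for instance in src:
            target[split].append(instance)
    return target


# Toy rows with the same four columns as in the question.
bundle_a = {
    "train": MiniDataSet(
        {
            "raw_words": [["What", "kind"]],
            "target": [[0, 0]],
            "words": [[207, 241]],
            "seq_len": [2],
        }
    )
}
bundle_b = {
    "train": MiniDataSet(
        {
            "raw_words": [["We", "respectfully", "ask"]],
            "target": [[0, 0, 0]],
            "words": [[114, 9870, 4654]],
            "seq_len": [3],
        }
    )
}

merged = merge_bundles(bundle_a, bundle_b)
print(len(merged["train"]))
```

Appending row by row keeps the column check in one place (append), at the cost of being slower than a bulk column-wise concatenation on large datasets.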