Closed johann-petrak closed 5 years ago
See https://github.com/GateNLP/gate-lf-python-data/issues/23. This has now been implemented so that initial settings can be added in the feature specification file. However, the LF completely ignores this for sparse representations and does not shorten anything itself for dense representations.
Limiting the sequence length can be crucial if we use the deep learning backend. Ideally it should be possible to set the limit in the feature specification (which can reduce the initial dataset size) and then to limit it further in the PyTorch backend through a parameter (for further experimenting).
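To make the two-stage idea concrete, here is a minimal sketch in plain Python. It does not use the LF or the PyTorch backend API: `truncate_sequences`, `max_len` and `keep` are hypothetical names chosen for illustration, and the two calls only stand in for "limit in the feature specification" and "limit again through a backend parameter".

```python
def truncate_sequences(sequences, max_len=None, keep="start"):
    """Return copies of the sequences cut to at most max_len items.

    Hypothetical helper, not part of the LF: max_len=None means no limit,
    keep="start" keeps the first items, keep="end" keeps the last items.
    """
    if max_len is None:
        return [list(seq) for seq in sequences]
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if keep == "start":
        return [list(seq[:max_len]) for seq in sequences]
    if keep == "end":
        return [list(seq[-max_len:]) for seq in sequences]
    raise ValueError("keep must be 'start' or 'end'")


data = [[1, 2, 3, 4, 5, 6], [7, 8], [9, 10, 11, 12]]

# Stage 1: a coarse limit applied when the dataset is created
# (stands in for a setting in the feature specification).
stage1 = truncate_sequences(data, max_len=5)

# Stage 2: a tighter limit applied at training time
# (stands in for a backend parameter used while experimenting).
stage2 = truncate_sequences(stage1, max_len=3)
```

Because the second limit is applied to the already shortened data, it can only tighten the first one, which matches the intent of experimenting with smaller values without regenerating the dataset.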
For single-feature datasets, sorting the training set by sequence length would be a good alternative to avoid excessive padding.
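The effect of sorting can be sketched as follows, again in plain Python and independent of the LF code. The helpers `padding_cost` and `padded_batches`, the `batch_size` of 2 and the pad value 0 are all assumptions made for this example.

```python
def padding_cost(sequences, batch_size):
    """Count the pad items needed if each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        width = max(len(seq) for seq in chunk)
        total += sum(width - len(seq) for seq in chunk)
    return total


def padded_batches(sequences, batch_size, pad_value=0):
    """Sort by length, split into batches, and pad each batch to its own maximum."""
    ordered = sorted(sequences, key=len)
    batches = []
    for start in range(0, len(ordered), batch_size):
        chunk = ordered[start:start + batch_size]
        width = max(len(seq) for seq in chunk)
        batches.append([list(seq) + [pad_value] * (width - len(seq)) for seq in chunk])
    return batches


# Four toy sequences with lengths 9, 2, 8 and 3.
data = [[1] * 9, [1] * 2, [1] * 8, [1] * 3]

unsorted_cost = padding_cost(data, batch_size=2)
sorted_cost = padding_cost(sorted(data, key=len), batch_size=2)
batches = padded_batches(data, batch_size=2)
```

On this toy data the unsorted order needs 12 pad items and the sorted order needs 2. One trade-off to keep in mind is that strictly sorted batches are no longer randomly composed; shuffling the order of the batches (not their contents) is a common way to keep some randomness.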