cisco / mindmeld

An Open Source Conversational AI Platform for Deep-Domain Voice Interfaces and Chatbots.
http://mindmeld.com
Apache License 2.0

Support FileBackedList for PyTorch-CRF #417

Open vrdn-23 opened 2 years ago

vrdn-23 commented 2 years ago

The new torch-crf implementation does not currently support storing CRF features on disk. This option would benefit users who have limited memory available.

To implement this successfully, we would mainly have to re-implement the scikit-learn train_test_split function used in pytorch_crf.py.

It makes sense to have a separate PR for this, for two main reasons: there are a lot of moving parts in the current PR, and this would be better evaluated as a standalone change, since we would also need to implement an efficient file-seeking mechanism for file-backed CRF features.
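To make the two pieces above concrete, here is a rough sketch of one possible approach. It is not MindMeld's FileBackedList and not what pytorch_crf.py does: `OffsetIndexedFile` and `split_indices` are hypothetical names invented for illustration. The idea is to record the byte offset of every feature record as it is written, so a single item can be read back with one seek, and to split shuffled indices rather than the feature data itself, so nothing has to be loaded into memory for the train/test split.

```python
import json
import os
import random
import tempfile


class OffsetIndexedFile:
    """Hypothetical file-backed list: one JSON record per line, indexed by byte offset."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)
        self.offsets = []  # byte offset at which each record starts
        self._end = 0      # current size of the file in bytes

    def append(self, record):
        data = json.dumps(record).encode("utf-8") + b"\n"
        with open(self.path, "ab") as f:
            f.write(data)
        self.offsets.append(self._end)
        self._end += len(data)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        # Seek straight to the record instead of scanning the file.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return json.loads(f.readline())

    def close(self):
        os.remove(self.path)


def split_indices(n, test_size=0.2, seed=0):
    """Simplified stand-in for train_test_split that returns index lists only."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_size)
    return indices[n_test:], indices[:n_test]


# Usage: features stay on disk; only the index lists live in memory.
store = OffsetIndexedFile()
for i in range(10):
    store.append({"token_id": i, "features": {"bias": 1.0}})

train_idx, test_idx = split_indices(len(store), test_size=0.2, seed=42)
first_train_record = store[train_idx[0]]
```

This does not replicate scikit-learn's stratification or its exact rounding of the test size, and reopening the file on every lookup is only for brevity. A real implementation would probably keep one file handle open and read records in batches.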