ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

Is it possible to mimic reprocess=True outside simple transformers itself #276

Closed: suryapa1 closed this issue 4 years ago

suryapa1 commented 4 years ago

Describe the bug: Is it possible to mimic reprocess=True outside Simple Transformers itself?

Any code sample would be appreciated!

I want to mimic reprocess=True before feeding data into the model's train method. I already process the raw data with a normal NLP cleansing pipeline (lowercasing, stemming, special-character handling, etc.). What extra steps does enabling reprocess=True perform? I just want to do those steps outside Simple Transformers itself.

Also, please advise whether it makes sense to apply reprocess=True in general, since I am already doing the cleansing myself. I also want to save some latency here.

ThilinaRajapakse commented 4 years ago

reprocess_input_data controls whether the words are converted into features from scratch or whether cached features (from a previous run) should be loaded from disk. It's not pre-processing in the sense of lowercase, stemming, etc. (I also wouldn't recommend doing manual stemming with transformers, just use the raw text).
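To make the distinction concrete, here is a minimal sketch of the "build from scratch vs. load cached features" idea. It is not the library's actual implementation: `to_features`, `load_or_build_features`, and the `cached_features.json` file name are hypothetical stand-ins, and only the flag name `reprocess_input_data` comes from the library.

```python
import json
import os
import tempfile


def to_features(texts):
    """Hypothetical stand-in for tokenization: one list of token lengths per text."""
    return [[len(tok) for tok in text.split()] for text in texts]


def load_or_build_features(texts, cache_dir, reprocess_input_data=False):
    """Return (features, source), where source is "cache" or "scratch"."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, "cached_features.json")  # assumed name

    # Reuse features from a previous run unless reprocessing is forced.
    if os.path.exists(cache_file) and not reprocess_input_data:
        with open(cache_file) as f:
            return json.load(f), "cache"

    # Otherwise convert the text into features from scratch and cache them.
    features = to_features(texts)
    with open(cache_file, "w") as f:
        json.dump(features, f)
    return features, "scratch"


cache_dir = tempfile.mkdtemp()
texts = ["the raw text", "goes in unchanged"]

first = load_or_build_features(texts, cache_dir)
second = load_or_build_features(texts, cache_dir)
forced = load_or_build_features(texts, cache_dir, reprocess_input_data=True)
print(first[1], second[1], forced[1])
```

Note that in this sketch the cache is keyed only by location, so if the text changes while reprocessing is off, stale features would be loaded. That is the situation in which forcing a rebuild matters.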

You can do any preprocessing outside simple transformers. Just do the preprocessing before you use any simple transformers classes/methods.
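As a rough illustration of doing the cleaning up front, the sketch below lowercases text and strips special characters using only the standard library. The `clean_text` helper and the sample rows are made up for this example, and the final hand-off to a model is shown only as a comment because it depends on your setup. Stemming is left out, in line with the advice above.

```python
import re


def clean_text(text):
    """Hypothetical cleaning step: lowercase, drop special characters, squeeze spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace anything but letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


# Made-up (text, label) rows standing in for a real dataset.
raw = [["Great movie!!!", 1], ["Terrible... would NOT watch again.", 0]]

# Clean every text before any Simple Transformers class or method sees it.
train_data = [[clean_text(text), label] for text, label in raw]
print(train_data)

# Assumed hand-off: wrap train_data in whatever input format your model class
# expects (for example a DataFrame of text and labels) and train as usual.
```

Whether this kind of cleaning helps is a separate question. Pretrained transformer tokenizers are generally designed to handle raw text, so it may be worth comparing against an uncleaned baseline.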

If your dataset doesn't change, you don't need to do reprocessing.

suryapa1 commented 4 years ago

Thanks