JohnSnowLabs / nlu

1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

How to set the batch size? #61

Open atoutou opened 3 years ago

atoutou commented 3 years ago

Hi,

The prediction process takes a long time to finish, so I checked the GPU memory usage and found that it only uses 3 GB (my GPU has 16 GB of memory). I want to set a larger batch size to speed up the process, but I can't find the argument. How do I set the batch size when using the predict function?

import nlu
pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.predict(text, output_level='document')

Thanks

C-K-Loan commented 3 years ago

Hi @atoutou

pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.print_info()

will print

The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> pipe['bert_sentence@labse'] has settable params:
pipe['bert_sentence@labse'].setBatchSize(8)          | Info: Size of every batch | Currently set to : 8
pipe['bert_sentence@labse'].setIsLong(False)         | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False
pipe['bert_sentence@labse'].setMaxSentenceLength(128)  | Info: Max sentence length to process | Currently set to : 128
pipe['bert_sentence@labse'].setDimension(768)        | Info: Number of embedding dimensions | Currently set to : 768
pipe['bert_sentence@labse'].setCaseSensitive(False)  | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False
pipe['bert_sentence@labse'].setStorageRef('labse')   | Info: unique reference name for identification | Currently set to : labse
>>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False)  | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97')  | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@7f47d7d6)  | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@7f47d7d6
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e'])  | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn')  | Info: Model architecture (CNN) | Currently set to : cnn
>>> pipe['document_assembler'] has settable params:
pipe['document_assembler'].setCleanupMode('shrink')  | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink

Calling pipe['bert_sentence@labse'].setBatchSize() with a larger value before predicting should fix your problem.
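Putting the pieces together, a minimal sketch (the batch size of 64 is an illustrative value, not a recommendation; tune it to your GPU memory, and note that `text` stands in for your own string or list of strings):

```python
import nlu

# Load the multilingual LaBSE sentence-embedding pipeline on GPU
pipe = nlu.load('xx.embed_sentence.labse', gpu=True)

# Raise the transformer's batch size so more sentences are embedded per
# inference call. Larger batches use more GPU memory but finish faster;
# if you hit out-of-memory errors, lower the value again.
pipe['bert_sentence@labse'].setBatchSize(64)

df = pipe.predict(text, output_level='document')
```

Since the batch size is a parameter of the annotator rather than of predict(), it must be set on the pipeline component (here `bert_sentence@labse`, as listed by pipe.print_info()) before calling predict().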

Let me know if it helps