JohnSnowLabs / nlu

One line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0

Error while using BioBERT PubMed PMC #154

Open aozorahime opened 1 year ago

aozorahime commented 1 year ago

Hi, I am very interested in this NLU BioBERT library; it is easy to implement yet understandable. However, I ran into difficulties when using the BioBERT model for my project. This is the code I want to run:

```python
import nlu

embeddings_df2 = nlu.load('en.embed.biobert.pubmed_pmc_base_cased', gpu=True).predict(df['text'], output_level='token')
embeddings_df2
```

I am using Google Colab with a GPU runtime. After approximately 40 minutes, it suddenly stopped and produced this error:

```
biobert_pubmed_pmc_base_cased download started this may take some time.
Approximate size to download 386.7 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
```

```
Exception happened during processing of request from ('127.0.0.1', 40522)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1033, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1212, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving

Traceback (most recent call last):
  File "/usr/lib/python3.7/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.7/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.7/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.7/socketserver.py", line 720, in __init__
    self.handle()
  File "/usr/local/lib/python3.7/dist-packages/pyspark/accumulators.py", line 268, in handle
    poll(accum_updates)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/accumulators.py", line 241, in poll
    if func():
  File "/usr/local/lib/python3.7/dist-packages/pyspark/accumulators.py", line 245, in accum_updates
    num_updates = read_int(self.rfile)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 595, in read_int
    raise EOFError
EOFError

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:35473)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 438, in predict
    self.configure_light_pipe_usage(data.count(), multithread)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py", line 585, in count
    return int(self._jdf.count())
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.7/dist-packages/py4j/protocol.py", line 336, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1231.count

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 977, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1115, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused

Exception occured
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 438, in predict
    self.configure_light_pipe_usage(data.count(), multithread)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py", line 585, in count
    return int(self._jdf.count())
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.7/dist-packages/py4j/protocol.py", line 336, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1231.count

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 435, in predict
    data, stranger_features, output_datatype = DataConversionUtils.to_spark_df(data, self.spark, self.raw_text_column)
TypeError: cannot unpack non-iterable NoneType object
```


```
ERROR:nlu:Exception occured
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 438, in predict
    self.configure_light_pipe_usage(data.count(), multithread)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py", line 585, in count
    return int(self._jdf.count())
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.7/dist-packages/py4j/protocol.py", line 336, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o1231.count

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/nlu/pipe/pipeline.py", line 435, in predict
    data, stranger_features, output_datatype = DataConversionUtils.to_spark_df(data, self.spark, self.raw_text_column)
TypeError: cannot unpack non-iterable NoneType object
No accepted Data type or usable columns found or applying the NLU models failed.
Make sure that the first column you pass to .predict() is the one that nlu should predict on
OR rename the column you want to predict on to 'text'
On try to reset restart Jupyter session and run the setup script again, you might have used too much memory
Full Stacktrace was (<class 'TypeError'>, TypeError('cannot unpack non-iterable NoneType object'), <traceback object at 0x7f4ed5dd60f0>)
Additional info: <class 'TypeError'> pipeline.py 435 cannot unpack non-iterable NoneType object
Stuck? Contact us on Slack! https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA
```
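The last message hints that NLU could not find a usable text column. For reference, here is a minimal sketch of the rename workaround it suggests; the `abstract` column name and the sample rows are made up for illustration, not from my actual data:

```python
import pandas as pd

# Hypothetical DataFrame whose text lives in a column not named 'text'
df = pd.DataFrame({'abstract': [
    'BRCA1 is a tumor suppressor gene.',
    'Aspirin irreversibly inhibits COX enzymes.',
]})

# NLU predicts on the first column passed, or on a column named 'text',
# so renaming up front avoids any ambiguity
df = df.rename(columns={'abstract': 'text'})

# Then, as before (requires nlu and a Spark session):
# pipe = nlu.load('en.embed.biobert.pubmed_pmc_base_cased', gpu=True)
# embeddings_df2 = pipe.predict(df['text'], output_level='token')
```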

I already tried two or three times. In my opinion, it is probably due to exceeding the available RAM. However, I already enabled the GPU runtime. Any solution for this? Thanks in advance.
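One thing I plan to try next is running `.predict()` over smaller chunks of the data, so the Spark driver holds less at once. This is only a sketch under the assumption that memory pressure is the cause; `predict_in_chunks` and the chunk size are my own names, not part of the NLU API:

```python
import pandas as pd

def predict_in_chunks(pipe, texts, chunk_size=1000):
    """Run a loaded NLU pipeline over `texts` in chunks and concatenate the results.

    Processing smaller batches keeps the Spark driver's memory footprint lower,
    which may avoid the py4j crash seen when passing the full column at once.
    """
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts.iloc[start:start + chunk_size]
        results.append(pipe.predict(chunk, output_level='token'))
    return pd.concat(results, ignore_index=True)

# Usage (assuming `nlu` is installed and df has a 'text' column):
# pipe = nlu.load('en.embed.biobert.pubmed_pmc_base_cased', gpu=True)
# embeddings_df2 = predict_in_chunks(pipe, df['text'], chunk_size=500)
```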