dmis-lab / bern

A neural named entity recognition and multi-type normalization tool for biomedical text mining
https://bern.korea.ac.kr
BSD 2-Clause "Simplified" License

NER extraction of text seems not to be working #17

Open amalic opened 4 years ago

amalic commented 4 years ago

Sample program, based on your README.MD

import requests
import json
body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
response = requests.post('http://localhost/', data=body_data)
print(response)
print("content: ", response.content)
result_dict = response.json()
print(result_dict)

Output

<Response [200]>
content:  b''
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    result_dict = response.json()
  File "/home/alex/.local/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

A curl example would be highly appreciated.

donghyeonk commented 4 years ago

Are you using port number 80? If not, add the port number you set after "localhost" and a colon ":".
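A minimal sketch of the point above, assuming the request shape from the sample program earlier in this thread. The helper names `build_url` and `build_payload` are hypothetical and not part of BERN; the only facts taken from the thread are the "param" form field and the "text" key.

```python
import json


def build_url(host="localhost", port=80):
    """Return the server URL, adding ':<port>' only when the port is not 80."""
    if port == 80:
        return "http://{}/".format(host)
    return "http://{}:{}/".format(host, port)


def build_payload(text):
    """Wrap raw text the way the sample program does: a JSON string under 'param'."""
    return {"param": json.dumps({"text": text})}
```

With a server started on port 8888, the call would then look like `requests.post(build_url("localhost", 8888), data=build_payload(text))`.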

tomasonjo commented 4 years ago

I ran into the same issue. The server logs the following error:

89.212.10xx - - [23/Apr/2020 18:42:08] "POST / HTTP/1.1" 200 -
[23/Apr/2020 18:42:08.609364] [Thread-95] text_hash: 3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7
/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator - (PubTator format) : Processing Time:0.239sec
[23/Apr/2020 18:42:08.850149] [Thread-95] GNormPlus 0.240 sec
input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator - (PubTator format) : Processing Time:0.161sec
[23/Apr/2020 18:42:09.012860] [Thread-95] tmVar 2.0 0.162 sec

Exception happened during processing of request from ('89.212.10.xx', 54130)
Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 452, in tag_entities
    self.biobert_recognize(dict_list, is_raw_text, cur_thread_name)
  File "server.py", line 490, in biobert_recognize
    thread_id=cur_thread_name)
  File "/app/biobert_ner/utils.py", line 15, in with_profiling
    ret = fn(*args, **kwargs)
  File "/app/biobert_ner/run_ner.py", line 488, in recognize
    with open(token_path, 'r') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'biobert_ner/tmp/token_test_Thread-95.txt'

amalic commented 4 years ago

> Are you using port number 80? If not, add the port number you set after "localhost" and a colon ":".

Yes, I am using port 80. I am running BERN in a Docker container.

See: https://github.com/amalic/bern-docker

tomasonjo commented 4 years ago

I tried it without Docker as well, and the error persists. After digging a bit, I found that the output folder contains valid JSON results, stored under names like:

bern_demo_095b8bb35ae644040374c488a9ca7c7b5ec56dc66fb577ff227c01e5.json

The problem is just that the server does not return this JSON in the response.
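As a stopgap, the stored result can be read straight from disk. This is only a sketch: it assumes the files keep the `bern_demo_*.json` naming shown above and that the most recently modified file belongs to the latest request, which may not hold when several requests run at once. `load_latest_result` is a hypothetical helper, not part of BERN, and the output directory path has to be supplied by the caller.

```python
import glob
import json
import os


def load_latest_result(output_dir):
    """Load the most recently modified bern_demo_*.json file, or None if there is none."""
    paths = glob.glob(os.path.join(output_dir, "bern_demo_*.json"))
    if not paths:
        return None
    # Assumption: the newest file on disk is the result of the latest request.
    newest = max(paths, key=os.path.getmtime)
    with open(newest, "r", encoding="utf-8") as handle:
        return json.load(handle)
```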

amalic commented 4 years ago

Just to clarify: the call for PubMed IDs works, except for newer PMIDs, so I don't have any issue with the port.

The example from your readme file for recognizing entities from text does not work. Can't you reproduce the issue?

zahidmughal commented 4 years ago

I am receiving an empty response with the same body_data you used to query the BERN server.

Following is the body_data:

body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}

Bern Server running IP: [04/May/2020 09:55:28.162347] Starting server at http://0.0.0.0:8888

After running in python:

response = requests.post('http://0.0.0.0:8888', data=body_data)
response.text

Result: ''

No JSON response is received for this request.

Meanwhile, I tried passing invalid body_data to check whether the server returns an error. These are the errors I receive:

{"error": "empty text"}
{"error": "only whitespace letters"}
{"error": "no param"}
etc.

Please help me get a JSON response for a valid request; in my case the response is empty.
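To tell the three situations apart on the client side (empty body, error object, real result), a small check before calling `response.json()` may help. This is a sketch: `classify_response` is a hypothetical helper, and it assumes that a successful result never carries a top-level "error" key, which this thread does not confirm.

```python
import json


def classify_response(body):
    """Sort a raw response body (bytes) into 'empty', 'invalid', 'error', or 'ok'."""
    if not body.strip():
        # HTTP 200 with no body, as seen when the server crashes mid-request.
        return "empty", None
    try:
        parsed = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both undecodable bytes and malformed JSON.
        return "invalid", None
    if isinstance(parsed, dict) and "error" in parsed:
        return "error", parsed["error"]
    return "ok", parsed
```

Calling `classify_response(response.content)` on the failing request should give `("empty", None)` instead of raising a JSONDecodeError, which points at the server log rather than at the request.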

zahidmughal commented 4 years ago

127.0.0.1 - - [04/May/2020 15:34:31] "POST / HTTP/1.1" 200 -
[04/May/2020 15:34:31.333152] [Thread-24] text_hash: 3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7
[04/May/2020 15:34:31.333375] [Thread-24] GNormPlus 0.000 sec

Exception happened during processing of request from ('127.0.0.1', 42856)
Traceback (most recent call last):
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 566, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/home/vm-admin/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator' -> '/home/vm-admin/bern/tmVarJava/input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/py37_default/lib/python3.7/socketserver.py", line 650, in process_request_thread
    self.finish_request(request, client_address)
  File "/anaconda/envs/py37_default/lib/python3.7/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/anaconda/envs/py37_default/lib/python3.7/socketserver.py", line 720, in __init__
    self.handle()
  File "/anaconda/envs/py37_default/lib/python3.7/http/server.py", line 426, in handle
    self.handle_one_request()
  File "/anaconda/envs/py37_default/lib/python3.7/http/server.py", line 414, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 423, in tag_entities
    shutil.move(output_gnormplus, input_tmvar2)
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 580, in move
    copy_function(src, real_dst)
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 266, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/vm-admin/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

This is the error I am seeing in the server logs.

nosiam commented 3 years ago

If I call the POST API twice with the same text, it fails with the error below. This seems to be linked to the deletion of the tmp file by GNormPlusJava. However, setting "DeleteTmp = False" in the "setup.txt" of GNormPlusJava and restarting the service with that setup.txt doesn't solve the issue; the tmp files are still deleted.

nohup_BERT.out :

Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 423, in tag_entities
    shutil.move(output_gnormplus, input_tmvar2)
  File "/usr/lib/python3.5/shutil.py", line 552, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.5/shutil.py", line 251, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.5/shutil.py", line 114, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/root/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

nohup.out from GnormPlusJava :

Starting GNormPlus Service at 172.17.0.2:18895
Loading Gene Dictionary : Processing Time:8.459sec
Ready
/693c63dd1b77aa3f29f02c2bb2ef000b0ae1f6846f1d8bb46497dfb2.PubTator - (PubTator format) : Processing Time:5.734sec
java.io.FileNotFoundException: tmp/693c63dd1b77aa3f29f02c2bb2ef000b0ae1f6846f1d8bb46497dfb2.PubTator (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
	at GNormPluslib.BioCDoc.PubTator2BioC(BioCDoc.java:124)
	at kr.ac.korea.dmis.GNormPlus.tag(GNormPlus.java:316)
	at kr.ac.korea.dmis.GNPServer.run(GNPServer.java:42)
	at kr.ac.korea.dmis.GNPServer.<init>(GNPServer.java:30)
	at kr.ac.korea.dmis.GNPServer.main(GNPServer.java:72)

ting830812 commented 3 years ago

I solved these errors by reinstalling CRF in GNormPlusJava and tmVar2Java.

In my case, this error:

FileNotFoundError: [Errno 2] No such file or directory: 'biobert_ner/tmp/token_test_Thread-{thread_id}.txt'

was caused by tmVar2. It produced an empty file in tmVarJava/output, so the tokens file could not be produced correctly.

And this error:

FileNotFoundError: [Errno 2] No such file or directory: '~/bern/GNormPlusJava/output/{text_hash_id}.PubTator'

was caused by GNormPlus. As with the previous error, GNormPlus did not generate a correct output file. You may find an error message asking you to reinstall CRF in ~/bern/logs/nohup_gnormplus.out.
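To see which stage left a missing or empty file, something like the following can be run against the intermediate paths. It is a sketch under assumptions: `output_status` and `diagnose` are hypothetical helpers, the caller supplies the paths (the exact file names each tool writes are not confirmed in this thread), and files that the server has already moved or cleaned up will show as missing even when nothing went wrong.

```python
import os


def output_status(path):
    """Report whether a file is 'missing', 'empty', or 'ok' (exists with content)."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


def diagnose(paths_by_stage):
    """Map each stage label to the status of the file it was expected to write."""
    return {stage: output_status(path) for stage, path in paths_by_stage.items()}
```

For example, `diagnose({"GNormPlus": gnormplus_output_path, "tmVar2": tmvar_output_path})` with the two paths taken from the tracebacks above would show whether a file is absent or merely empty.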