alistairewj / bert-deid

deidentify patient notes using pre-trained BERT

Google Authentication Credentials Issue when bert_deid download #20

Open julianmricci opened 1 year ago

julianmricci commented 1 year ago

(deid) julianricci@Julians-MacBook-Pro-673 bert-deid-master % bert_deid download

06/20/2023 15:31:34 - INFO - bert_deid.download - Beginning download of model files to bert_deid_model
06/20/2023 15:31:34 - INFO - bert_deid.download - Downloading bert-deid/bert-i2b2-2014/added_tokens.json to bert_deid_model/added_tokens.json
06/20/2023 15:31:37 - WARNING - google.auth.compute_engine._metadata - Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
06/20/2023 15:31:40 - WARNING - google.auth.compute_engine._metadata - Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
06/20/2023 15:31:40 - WARNING - google.auth.compute_engine._metadata - Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: [Errno 64] Host is down
06/20/2023 15:31:40 - WARNING - google.auth._default - Authentication failed using Compute Engine authentication due to unavailable metadata server.
Traceback (most recent call last):
  File "/Users/julianricci/anaconda3/envs/deid/bin/bert_deid", line 8, in <module>
    sys.exit(main())
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/main.py", line 131, in main
    download(args)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/main.py", line 121, in download
    download_model(args.model_dir)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/download.py", line 41, in download_model
    download_blob(bucket_name, f'bert-i2b2-2014/{fn}', f'{model_dir}/{fn}')
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/bert_deid/download.py", line 16, in download_blob
    storage_client = storage.Client()
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/storage/client.py", line 119, in __init__
    _http=_http,
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/client.py", line 318, in __init__
    _ClientProjectMixin.__init__(self, project=project, credentials=credentials)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/client.py", line 266, in __init__
    project = self._determine_default(project)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/client.py", line 285, in _determine_default
    return _determine_default_project(project)
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/cloud/_helpers.py", line 186, in _determine_default_project
    _, project = google.auth.default()
  File "/Users/julianricci/anaconda3/envs/deid/lib/python3.7/site-packages/google/auth/_default.py", line 488, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

How would I resolve this?
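The traceback ends in google.auth.default(), which could not find any Application Default Credentials. As a rough diagnostic (a sketch, not part of bert-deid), the snippet below checks the two file-based locations that are usually consulted: the path in GOOGLE_APPLICATION_CREDENTIALS and the well-known file written by the gcloud CLI. The helper name adc_candidates is hypothetical, and the sketch assumes the default gcloud config directory (it ignores overrides such as CLOUDSDK_CONFIG).

```python
import os
import sys
from pathlib import Path


def adc_candidates(env=None, home=None):
    """Return (label, path, exists) tuples for common ADC file locations.

    Hypothetical helper for illustration; it only inspects the filesystem
    and does not call any Google library.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    candidates = []

    # 1. Explicit key file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(("GOOGLE_APPLICATION_CREDENTIALS", Path(explicit)))

    # 2. Well-known file created by `gcloud auth application-default login`
    #    (assumes the default gcloud config directory).
    if sys.platform == "win32" and env.get("APPDATA"):
        base = Path(env["APPDATA"]) / "gcloud"
    else:
        base = home / ".config" / "gcloud"
    candidates.append(
        ("gcloud well-known file", base / "application_default_credentials.json")
    )

    return [(label, str(path), path.is_file()) for label, path in candidates]


if __name__ == "__main__":
    for label, path, exists in adc_candidates():
        print(f"{label}: {path} -> {'found' if exists else 'missing'}")
```

If both entries are reported as missing, the error message itself points at the two usual remedies: set GOOGLE_APPLICATION_CREDENTIALS to a service account key file, or run `gcloud auth application-default login`. Either one requires a Google account or project, and nothing in this thread confirms whether the download is meant to work without one.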

wayneisaacuy commented 10 months ago

Hi, I'm facing the same problem. Were you able to resolve it?

landiisotta commented 10 months ago

It looks like bert-deid and other models have been uploaded to PhysioNet (https://www.physionet.org/content/transformer-deid/1.0.0/) and can also be found on Hugging Face (https://huggingface.co/KindLab).

wayneisaacuy commented 10 months ago

Thanks for the heads-up! I have seen those, but were you able to get the model to run? The files that have to be downloaded for this code to work are specified here: https://github.com/alistairewj/bert-deid/blob/master/bert_deid/download.py. They are:

files = [
    'added_tokens.json',
    'config.json',
    'label_set.bin',
    'pytorch_model.bin',
    'special_tokens_map.json',
    'tokenizer_config.json',
    'training_args.bin',
    'vocab.txt'
]
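For anyone who gets the files from another source, a small check like the one below can report which of the listed files are absent from a local model directory. This is only a sketch: the function name missing_model_files is made up for illustration, and whether bert-deid actually needs every file in the list is still an open question in this thread.

```python
from pathlib import Path

# File names copied from the list in bert_deid/download.py quoted above.
EXPECTED_FILES = [
    "added_tokens.json",
    "config.json",
    "label_set.bin",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "training_args.bin",
    "vocab.txt",
]


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [fn for fn in expected if not (model_dir / fn).is_file()]


if __name__ == "__main__":
    # "bert_deid_model" is the target directory shown in the log output above.
    missing = missing_model_files("bert_deid_model")
    print("All files present" if not missing else f"Missing: {missing}")
```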

I still have to check the code to see whether all of these files are needed.

I also saw the GitHub code from KindLab, but do you know whether a demo notebook exists showing how to use the pre-trained model? The documentation at https://github.com/kind-lab/transformer-deid only covers how to train the model, and the link for evaluation doesn't work.

I'm currently using another de-identification model but it'd be nice to compare. Thanks!

landiisotta commented 9 months ago

@wayneisaacuy I was able to run the model in inference mode on one example using the function deid_example in the module predict. However, its performance needs to improve, so it looks like further fine-tuning is needed anyway. Another option is the Perl-based de-identification software package by Neamatullah et al. 2008 (https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-8-32), which was used to de-identify the MIMIC dataset. It is a rule-based approach that works quite well as is, but rules need to be added to adapt it to your corpus. It is publicly available through the PhysioNet repository: (1) https://www.physionet.org/content/deid/1.1/ ; (2) https://www.physionet.org/content/deidentifiedmedicaltext/1.0/

LorenaGarcia-Foncillas commented 2 months ago

Hi, I'm encountering a similar error:

    raise exceptions.DefaultCredentialsError(_CLOUD_SDK_MISSING_CREDENTIALS)
google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.

Has anyone managed to download the model?
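One thing that may be worth trying, offered here as an untested sketch rather than a confirmed fix: the traceback shows download.py calling storage.Client(), which always looks for credentials. The google-cloud-storage library also provides storage.Client.create_anonymous_client(), which skips that lookup. This only works if the bucket allows unauthenticated reads, which nothing in this thread confirms. The bucket name "bert-deid" and the prefix "bert-i2b2-2014" are read off the log line and traceback above and are assumptions; the function names are hypothetical.

```python
import os

# File names from the list in bert_deid/download.py quoted earlier in the thread.
FILES = [
    "added_tokens.json", "config.json", "label_set.bin", "pytorch_model.bin",
    "special_tokens_map.json", "tokenizer_config.json", "training_args.bin",
    "vocab.txt",
]


def blob_names(prefix="bert-i2b2-2014", files=FILES):
    """Build the object names inside the bucket (prefix taken from the traceback)."""
    return [f"{prefix}/{fn}" for fn in files]


def download_anonymously(bucket_name="bert-deid", model_dir="bert_deid_model"):
    """Try to fetch the model files without any Google credentials.

    Assumption: the bucket is publicly readable. If it is not, the download
    calls will raise an authorization error instead.
    """
    # Imported here so the helpers above work without the library installed.
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()
    bucket = client.bucket(bucket_name)
    os.makedirs(model_dir, exist_ok=True)
    for name in blob_names():
        target = os.path.join(model_dir, os.path.basename(name))
        bucket.blob(name).download_to_filename(target)


if __name__ == "__main__":
    download_anonymously()
```

If this fails with an authorization error, the bucket presumably requires credentials, and the PhysioNet and Hugging Face copies mentioned above are the alternative route.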