bpmunson / polygon

POLYGON VAE For de novo Polypharmacology
MIT License

FileNotFoundError: Missing fpscores.pkl.gz during scoring function initialization #3

Open marswh12312313 opened 3 months ago

marswh12312313 commented 3 months ago

Description:

When attempting to generate molecular structures using the polygon tool, I encountered a FileNotFoundError related to the missing fpscores.pkl.gz file. The following is the command I executed and the error traceback:

Command:

polygon generate \
  --model_path model_150.pt \
  --scoring_definition scoring_definition.csv \
  --max_len 100 \
  --n_epochs 200 \
  --mols_to_sample 8192 \
  --optimize_batch_size 512 \
  --optimize_n_epochs 2 \
  --keep_top 4096 \
  --opti gauss \
  --outF molecular_generation \
  --device cuda \
  --save_payloads \
  --n_jobs 4 \
  --debug

Error Traceback:

2024-07-01 10:10:14,756 [DEBUG   ] Making scoring function,
Traceback (most recent call last):
  File "/home/mars/miniforge3/envs/polyg/bin/polygon", line 8, in <module>
    sys.exit(main())
  File "/home/mars/miniforge3/envs/polyg/lib/python3.9/site-packages/polygon/run.py", line 841, in main
    generate_main(args)
  File "/home/mars/miniforge3/envs/polyg/lib/python3.9/site-packages/polygon/run.py", line 658, in generate_main
    scoring_function = build_scoring_function( 
  File "/home/mars/miniforge3/envs/polyg/lib/python3.9/site-packages/polygon/utils/utils.py", line 279, in build_scoring_function
    scorers[name] = SAScorer( 
  File "/home/mars/miniforge3/envs/polyg/lib/python3.9/site-packages/polygon/utils/custom_scoring_fcn.py", line 393, in __init__
    self.fscores = cPickle.load(gzip.open(fscores ))
  File "/home/mars/miniforge3/envs/polyg/lib/python3.9/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/home/mars/miniforge3/envs/polyg/lib/python3.9/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/dataold/cellardata/users/bpmunson/projects/bk_drug/data/fpscores.pkl.gz'

Steps to Reproduce:

  1. Execute the command as specified above.
  2. Observe the FileNotFoundError.

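Expected Behavior:

The scoring function initializes and polygon generates molecular structures.

Actual Behavior:

The run stops with the FileNotFoundError shown above while the scoring function is being built.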

Additional Information:

It seems the missing file is critical for the scoring function. Could you please provide guidance on how to obtain or generate this fpscores.pkl.gz file, or suggest a workaround to bypass this error?

Thank you for your assistance.
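
The traceback shows the failure happening inside `gzip.open`, because the path handed to `SAScorer` does not exist on this machine. A minimal sketch of a pre-flight check that mirrors the failing `cPickle.load(gzip.open(fscores))` call is given below. It uses only the standard library; `check_fpscores` is a hypothetical helper for illustration and is not part of polygon.

```python
import gzip
import pickle
from pathlib import Path


def check_fpscores(path):
    """Load a gzip-compressed pickle, as the failing line in the traceback does,
    but raise a more descriptive error when the file is absent.

    Hypothetical helper: not part of polygon or RDKit.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"fpscores file not found at {p}; obtain fpscores.pkl.gz "
            "and make sure the scorer is given its actual location"
        )
    # Only unpickle files from a source you trust: pickle can execute code.
    with gzip.open(p, "rb") as fh:
        return pickle.load(fh)
```

Calling `check_fpscores` with the path from the error message before starting a long run would surface the problem immediately rather than partway through initialization.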

marswh12312313 commented 3 months ago

I found a potential source for the missing fpscores.pkl.gz file. It seems to be available for download from the RDKit repository at the following link:

RDKit fpscores.pkl.gz: https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/fpscores.pkl.gz

Could you please confirm if this is the correct file to use? If so, it would be helpful to include this information in the documentation to avoid similar issues in the future.

Thank you!
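
For anyone fetching the file from a script, the following is a rough sketch and not an official procedure. It assumes the raw-download form of the RDKit repository path (a GitHub "blob" page URL returns HTML, not the file itself), so `FPSCORES_URL` is an assumption, and `download_fpscores` and `looks_like_fpscores` are hypothetical helper names.

```python
import gzip
import pickle
import urllib.request

# Assumption: raw-download form of the RDKit Contrib/SA_Score file path.
FPSCORES_URL = (
    "https://raw.githubusercontent.com/rdkit/rdkit/master/"
    "Contrib/SA_Score/fpscores.pkl.gz"
)


def download_fpscores(dest, url=FPSCORES_URL):
    """Download the file to `dest` and return the destination path."""
    urllib.request.urlretrieve(url, dest)
    return dest


def looks_like_fpscores(path):
    """Return True if `path` is a gzip-compressed pickle holding a non-empty collection.

    This is only a sanity check that the download is not an HTML page or a
    truncated file; it does not validate the scores themselves.
    """
    with gzip.open(path, "rb") as fh:
        data = pickle.load(fh)
    return len(data) > 0
```

A typical use would be `looks_like_fpscores(download_fpscores("fpscores.pkl.gz"))`. Since unpickling can run arbitrary code, this should only be done with a file from a trusted source.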

munsonbp commented 3 months ago

Hello Maria,

Thank you for the issue report. Yes, that is the correct file to use.

As you mentioned, this should most certainly be included in the documentation. I will update it to provide more information.

All the best, Brenton

On Sun, Jun 30, 2024 at 7:28 PM Maria wrote:

I found a potential source for the missing fpscores.pkl.gz file. It seems to be available for download from the RDKit repository at the following link:

RDKit fpscores.pkl.gz https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/fpscores.pkl.gz

Could you please confirm if this is the correct file to use? If so, it would be helpful to include this information in the documentation to avoid similar issues in the future.

Thank you!

DM0815 commented 1 month ago

Excuse me, I ran into the same problem. Where should the RDKit fpscores.pkl.gz file be placed? Thank you.
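
The thread does not say where polygon expects the file, but the traceback indicates that the path is read in `polygon/utils/custom_scoring_fcn.py` and currently resolves to a location on the author's machine. One way to find where that path is defined is to search the installed module for the filename, sketched below. The helper names are hypothetical, and whether the path is set in that file at all is an assumption; if nothing is found there, it may be supplied from elsewhere, such as the scoring definition file.

```python
import importlib.util
from pathlib import Path


def find_path_references(source_file, needle="fpscores.pkl.gz"):
    """Return (line_number, stripped_line) pairs in `source_file` that mention `needle`."""
    lines = Path(source_file).read_text(encoding="utf-8").splitlines()
    return [
        (i, line.strip())
        for i, line in enumerate(lines, start=1)
        if needle in line
    ]


def polygon_scoring_source():
    """Return the path of the installed custom_scoring_fcn.py, or None if unavailable.

    The module name is taken from the file path shown in the traceback.
    """
    try:
        spec = importlib.util.find_spec("polygon.utils.custom_scoring_fcn")
    except ImportError:
        return None
    return spec.origin if spec else None


if __name__ == "__main__":
    src = polygon_scoring_source()
    if src is None:
        print("polygon is not importable in this environment")
    else:
        for lineno, text in find_path_references(src):
            print(f"{src}:{lineno}: {text}")
```

Any line this reports shows where the default location is set, which in turn indicates either where to place the downloaded file or which value to change.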