CDDLeiden / DrugEx

De Novo Drug Design with RNNs and Transformers
https://cddleiden.github.io/DrugEx/docs/
MIT License

Issue with the 'drugex download' command #8

Closed Nickspizza001 closed 6 months ago

Nickspizza001 commented 6 months ago

I tried this on my command line and in Google Colab, but neither worked. The download started but then failed. Thank you for your help. [image]

martin-sicho commented 6 months ago

Hi, it might just be a temporary connection issue. Did you try again later? It is strange that the pretrained (PT) models downloaded but the QSAR model did not, since the download mechanics should be the same. I just tried on my Linux desktop and it worked. So I suggest trying again; let us know if you still have issues.

Nickspizza001 commented 6 months ago

Thank you, Martin. This was my guess at first. But after trying several times without success, I had to reach out ASAP. Also, I am using a Windows desktop, though I don't think that should matter. I will try again, Martin. Thanks!

martin-sicho commented 6 months ago

Indeed, I also do not think that Windows is an issue here. Alternatively, you could just download the model directly from the link and place the file in data/models/qsar/ and unpack it there. The script should not attempt to download it again and the rest should proceed normally.
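For the manual route, here is a minimal sketch of the unpacking step using only the Python standard library. The archive name `qsar_models.zip` is a placeholder (hypothetical), since the actual file name depends on the download link; `unpack_model_archive` is a helper written for this example and is not part of DrugEx.

```python
import shutil
from pathlib import Path


def unpack_model_archive(archive: Path, target_dir: Path) -> list[str]:
    """Unpack a manually downloaded archive into target_dir.

    Returns the sorted names of the entries found in target_dir afterwards.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil picks the unpacker from the file extension (.zip, .tar.gz, ...).
    shutil.unpack_archive(str(archive), str(target_dir))
    return sorted(p.name for p in target_dir.iterdir())


if __name__ == "__main__":
    # Hypothetical file name: use whatever the download link actually gave you.
    archive = Path("qsar_models.zip")
    if archive.exists():
        print(unpack_model_archive(archive, Path("data/models/qsar")))
    else:
        print(f"{archive} not found; download it from the link first.")
```

Run it from the directory that contains `data/`, so the files end up under `data/models/qsar/`.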

Nickspizza001 commented 6 months ago

Hello Martin, thank you once again. I was able to download it via the link you sent. However, my data/data directory contains only the .Papyrus folder. How can I get the files listed under data/data in the tutorial, as shown in the image below? [image]

martin-sicho commented 6 months ago

The models should be unpacked under ./data/models, not ./data/data (this gets populated later after the model download). Do you have that directory? It seems downloading the pretrained models did work for you so you should at least have those.
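To check quickly what is where, something like the following sketch lists the contents of both directories. It assumes the layout discussed in this thread (`models` and `data` under a common `data` root); `summarize_data_dirs` is a helper name made up for this example.

```python
from pathlib import Path


def summarize_data_dirs(root: Path) -> dict[str, list[str]]:
    """Map 'models' and 'data' to the sorted entry names found under root.

    A missing directory is reported as an empty list.
    """
    summary = {}
    for name in ("models", "data"):
        folder = root / name
        if folder.is_dir():
            summary[name] = sorted(p.name for p in folder.iterdir())
        else:
            summary[name] = []
    return summary


if __name__ == "__main__":
    for name, entries in summarize_data_dirs(Path("data")).items():
        print(f"data/{name}: {entries or 'missing or empty'}")
```

If `data/models` shows the unpacked models while `data/data` is empty or missing, only the data download step still needs to run.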

Nickspizza001 commented 6 months ago

Hello Martin, yes, I unpacked the model under ./data/models, so I think that directory is good to go. My main problem now is that ./data/data is empty, and its contents are needed for the sequence-RNN.ipynb tutorial. I don't know how to get the sample files used in the tutorial.

martin-sicho commented 6 months ago

After you have extracted the models into ./data/models, does the download script still crash? It should not attempt to download the models anymore and just go straight to downloading data. Either way, you can also just download and modify the script directly and run it yourself with the QSAR model part commented out. FYI this code downloads the tutorial data:

  # Download data files
  logger.info("Downloading data files from Papyrus database.")
  acc_keys = ["P29274"]  # Adenosine receptor A2A (https://www.uniprot.org/uniprotkb/P29274/entry)
  dataset_name = "A2AR_LIGANDS"  # name of the file to be generated
  quality = "high"  # choose minimum quality from {"high", "medium", "low"}
  papyrus_version = '05.6'  # Papyrus database version

  papyrus = Papyrus(
      data_dir=os.path.join(args.out_dir, 'data', '.Papyrus'),
      stereo=False,
      version=papyrus_version,
      descriptors=None,
      plus_only=True
  )

  datasets_dir = os.path.join(args.out_dir, 'data')
  os.makedirs(datasets_dir, exist_ok=True)
  dataset = papyrus.getData(
      dataset_name,
      acc_keys,
      quality,
      output_dir=datasets_dir,
      use_existing=True
  )

  print(f"Tutorial data for accession keys '{acc_keys}' was loaded. Molecules in total: {len(dataset.getDF())}")

Maybe you can just run that and be OK. It is really strange that you are getting this download error for the QSAR model. I still cannot reproduce it on any machine where I have DrugEx installed, but I am not using MINGW64 on any of them, which might be the issue. You could check whether it works better in WSL.
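If it helps to narrow this down, here is a small sketch that reports which kind of shell environment Python sees. It relies on two heuristics that are assumptions rather than guarantees: MSYS2 and Git Bash shells normally set the `MSYSTEM` environment variable (for example to `MINGW64`), and WSL kernels normally include "microsoft" in their release string. `describe_shell_environment` is a name made up for this example.

```python
import os
import platform


def describe_shell_environment(environ=None, system=None, release=None) -> str:
    """Guess the shell environment from MSYSTEM and the kernel release string."""
    environ = os.environ if environ is None else environ
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release

    # Heuristic: MSYS2 / Git Bash shells export MSYSTEM (e.g. "MINGW64").
    msystem = environ.get("MSYSTEM", "")
    if msystem:
        return f"MSYS2/Git Bash ({msystem})"
    # Heuristic: WSL kernels carry "microsoft" in the release string.
    if system == "Linux" and "microsoft" in release.lower():
        return "WSL"
    return system


if __name__ == "__main__":
    print(describe_shell_environment())
```

Including that output in a report makes it easier to tell whether the failure is specific to MINGW64.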

Nickspizza001 commented 6 months ago

Thank you very much @martin-sicho. It works perfectly! Running the script alone fixed the issue as you suggested. Thank you!