datamol-io / datamol

Molecular Processing Made Easy.
https://docs.datamol.io
Apache License 2.0
462 stars 48 forks source link

`dm.read_smi()` copies a file to an already existing location #150

Closed cwognum closed 1 year ago

cwognum commented 1 year ago

When I try to load a remote .smi file using Datamol, I run into the following exception:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
(...)

File ~/mambaforge/envs/mood/lib/python3.10/site-packages/datamol/io.py:320, in read_smi(urlpath)
    318 if not fsspec.utils.can_be_local(str(urlpath)):
    319     active_path = pathlib.Path(tempfile.mkstemp()[1])
--> 320     dm.utils.fs.copy_file(urlpath, active_path)
    322 # Read the molecules
    323 supplier = rdmolfiles.SmilesMolSupplier(str(active_path), titleLine=0)

File ~/mambaforge/envs/mood/lib/python3.10/site-packages/datamol/utils/fs.py:250, in copy_file(source, destination, chunk_size, force, progress, leave_progress)
    247     raise ValueError(f"The file being copied does not exist or is not a file: {source}")
    249 if not force and is_file(destination_file):  # type: ignore
--> 250     raise ValueError(f"The destination file to copy already exists: {destination}")
    252 with source_file as source_stream:
    253     with destination_file as destination_stream:

ValueError: The destination file to copy already exists: /tmp/tmpkzrm7u58

Seems like this function is not up-to-date with recent changes in the dm.utils.fs module? Seems to me like active_path should be set differently to a non-existing file.

hadim commented 1 year ago

Thanks. Using force=True should do the job here.

In general, I don't recommend using this kind of reader/saver and instead use the CSV reader/saver which should work with any .smi files.

cwognum commented 1 year ago

I'm not sure I understand. You're saying .smi files can also be read (or actually; should be read) using dm.read_csv()?

hadim commented 1 year ago

Yes. There is nothing specific to .smi. It's just a csv-like format.

cwognum commented 1 year ago

Does Datamol implement these read functions separately? Seems like 'read_smi()' should then just be a light wrapper of 'read_csv()'?

hadim commented 1 year ago

Yes I remember hesitating doing that at the very beginning of datamol but I prefer to stick to rdkit here in case the rdkit function is modified. If a user use read_smi he will expect the equivalent rdkit function under the hood.

hadim commented 1 year ago

Thanks for reporting the bug @cwognum !