colabfit / data-lake

A repository to request ingestion of datasets to ColabFit
https://colabfit.org/

[Dataset submission | request] **GDB-[11, 13, 17] version 2** #45

Closed: gpwolfe closed this issue 1 month ago

gpwolfe commented 8 months ago

Name

Gregory Wolfe

Email

gw2338@nyu.edu

Dataset name

GDB_Databases_v2

Authors

Tobias Fink, Lorenz C. Blum, Lars Ruddigkeit, Ruud van Deursen, Jean-Louis Reymond

Publication link

https://doi.org/10.1021/ci300415d

Data link

https://doi.org/10.5281/zenodo.7041051

Additional links

GDB-11: https://doi.org/10.1002/anie.200462457, https://doi.org/10.1021/ci600423u; GDB-13: https://doi.org/10.1021/ja902302h; GDB-17: https://doi.org/10.1021/ci300415d

Dataset description

GDB-11 enumerates small organic molecules up to 11 atoms of C, N, O and F following simple chemical stability and synthetic feasibility rules. GDB-13 enumerates small organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules. With 977 468 314 structures, GDB-13 is the largest publicly available small organic molecule database to date.

File details

Files are compressed as .smi.gz or .tgz. Each entire dataset is available as a single file, along with smaller subsets.
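For anyone inspecting the .smi.gz files, here is a minimal Python sketch for streaming SMILES strings out of one without decompressing it to disk. It assumes the usual .smi layout of one record per line, with the SMILES string as the first whitespace-separated field. That layout has not been verified against the Zenodo files, and `read_smiles` is a hypothetical helper name, not part of any ColabFit tooling.

```python
import gzip
from itertools import islice


def read_smiles(path, limit=None):
    """Yield the first whitespace-separated field of each non-empty line.

    Assumes a gzip-compressed text file with one SMILES record per line
    (an assumption about the file layout, not a confirmed fact).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        records = (line.split()[0] for line in stripped if line)
        # islice with limit=None iterates over every record.
        yield from islice(records, limit)


# Usage (the path is a placeholder, not an actual file name from the archive):
# for smiles in read_smiles("path/to/file.smi.gz", limit=5):
#     print(smiles)
```

Streaming line by line keeps memory use flat, which matters for files that hold hundreds of millions of structures.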

Method

No response

Method (other)

No response

Software

None

Software (other)

No response

Software version(s)

No response

Additional details

No response

Property types

No response

Other/additional property

No response

Property details

No response

Elements

No response

Number of Configurations

No response

Naming convention

No response

Configuration sets

No response

Configuration labels

No response

Distribution license

https://creativecommons.org/licenses/by/4.0

Permissions

gpwolfe commented 3 months ago

These appear to be only SMILES files, with no associated properties. If this is incorrect, please say so; otherwise I am closing this request.
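A quick way to check this kind of observation, sketched in Python: count how many whitespace-separated fields each line carries. `summarize_columns` is a hypothetical helper, and the one-record-per-line layout is an assumption. A second column would not by itself prove that properties are present, since .smi files often carry a molecule name or ID there, but a file in which every line has exactly one field contains SMILES strings only.

```python
from collections import Counter


def summarize_columns(lines):
    """Map field count -> number of non-empty lines with that many fields."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[len(fields)] += 1
    return dict(counts)


# Illustrative input only; these lines are not taken from the GDB files.
sample = ["CCO\n", "c1ccccc1\n", "\n", "CC(=O)O 1.23\n"]
print(summarize_columns(sample))
```

Any iterable of text lines works as input, including an open `gzip.open(path, "rt")` handle.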