A reviewer of our nano-lazar paper pointed me to the Paca2 NM cell uptake dataset (109 nanoparticles with a common core). It has been used by a couple of authors, but the training data usually comes as a Word/PDF document from non-open-access journals. Singh and Gupta (http://pubs.rsc.org/-/content/articlehtml/2014/ra/c4ra01274g) have used it together with four additional interesting datasets; they also provide core and coating SMILES structures (e.g. for fullerenes) in the supplementary material (again as PDF). I am not sure whether including these datasets in eNM is realistic (licensing issues, scraping/annotating data from PDF/Word, ...), but it would of course be attractive from a modeller's perspective.