InfluenceFunctional / MXtalTools

BSD 3-Clause "New" or "Revised" License

Add distance based cutoff for molecular duplicate screening #53

Open InfluenceFunctional opened 1 year ago

InfluenceFunctional commented 1 year ago

Currently we only filter out exact duplicates (identical overlaps), but we should probably also filter out samples that are sufficiently close to one another.
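A minimal sketch of what a distance-based cutoff could look like, assuming each sample has already been reduced to a fixed-length feature vector. The function name `filter_near_duplicates`, the Euclidean metric, and the `cutoff` parameter are illustrative assumptions, not part of the existing MXtalTools code.

```python
import numpy as np


def filter_near_duplicates(features, cutoff):
    """Greedily keep samples farther than `cutoff` from every sample kept so far.

    Hypothetical helper: `features` is an (n_samples, n_dims) array and
    `cutoff` is a Euclidean distance threshold.
    Returns the indices of the samples that are kept.
    """
    features = np.asarray(features, dtype=float)
    kept = []
    for i, x in enumerate(features):
        if not kept:
            kept.append(i)  # the first sample is always kept
            continue
        # Distance from the candidate to every sample already kept
        dists = np.linalg.norm(features[kept] - x, axis=1)
        if np.all(dists > cutoff):
            kept.append(i)
    return kept
```

With `cutoff=0.0` this reduces to removing exact duplicates only, so the cutoff can be raised gradually from the current behaviour. The loop is quadratic in the number of kept samples; a large dataset would likely need a spatial index or batching.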

InfluenceFunctional commented 1 year ago

Could compute similarity using the current molecule embedding (Morgan 2, I believe), or take the dot product between vector embeddings, e.g., from a molecule encoder, a crystal encoder, or both.
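As a rough illustration of the two options, the sketch below shows Tanimoto similarity for binary fingerprints (the usual choice for Morgan-type bit vectors) and cosine similarity, i.e. a normalized dot product, for dense encoder embeddings. Both function names are hypothetical, and the inputs are assumed to be precomputed arrays; how they are produced is not shown here.

```python
import numpy as np


def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    a = np.asarray(fp_a, dtype=bool)
    b = np.asarray(fp_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two all-zero fingerprints are treated as identical
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def cosine_similarity(u, v):
    """Dot product of two embedding vectors after L2 normalization."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        # Assumption: a zero vector is treated as dissimilar to everything
        return 0.0
    return float(np.dot(u, v) / denom)
```

Either score could then be compared against a threshold to flag a pair as duplicates. Normalizing the dot product keeps the threshold independent of embedding magnitude; a raw dot product would also work if the encoder outputs are already unit-length.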

Could indeed replace the entire dataset with sufficiently high-quality embeddings...