UnixJunkie / FASMIFRA

Molecular Generation by Fast Assembly of SMILES Fragments
GNU General Public License v3.0

count uniq fragments in a set of fragmented molecules #23

Closed UnixJunkie closed 4 months ago

UnixJunkie commented 4 months ago

for each fragment:

UnixJunkie commented 4 months ago

the number of unique ones, but also how many times each one was seen

UnixJunkie commented 4 months ago

maybe a one-time pass over a set of fragmented molecules with an RDKit Python script can do the job; it would create one dictionary from frag_smi to cano_frag_smi, plus one from cano_frag_smi to a unique id
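A minimal sketch of the two dictionaries described above, not the script that ended up in the repository. It assumes each fragment is a plain SMILES string that RDKit can parse; fragments carrying fasmifra-specific annotations may need preprocessing first. The names `build_frag_dicts` and `rdkit_canonical` are hypothetical, and the canonicalizer is passed as a parameter so it can be swapped out.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def rdkit_canonical(smi: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string does not parse."""
    # Imported lazily so the dictionary logic below can be used without RDKit.
    from rdkit import Chem
    mol = Chem.MolFromSmiles(smi)
    return None if mol is None else Chem.MolToSmiles(mol)


def build_frag_dicts(
    frags: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical,
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Build frag_smi -> cano_frag_smi and cano_frag_smi -> unique id."""
    frag_to_cano: Dict[str, str] = {}
    cano_to_id: Dict[str, int] = {}
    for smi in frags:
        if smi in frag_to_cano:
            continue  # already canonicalized this exact string
        cano = canonicalize(smi)
        if cano is None:
            continue  # skip strings the canonicalizer rejects
        frag_to_cano[smi] = cano
        # Ids are assigned in order of first appearance, starting at 0.
        cano_to_id.setdefault(cano, len(cano_to_id))
    return frag_to_cano, cano_to_id
```

With RDKit installed, `build_frag_dicts(["OCC", "CCO"])` should map both spellings to the same canonical string and therefore to a single id.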

UnixJunkie commented 4 months ago

or maybe a special option in fasmifra to write all the encountered fragments to a file; we would then process them with a Python RDKit script for canonicalization and counting
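A rough sketch of the counting step, under stated assumptions: the fragment file is taken to hold one fragment per line, with the SMILES as the first whitespace-separated field. The actual layout of the file written by fasmifra is not described in this thread, so the parsing may need adjusting. `count_fragments` and `rdkit_canonical` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def rdkit_canonical(smi: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string does not parse."""
    from rdkit import Chem  # lazy import: only needed when this default is used
    mol = Chem.MolFromSmiles(smi)
    return None if mol is None else Chem.MolToSmiles(mol)


def count_fragments(
    lines: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical,
) -> Counter:
    """Count how many times each canonical fragment is seen."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the SMILES is the first field on the line.
        cano = canonicalize(fields[0])
        if cano is not None:
            counts[cano] += 1
    return counts
```

An open file can be passed directly as `lines`. `len(counts)` then gives the number of unique fragments, and `counts.most_common()` lists each fragment with the number of times it was seen.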

UnixJunkie commented 4 months ago

the fasmifra exe now has the -of option; we still need the RDKit Python canonicalization script

UnixJunkie commented 4 months ago

there is now the bin/fasmifra_frag_dict.py script, to post-process the output of fasmifra's new -of option