daenuprobst / molzip

The gzip classification method implemented for molecule classification.
MIT License

Paper writing collaboration #5

Open DavidLandup0 opened 1 year ago

DavidLandup0 commented 1 year ago

Hey! Could we perhaps collaborate on turning this into a small paper? :)

daenuprobst commented 1 year ago

Sure, let me know what you want to do... I'm currently preparing a draft for a working paper.

DavidLandup0 commented 1 year ago

Awesome, thanks! What do you think would be the areas that currently need to be fleshed out more?

I figure that one of the most important things to tweak/tune here is the representation produced by SMILES+gzip. Testing augmentations, canonicalization schemes, etc., or simply analyzing the produced representations, may help improve the metrics.

Besides that, a decent analysis section of the representations would be a nice addition, IMO.
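
For context, here is a minimal sketch of what a "SMILES+gzip" representation could look like: a normalized compression distance (NCD) between two SMILES strings, plus a tiny k-nearest-neighbour vote on top of it. This is an assumption about the general gzip classification approach, not a copy of the molzip code; the names `clen`, `ncd`, and `knn_predict` are hypothetical, and the two SMILES strings are only illustrative inputs.

```python
import gzip


def clen(s: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of s."""
    return len(gzip.compress(s.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    Smaller values mean the strings share more compressible structure.
    """
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # compress the space-joined pair
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k training SMILES closest to query by NCD."""
    ranked = sorted(train, key=lambda pair: ncd(query, pair[0]))
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)


# Illustrative inputs only (hypothetical labels).
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
caffeine = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
train = [(aspirin, "class_a"), (caffeine, "class_b")]

print(ncd(aspirin, aspirin), ncd(aspirin, caffeine))
print(knn_predict(aspirin, train, k=1))
```

Since the distance depends only on the raw string, any change to how the SMILES is written (canonicalization, augmentation with alternative SMILES for the same molecule) changes the representation directly, which is why those are natural things to test.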