crcollins / molml

A library to interface molecules and machine learning.
MIT License
65 stars 17 forks source link

get_labels() is incorrect for LocalEncodedAngle if form is changed #3

Closed Anjum48 closed 5 years ago

Anjum48 commented 5 years ago

Hi, thanks for a great package!

I'm having an issue with LocalEncodedAngle. If I run it with the default parameters, and then call get_labels() the correct number of labels are returned and I am able to make a DataFrame using these labels as the column names

However, if I change form=1, the same number of labels are returned as if form was set to the default value of 2.

As a workaround, I'm using get_encoded_labels(["C", "F", "H", "N", "O"]) which I think is fine for what I need it for right now but I don't think this workaround works if form=0

crcollins commented 5 years ago

Haha, yes this is correct. I noticed this issue a few days ago and there is a fix in the repo (1b47adb0).

The work around that you mention should work (given that that is the correct order that is selected). For form=0, there should just be a single segment range, so any label should be fine (get_encoded_labels(['BASE'])). With the fix in the repo, it uses an empty string (for better or worse) right now.

I have not rolled a new release yet because there is still a nagging issue with form. It is more of a research issue rather than a technical one. Basically, the performance of the form reduction is dependent on the indices selected for that reduction. The best programatic fix I have found thus far is to sum over all combinations, but I have not fully vetted the performance. Nor have I decided how it would best modify the current API.

Hopefully, I can get some time this week to actually resolve these issues and push out a new version with these additional fixes.

Thanks again!

Anjum48 commented 5 years ago

Great thank you very much! I'll check out the fix in the repo :)

crcollins commented 5 years ago

I guess I will close this because it is technically fixed. I am still running some tests before merging in some more fixes for the form settings though. In general, I think the verdict is still out on the best way to do the reduction.