andriy-nikolov commented 2 years ago

Summary

An implementation of the CASTER model layers based on https://github.com/kexinhuang12345/CASTER

Note:

covers only the supervised training stage
input dimensionality is assumed to be correct and meaningful with respect to the algorithm's assumptions
TODO: include the unsupervised stage (would require a pipeline hook)
TODO: include the BIOSNAP dataset from the paper
TODO: load data using the input processing from the paper
[x] Unit tests provided for these changes
[x] Documentation and docstrings added for these changes using the Sphinx style

Changes

Implementation of the CASTER model with the custom loss function
An example script for invoking the model (its performance is not meaningful because the model implementation is incomplete)
A unit test method to check the output dimensionality
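The PR's custom loss is not reproduced in this thread. As a minimal NumPy sketch, not the PR's code, assume the loss is a weighted sum of (i) binary cross-entropy on the predicted interaction score, (ii) binary cross-entropy between the auto-encoder's reconstruction and the binary drug-pair feature vector, and (iii) a projection term measuring how far the pair's latent code lies from its reconstruction out of dictionary atoms, plus an L1 penalty on the coefficients. The function name `caster_like_loss`, its arguments, and the default weights are hypothetical.

```python
import numpy as np


def bce(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def caster_like_loss(
    score: np.ndarray,           # (batch,) predicted interaction probabilities
    target: np.ndarray,          # (batch,) 0/1 interaction labels
    reconstruction: np.ndarray,  # (batch, k) decoder output in (0, 1)
    pair_features: np.ndarray,   # (batch, k) binary substructure indicators
    code: np.ndarray,            # (batch, k) dictionary coefficients
    dictionary: np.ndarray,      # (batch, k, d) latent code of each substructure
    pair_latent: np.ndarray,     # (batch, d) latent code of the drug pair
    recon_weight: float = 0.1,   # hypothetical weights, not taken from the PR
    proj_weight: float = 0.1,
    l1_weight: float = 0.01,
) -> float:
    prediction_loss = bce(score, target)
    reconstruction_loss = bce(reconstruction, pair_features)
    # Rebuild each pair's latent code from its dictionary atoms: sum_k r_k * D_k
    rebuilt = np.einsum("bkd,bk->bd", dictionary, code)
    projection_loss = float(np.mean(np.sum((pair_latent - rebuilt) ** 2, axis=1)))
    sparsity = float(np.mean(np.sum(np.abs(code), axis=1)))
    return (
        prediction_loss
        + recon_weight * reconstruction_loss
        + proj_weight * (projection_loss + l1_weight * sparsity)
    )


# Usage on random inputs where the dictionary fits the latent code exactly.
rng = np.random.default_rng(0)
batch, k, d = 4, 6, 3
target = np.array([0.0, 1.0, 1.0, 0.0])
pair_features = (rng.random((batch, k)) < 0.5).astype(float)
code = rng.normal(size=(batch, k)) * pair_features
dictionary = rng.normal(size=(batch, k, d))
pair_latent = np.einsum("bkd,bk->bd", dictionary, code)
loss = caster_like_loss(
    score=np.full(batch, 0.5),      # uninformative predictions
    target=target,
    reconstruction=pair_features,   # perfect reconstruction
    pair_features=pair_features,
    code=code,
    dictionary=dictionary,
    pair_latent=pair_latent,
)
print(f"loss = {loss:.4f}")  # dominated by the prediction term, log(2)
```

Because terms (ii) and (iii) need the reconstruction, the coefficients, and the latent codes, a loss of this shape is one reason a model would return those intermediates alongside the score.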

cthoyt commented 2 years ago

@andriy-nikolov note: I reorganized some of the code in this PR. Make sure you pull before you start working on it again.

codecov-commenter commented 2 years ago

Codecov Report

Merging #73 (a9aaae8) into main (5449f96) will increase coverage by 0.83%. The diff coverage is 97.93%.

@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
+ Coverage   93.87%   94.70%   +0.83%     
==========================================
  Files          29       30       +1     
  Lines         832     1058     +226     
==========================================
+ Hits          781     1002     +221     
- Misses         51       56       +5
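The totals in the diff are consistent with coverage = hits / lines. A quick check in Python, assuming Codecov's default of rounding down to two decimal places; the helper names `coverage_bp` and `fmt` are illustrative:

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in basis points (1/100 of a percent), rounded down."""
    return hits * 10_000 // lines


def fmt(bp: int) -> str:
    """Format basis points as a percentage string with two decimals."""
    sign = "-" if bp < 0 else ""
    bp = abs(bp)
    return f"{sign}{bp // 100}.{bp % 100:02d}%"


before = coverage_bp(hits=781, lines=832)    # main (5449f96)
after = coverage_bp(hits=1002, lines=1058)   # PR #73 (a9aaae8)
print(fmt(before), fmt(after), fmt(after - before))  # 93.87% 94.70% 0.83%
print("misses:", 832 - 781, "->", 1058 - 1002)       # misses: 51 -> 56
```

Integer arithmetic avoids floating-point surprises when truncating; the exact post-merge ratio is about 94.707%, which rounds down to the reported 94.70%.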

Impacted Files                  Coverage Δ
chemicalx/pipeline.py           87.67% <66.66%> (-0.91%) ↓
chemicalx/models/deepddi.py     95.00% <94.73%> (-5.00%) ↓
chemicalx/models/deepdrug.py    96.77% <96.66%> (-3.23%) ↓
chemicalx/models/gcnbmp.py      97.61% <97.59%> (-2.39%) ↓
chemicalx/loss.py               100.00% <100.00%> (ø)
chemicalx/models/caster.py      100.00% <100.00%> (ø)
tests/unit/test_models.py       100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 5449f96...a9aaae8.

cthoyt commented 2 years ago

I have several concerns with this PR; it would have been nice to do a review first. Most importantly, why does it change the standard interface of the forward() function? I don't see where any of the other values it returns are used.

benedekrozemberczki commented 2 years ago

The paper describes two training regimes: supervised and unsupervised. In the unsupervised setting, any drug pair dataset can be used, since no interaction labels are needed. This forward pass supports both setups; in our experiments we only consider the supervised one.
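To illustrate the point in this exchange, here is a toy NumPy sketch of a CASTER-like forward pass that returns intermediates next to the score. It is not the PR's implementation: the class name `CasterLikeSketch`, the single-layer tanh encoder, the random untrained weights, and the ridge-regression projection onto latent codes of individual substructures (masked to those present in each pair) are simplifying assumptions.

```python
import numpy as np


def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))


class CasterLikeSketch:
    """Hypothetical, untrained stand-in for CASTER-style layers (random weights)."""

    def __init__(self, n_substructures: int, latent_dim: int, ridge: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.k = n_substructures
        self.ridge = ridge
        self.w_enc = rng.normal(scale=0.1, size=(n_substructures, latent_dim))
        self.w_dec = rng.normal(scale=0.1, size=(latent_dim, n_substructures))
        self.w_pred = rng.normal(scale=0.1, size=(n_substructures,))

    def encode(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.w_enc)

    def forward(self, x: np.ndarray):
        pair_latent = self.encode(x)                           # (batch, d)
        reconstruction = sigmoid(pair_latent @ self.w_dec)     # (batch, k)
        # Latent code of each single substructure (one-hot input),
        # masked to the substructures present in each pair.
        atoms = self.encode(np.eye(self.k))                    # (k, d)
        dictionary = x[:, :, None] * atoms[None, :, :]         # (batch, k, d)
        # Ridge coefficients per pair: r = (D D^T + ridge * I)^-1 D z
        gram = dictionary @ dictionary.transpose(0, 2, 1)      # (batch, k, k)
        gram += self.ridge * np.eye(self.k)
        rhs = dictionary @ pair_latent[:, :, None]             # (batch, k, 1)
        code = np.linalg.solve(gram, rhs)[:, :, 0]             # (batch, k)
        score = sigmoid(code @ self.w_pred)                    # (batch,)
        return score, reconstruction, code, dictionary, pair_latent


rng = np.random.default_rng(1)
x = (rng.random((4, 8)) < 0.3).astype(float)  # 4 drug pairs, 8 substructures
model = CasterLikeSketch(n_substructures=8, latent_dim=3)
score, reconstruction, code, dictionary, pair_latent = model.forward(x)

# Supervised: all five outputs can feed a composite loss together with labels.
# Unsupervised: only the reconstruction is needed; MSE is an illustrative choice.
unsupervised_loss = float(np.mean((reconstruction - x) ** 2))
print(score.shape, reconstruction.shape, code.shape)  # (4,) (4, 8) (4, 8)
```

Returning a tuple keeps one forward pass usable by both objectives; the trade-off raised in the review is that callers expecting a single score have to unpack it, e.g. `score, *_ = model.forward(x)`.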

AstraZeneca / chemicalx

CASTER layer implementation #73