BiomedSciAI / fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)
Apache License 2.0

Add OpToNestedTensor (Prototype, for ref) #220

Closed SagiPolaczek closed 1 year ago

SagiPolaczek commented 1 year ago

(NOT FOR MERGE) Hey All,

Following @YoelShoshan's proposal and #219, I implemented a prototype for supporting Nested Tensors. Note that it requires the latest PyTorch version (and will continue to, since the PyTorch team is still working on the feature). The current version is 1.13.0.

Example of how I used it in a data pipeline (a DTI task, so the sequences differ in size between samples!):

        dynamic_pipeline = [
            # read data
            (OpReadDataframe(data=df, rename_columns=rename_columns, key_column=None), dict()),

            # dummy encoding (ASCII values) and conversion to tensors
            (OpDummyStringToNumbers(), dict(key_in="data.drug.smiles", key_out="data.drug.encoding")),
            (OpDummyStringToNumbers(), dict(key_in="data.target.sequence", key_out="data.target.encoding")),
            (OpToNumpy(), dict(key=["data.drug.encoding", "data.target.encoding"], dtype=np.float32)),
            (OpToTensor(), dict(key=["data.drug.encoding", "data.target.encoding"], dtype=torch.float)),

            # convert to nested tensor
            (OpToNestedTensor(), dict(keys_in=["data.drug.encoding", "data.target.encoding"], key_out="data.input.nested_tensor")),
        ]
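
To make the size mismatch concrete, here is a minimal pure-Python sketch of what the dummy ASCII encoding step might produce. The helper `dummy_string_to_numbers` is a hypothetical stand-in for `OpDummyStringToNumbers` (whose actual implementation is not shown here), and the two input strings are made-up examples:

```python
def dummy_string_to_numbers(text):
    """Hypothetical stand-in for OpDummyStringToNumbers: one ASCII code per character."""
    return [float(ord(ch)) for ch in text]


# Made-up inputs: a short SMILES string and a short protein sequence.
smiles = "CCO"
sequence = "MKTAYIAK"

drug_encoding = dummy_string_to_numbers(smiles)
target_encoding = dummy_string_to_numbers(sequence)

# The encodings have different lengths, so they cannot be stacked
# into a regular tensor without padding.
print(len(drug_encoding), len(target_encoding))
```

Since each string yields one value per character, the encodings are as long as their inputs, which is the situation nested tensors are meant to handle.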
SagiPolaczek commented 1 year ago

For future reference: Nested Tensors should be considered for use in the collate phase, not as a stand-alone op.
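
A rough sketch of what building the nested tensor at collate time could look like, assuming PyTorch >= 1.13 where the prototype `torch.nested` API is available. The function `collate_to_nested` and the flat sample dicts are hypothetical simplifications for illustration; this is not the FuseMedML collate implementation:

```python
import torch


def collate_to_nested(samples, key):
    """Pack the variable-length 1-D tensors stored under `key` into one nested tensor."""
    sequences = [sample[key] for sample in samples]
    return torch.nested.nested_tensor(sequences)


# Two made-up samples whose encodings differ in length (ASCII codes of SMILES fragments).
batch = [
    {"data.drug.encoding": torch.tensor([67.0, 67.0, 79.0])},              # "CCO"
    {"data.drug.encoding": torch.tensor([67.0, 40.0, 61.0, 79.0, 41.0])},  # "C(=O)"
]

nested = collate_to_nested(batch, "data.drug.encoding")

# Each component keeps its own length inside the nested tensor.
lengths = [t.shape[0] for t in nested.unbind()]

# If a dense tensor is needed downstream, pad explicitly with a chosen value.
padded = torch.nested.to_padded_tensor(nested, 0.0)
```

Doing this per batch rather than per sample means the op sees all sequences that need to be packed together, which is why the collate phase seems like the more natural place for it.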

thanks @mosheraboh

SagiPolaczek commented 1 year ago

Cleaning PRs