epfl-dlab / aiflows

🤖🌊 aiFlows: The building blocks of your collaborative AI
https://epfl-dlab.github.io/aiflows/
MIT License
234 stars 11 forks

Feature/add keyunpack to datatransformations #9

Closed. Tachi-67 closed this 8 months ago

Tachi-67 commented 8 months ago

Adds a new data transformation, KeyUnpack, which unpacks nested dictionaries given a list of keys. Keys can be passed in nested (dot-separated) form, e.g. key_name_A.key_name_B.

For example, with keys_to_unpack: ["observation"]:

    # input
    data_dict = {
        "observation": {
            "code": "some code",
            "file_loc": "some path",
            "human_feedback": "some feedback"
        }
    }

    # output
    output = {
        "code": "some code",
        "file_loc": "some path",
        "human_feedback": "some feedback"
    }
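A minimal sketch of the behaviour described above, written as a plain Python function rather than as an aiflows data transformation class. The function name `key_unpack` and the choice to merge a nested key's contents into its parent dictionary (overwriting any clashing keys) are assumptions for illustration, not the code in this PR.

```python
from copy import deepcopy
from typing import Any, Dict, List


def key_unpack(data_dict: Dict[str, Any], keys_to_unpack: List[str]) -> Dict[str, Any]:
    """Hypothetical sketch of KeyUnpack: replace each wrapper key with its contents.

    Keys may be dot-separated (e.g. "key_name_A.key_name_B"). Assumption: the
    unpacked entries are merged into the dictionary that held the wrapper key,
    and they overwrite any existing keys with the same name.
    """
    result = deepcopy(data_dict)  # leave the caller's dictionary untouched
    for dotted_key in keys_to_unpack:
        *parent_path, last = dotted_key.split(".")
        # Walk down to the dictionary that holds the key to unpack.
        parent = result
        for part in parent_path:
            parent = parent[part]
        inner = parent.pop(last)
        if not isinstance(inner, dict):
            raise TypeError(f"Value at '{dotted_key}' is not a dictionary")
        # Lift the nested entries one level up, dropping the wrapper key.
        parent.update(inner)
    return result


if __name__ == "__main__":
    branch_output = {
        "observation": {
            "code": "some code",
            "file_loc": "some path",
            "human_feedback": "some feedback",
        }
    }
    print(key_unpack(branch_output, ["observation"]))
    # {'code': 'some code', 'file_loc': 'some path', 'human_feedback': 'some feedback'}
```

For a dot-separated key such as "a.b", this sketch replaces "b" inside "a" with its contents, so `{"a": {"b": {"x": 1}, "y": 2}}` becomes `{"a": {"y": 2, "x": 1}}`.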

This transformation is useful with nested flows (e.g. a branching flow), when we want the output of a subflow (one of the branches) directly and want to drop the outer output layer added by the branching flow.

It is also more convenient than KeyRename: there is no need to list the output names of each subflow (and their respective new names). Passing the surface-level key to KeyUnpack is enough.

nbaldwin98 commented 8 months ago

Hey @Tachi-67, thanks a bunch for the PR! 🚀 I took a look at the changes, and it seems like the existing transformations might cover what you added. Check this out:

from aiflows.data_transformations import KeyRename
from aiflows.data_transformations import KeyDelete

data_dict = {
    "observation":
        {
            "code": "some code",
            "file_loc": "some path",
            "human_feedback": "some feedback"
        }
}

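# Rename each nested field of "observation" to a top-level key
a = KeyRename({
    "observation.code": "code",
    "observation.file_loc": "file_loc",
    "observation.human_feedback": "human_feedback",
})
# Then drop the "observation" wrapper
b = KeyDelete(["observation"])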

b(a(data_dict))

The output matches your example:

{'code': 'some code',
 'file_loc': 'some path',
 'human_feedback': 'some feedback'}

Just curious, is there a particular reason you're leaning towards the new transformation you added? I'd love to chat about it! 😊 If there isn't a significant difference, I'm thinking it might not be necessary to include. Let me know your thoughts!

Tachi-67 commented 8 months ago

Hey, as I mentioned in my initial comment, it's true that this behaviour can be reproduced with KeyRename and KeyDelete, but I think KeyUnpack makes it much easier to work with nested flows.

With KeyRename and KeyDelete, the user has to specify every output name of each branch (as you wrote: "observation.code", "observation.file_loc", "observation.human_feedback") just to remove the top layer of the data. This is not practical when there are many branches.
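To make the scaling argument concrete, here is a small illustrative sketch that enumerates the explicit "surface.inner" -> "inner" entries a KeyRename mapping would need, in the same shape as the mapping in nbaldwin98's example. The helper `build_rename_mapping` is hypothetical and not part of aiflows; it only handles one level of nesting and reads the inner keys from an actual output dictionary, whereas a flow config would have to list them up front.

```python
from typing import Any, Dict, List


def build_rename_mapping(data_dict: Dict[str, Any], surface_keys: List[str]) -> Dict[str, str]:
    """Hypothetical helper: list the "surface.inner" -> "inner" renames needed
    to lift one level of nesting by hand with a rename-style transformation."""
    mapping: Dict[str, str] = {}
    for surface_key in surface_keys:
        # One rename entry per output field of this subflow.
        for inner_key in data_dict[surface_key]:
            mapping[f"{surface_key}.{inner_key}"] = inner_key
    return mapping


if __name__ == "__main__":
    data_dict = {
        "observation": {
            "code": "some code",
            "file_loc": "some path",
            "human_feedback": "some feedback",
        }
    }
    mapping = build_rename_mapping(data_dict, ["observation"])
    print(len(mapping))  # 3 explicit renames, versus 1 surface key for unpacking
```

The number of entries grows with the total number of output fields across all branches, while an unpack-style transformation needs only the surface keys.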

I would also argue that this scenario is very common, because the branching flow is one of our principal flows, so I think this transformation would make life much easier.

Tachi-67 commented 8 months ago

This PR is no longer necessary: as stated above, the transformation can be replaced by a chain of KeyRename and KeyDelete.