IBM / nl2flow

NL2Flow: A PDDL Interface to Flow Construction
Apache License 2.0

Add Tags to Samples from Generator #104

Closed by jkeskingvillage 2 weeks ago

jkeskingvillage commented 1 month ago

We need a set of tags to describe what is in each data sample. Currently, this is what I am using:

avoid_hash_conflicts = {bool} False
constraints_in_goal = {bool} False
constraints_in_input = {bool} False
constraints_in_memory = {bool} False
constraints_in_output = {bool} False
constraints_in_spec = {bool} False
coupling_of_agents = {float} 75.0
enable_mapping_cost = {bool} False
enable_maps = {bool} True
enable_slots = {bool} True
enable_slotting_cost = {bool} False
enable_typing = {bool} False
failure_in_history = {bool} False
flat_type_hierarchy = {bool} False
history_in_memory = {bool} False
multiple_goals = {bool} True
number_of_agents = {int} 50
number_of_samples = {int} 1000
number_of_types = {int} 10
number_of_variables = {int} 50
objects_as_goal = {bool} False
objects_in_memory = {bool} False
operators_as_goal = {bool} True
or_goals = {bool} False
parameterized = {bool} False
parameters_per_agent = {int} 5
tristate_variables = {bool} False
upload = {NoneType} None

For example, if a sample has more than one goal, then multiple_goals should be True. The names should be self-explanatory, but ask me if anything is unclear. Here is the schema: link. Note that a request to generate data carries the same tags plus the number of samples to generate, while the tags on a data sample also include the final length of the plan. That tag, length_of_sequence, is unlike the others: it is not an input to the generator; its value is determined and recorded after the sample is generated.

Currently, we only need to cover the following tags; the rest keep their default values.

number_of_agents
number_of_variables
parameters_per_agent
coupling_of_agents
enable_slots
enable_maps
multiple_goals
objects_in_memory
length_of_sequence
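
As a rough sketch, the nine tags above could be grouped in one small container. The class name SampleTags is hypothetical (it is not taken from the nl2flow code base), and the values simply mirror the listing above rather than any confirmed defaults.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SampleTags:
    """Hypothetical container for the tags that currently need coverage."""

    # Values mirror the listing in this issue; they are assumptions, not confirmed defaults.
    number_of_agents: int = 50
    number_of_variables: int = 50
    parameters_per_agent: int = 5
    coupling_of_agents: float = 75.0
    enable_slots: bool = True
    enable_maps: bool = True
    multiple_goals: bool = True
    objects_in_memory: bool = False
    # Not a generator input: filled in only after the sample has been generated.
    length_of_sequence: Optional[int] = None


tags = SampleTags()
tags.length_of_sequence = 7  # illustrative plan length, recorded after generation
tags_dict = asdict(tags)     # plain dict, ready to be serialized with the sample
```

Keeping length_of_sequence optional makes it explicit that the value is unknown until a plan exists.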

In terms of code, the only change required is to add this set of tags as a field "tags" in the data, inside the agent_info_generator_input field, either while a sample is being generated or right after it is done.
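
A minimal sketch of that change, assuming each sample is a JSON-like dict with an agent_info_generator_input entry. The helper name attach_tags and the exact sample layout are assumptions for illustration, not the actual nl2flow code.

```python
import copy
from typing import Any, Dict


def attach_tags(sample: Dict[str, Any], tags: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the sample with the tags stored under
    sample["agent_info_generator_input"]["tags"] (hypothetical helper)."""
    tagged = copy.deepcopy(sample)  # leave the caller's sample untouched
    tagged.setdefault("agent_info_generator_input", {})["tags"] = dict(tags)
    return tagged


# Illustrative sample; the real structure may hold many more fields.
sample = {"agent_info_generator_input": {"number_of_agents": 50}}
tagged_sample = attach_tags(sample, {"multiple_goals": True, "length_of_sequence": 7})
```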

I have added some examples of the training files (produced from the profiler) and test files (produced in the final report after validation in your pipeline) here. train_samples.json is a sample training file, and testing_llama.json / testing_granite.json are sample test files.

jkeskingvillage commented 1 month ago

@TathagataChakraborti Can an instance of this tags object be returned from a method on a Flow instance? The profiler can then pass the tags instances downstream. The fields in the object seem to be best handled in Flow. I can handle the downstream part once Flow has a method that returns an instance of this Tags object.

TathagataChakraborti commented 1 month ago

Being addressed in the iss104 branch. ETA Wednesday.

jkeskingvillage commented 1 month ago

This issue will be handled at the Runner.

jkeskingvillage commented 1 month ago

This is best handled at PddlGeneratorOutput. @TathagataChakraborti I will add a method to PddlGeneratorOutput that produces the tags and passes them downstream.
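
A rough sketch of what such a method might look like. GeneratorOutputSketch is a stand-in and not the real PddlGeneratorOutput; its fields (request_tags, goals, plan) and the method name get_tags are assumptions. It derives multiple_goals and length_of_sequence from the generated sample and drops the request-only number_of_samples.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GeneratorOutputSketch:
    """Stand-in for a generator output object; all field names are assumed."""

    request_tags: Dict[str, Any]                    # tags from the generation request
    goals: List[str] = field(default_factory=list)  # goals of this sample
    plan: List[str] = field(default_factory=list)   # planned action sequence

    def get_tags(self) -> Dict[str, Any]:
        """Build the per-sample tags to hand to downstream consumers."""
        tags = dict(self.request_tags)
        # The sample count belongs to the request, not to an individual sample.
        tags.pop("number_of_samples", None)
        # Derived from the sample itself rather than copied from the request.
        tags["multiple_goals"] = len(self.goals) > 1
        # Known only after generation: the final length of the plan.
        tags["length_of_sequence"] = len(self.plan)
        return tags


output = GeneratorOutputSketch(
    request_tags={"number_of_agents": 50, "number_of_samples": 1000},
    goals=["agent_a", "agent_b"],
    plan=["agent_c", "agent_a", "agent_b"],
)
sample_tags = output.get_tags()
```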

jkeskingvillage commented 2 weeks ago

Done.