Closed jkeskingvillage closed 2 months ago
@TathagataChakraborti Can an instance of this tags object be returned from a method in a `Flow` instance? Profiler can pass the tags instances downstream. The fields in the object seem to be best handled at `Flow`. I can handle the downstream once `Flow` has a method to return an instance of this Tags object.
Being addressed in the iss104 branch. ETA Wednesday.
This issue will be handled at the Runner.
This can best be handled at `PddlGeneratorOutput`. @TathagataChakraborti I will add a method to produce tags at `PddlGeneratorOutput` to pass the tags downstream.
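A minimal sketch of what such a method could look like. Everything here is an assumption made for illustration: the `Tags` dataclass, the `goals` and `plan` fields, and the `get_tags` method name are hypothetical, and only the two tag names mentioned in this issue (`multiple_goals`, `length_of_sequence`) are included. The real `PddlGeneratorOutput` and the full tag schema will differ.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Tags:
    # Only the two tag names mentioned in this issue; the real schema has more.
    multiple_goals: bool = False
    length_of_sequence: int = 0


@dataclass
class PddlGeneratorOutput:
    # Hypothetical stand-in for the real class, reduced to what the sketch needs.
    goals: List[str] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)

    def get_tags(self) -> Tags:
        """Build a Tags instance from the generated sample (hypothetical method)."""
        return Tags(
            multiple_goals=len(self.goals) > 1,
            length_of_sequence=len(self.plan),
        )


output = PddlGeneratorOutput(goals=["g1", "g2"], plan=["a1", "a2", "a3"])
tags_dict = asdict(output.get_tags())  # plain dict, ready to pass downstream
```

Returning a small typed object and converting it with `asdict` only at the boundary keeps the tag names in one place while still giving downstream consumers a JSON-friendly dict.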
Done.
A set of tags to describe what is in each data sample. Currently, this is what I am using:
For example, if a sample has more than one goal, then `multiple_goals` should be True. The names should be self-explanatory, but ask me if you are confused. Here is the schema: link. Note that a request to generate data has the same tags but also includes how many samples to generate, while the tags for the data also include the final length of the plan. `length_of_sequence` is unlike the other tags because it is not used in the generator; its value is determined and recorded after the sample is generated.
Currently, we only need to cover the following tags; the rest keep their defaults.
In terms of code, the only change required here is to add this set of tags, at the time of generating a sample or after it is done, as a field "tags" in the data, inside the `agent_info_generator_input` field.
In the training samples, this seems to appear twice: once inside `agent_info_generator_input`, and again inside `agent_info_generator_input` inside `agent_info_generator_output_item`. I don't need both; either one, or both, is fine.
Similarly, in the testing samples (in the final report), it appears twice: inside `pddl_generator_output` inside `llm_response_planning_data`. Again, both or either one will do.
I have added some examples of the train files (produced from the profiler) and test files (produced in the final report after validation in your pipeline) here. train_samples.json is a sample training file; testing_llama.json and testing_granite.json are sample test files.
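A minimal sketch of the change described above, assuming samples are plain dicts loaded from JSON. The helper names `attach_tags` and `find_tags` are hypothetical and not part of the existing pipeline. Because the exact nesting differs between train and test files, the lookup searches recursively instead of assuming a fixed path.

```python
from typing import Any, Dict, Optional


def attach_tags(sample: Dict[str, Any], tags: Dict[str, Any]) -> Dict[str, Any]:
    """Store the tags under "tags" inside agent_info_generator_input."""
    # Create agent_info_generator_input if the sample does not have it yet.
    sample.setdefault("agent_info_generator_input", {})["tags"] = dict(tags)
    return sample


def find_tags(node: Any) -> Optional[Dict[str, Any]]:
    """Return the first "tags" dict found anywhere in a nested sample, or None."""
    if isinstance(node, dict):
        tags = node.get("tags")
        if isinstance(tags, dict):
            return tags
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_tags(child)
        if found is not None:
            return found
    return None


sample = attach_tags({}, {"multiple_goals": True, "length_of_sequence": 4})
tags = find_tags(sample)  # works whichever of the nested locations holds the tags
```

Since either location (or both) is acceptable, a consumer that takes the first match does not need to care which one the producer filled in.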