msclar opened 1 week ago
Hi Melanie! Thank you for reaching out!
As described in Section 4.2 of our work, we manually labelled 50 random samples from each dataset to obtain the statefulness values. We made a small applet to facilitate the labelling process, and I have just recorded a quick demo for you! Screencast from 2024-11-15 15-32-20.webm
The applet is located in this directory: https://github.com/Flecart/complexity-tom-dwm/tree/main/statefulness/app.
You should run `python3 server.py` and connect to `localhost:8000` to see the interface shown in the video.
Then, to create a state, highlight a sentence or part of it; to remove a state, click the highlighted text.
I strongly suggest serializing the data for the applet into the schema described at this line; the applet might not work if the input JSON doesn't have that format (I have not tested this scenario). Having the prompt, question, and answer is all you need to create the labelled data!
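As a minimal sketch of the serialization step (the field names and the output path here are illustrative assumptions; the authoritative format is the schema linked above):

```python
import json

# Illustrative only: the exact field names and file location must match
# the schema linked above; adjust them to the applet's expectations.
samples = [
    {
        "prompt": "Sally puts the ball in the basket and leaves the room.",
        "question": "Where will Sally look for the ball?",
        "answer": "basket",
    },
]

with open("data.json", "w") as f:  # hypothetical output path
    json.dump(samples, f, indent=2)
```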
I have looked at the parameters we used in our work: we used $\tau = 0.2$ for every dataset, so that's the suggested parameter choice.
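Roughly, the computation shown further below amounts to

$$\text{complexity} = \text{stateful} + \tau \cdot \text{stateless}$$

for each sample.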
And `bash script/gpt-3.5.sh` is the script for the accuracy results of the prompting method we proposed, not for the complexity metric!
For the complexity metric we ran the script at: https://github.com/Flecart/complexity-tom-dwm/blob/main/statefulness/copy_state_data.py.
This will print out the stateful and stateless values for each sample in the data.
Then, in the report we did something similar to the following:

```python
import numpy as np

# paste the output for the stateful values here
tomi = np.array([1, 1, 1, 4, 3, 1, 5, 5, 1, 3, 3, 1, 5, 4, 4, 1, 1, 4, 4, 1, 2, 6, 1, 2, 1, 1, 3, 5, 3, 1, 5, 6, 4, 1, 1, 5, 3, 5, 1, 1, 1, 5, 1, 1, 1, 3, 1, 3, 4, 3], dtype=float)

# paste the stateless values here, weighted by tau
tau = 0.2
tomi += tau * np.array([8, 5, 7, 5, 1, 3, 4, 2, 3, 6, 3, 2, 7, 2, 3, 1, 4, 5, 3, 3, 6, 6, 1, 8, 6, 7, 6, 6, 3, 2, 7, 2, 0, 4, 7, 4, 2, 5, 2, 5, 6, 5, 1, 5, 8, 5, 5, 7, 4, 4])
```
Then, we used boxplots to plot the results.
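As a sketch of that last step (a minimal matplotlib example with placeholder values, not the exact plotting code we used):

```python
import numpy as np
import matplotlib.pyplot as plt

# `tomi` is the combined per-sample complexity from the snippet above;
# placeholder values are used here so the sketch runs standalone.
tomi = np.array([1.0 + 0.2 * 8, 1.0 + 0.2 * 5, 4.0 + 0.2 * 5, 3.0 + 0.2 * 1])

# one box per dataset; pass additional arrays to compare datasets
plt.boxplot([tomi], labels=["ToMi"])
plt.ylabel("complexity")
plt.show()
```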
If you need further assistance, feel free to reach out!
Thank you for the great work and for releasing the code!
If we wanted to compute the complexity for a new dataset, what would be the steps to do so?
I see that `data/<dataset>/splits.json` already has the `num_states` and `num_highlights`. For the dataset I'm interested in, I solely have the prompt, question, & answer. Once I populate this file correctly, what parameter choices would be best to report? Would it be correct to say that after making these modifications, `bash script/gpt-3.5.sh` should yield the results I need, or am I missing anything? Thanks in advance!