This turned out to be harder than I expected. In the SuperMemo-2 (SM-2) algorithm, the E-factor (easiness factor) roughly corresponds to difficulty: a lower E-factor means a harder item. However, if the user enters a quality below 3, the E-factor doesn't change at all. Instead, the item's repetitions restart from the beginning, which puts it back on the shortest review interval. So the E-factor isn't a direct measure of difficulty; it is only an input for deciding how to schedule the item. Consequently, I wasn't able to produce a reliable difficulty metric to display.
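To make the problem concrete, here is a minimal sketch of an SM-2 style update step, written from the published description of the algorithm rather than from any particular implementation. The names `Card` and `review` are my own for illustration. It also assumes the reading described above, where a quality below 3 leaves the E-factor untouched, and it uses the updated E-factor when computing the next interval, which is one of a couple of plausible interpretations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical per-item state for an SM-2 style scheduler."""
    ef: float = 2.5      # E-factor; SM-2 starts every item at 2.5
    reps: int = 0        # consecutive successful repetitions
    interval: int = 0    # days until the next review


def review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review graded 0..5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: repetitions restart, but the E-factor is kept
        # as-is (assumption: this follows the reading described above).
        return Card(ef=card.ef, reps=0, interval=1)

    # Successful recall: adjust the E-factor, with a floor of 1.3.
    miss = 5 - quality
    ef = max(1.3, card.ef + 0.1 - miss * (0.08 + miss * 0.02))

    reps = card.reps + 1
    if reps == 1:
        interval = 1
    elif reps == 2:
        interval = 6
    else:
        # Assumption: the updated E-factor scales the previous interval.
        interval = round(card.interval * ef)

    return Card(ef=ef, reps=reps, interval=interval)


if __name__ == "__main__":
    card = Card()
    for q in (5, 5, 3, 1):
        card = review(card, q)
        print(q, card)
```

In this sketch, the final review (quality 1) resets `reps` and `interval` but leaves `ef` exactly where the previous review put it. An item the user has just failed can therefore carry the same E-factor as one they recalled correctly, which is why the E-factor alone doesn't work as a difficulty score.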