Magnitude of Vectors on chart not taking into account number of vectors?

Zyin055 / Inspect-Embedding-Training

Python script to analyze textual inversion embedding files used with AI image generators

MIT License


Magnitude of Vectors on chart not taking into account number of vectors? #2

Closed zrichz closed 1 year ago

zrichz commented 1 year ago

Hi, as far as I can tell from the code, there needs to be a division at the end of this snippet to account for the number of vectors used for the embedding, so that it gives the average magnitude across all vectors:

def get_vector_data_magnitude(data: dict[int, dict[int, Tensor]], step: int) -> float:
    value = 0
    for n in data[step]:
        value += pow(n, 2)
    value = math.sqrt(value)  ##<------needs a divisor here?
    return value

As it stands, this just sums the squares of every data point and then takes the square root of the total.

Zyin055 commented 1 year ago

So something like this?

def get_vector_data_magnitude(data: dict[int, dict[int, Tensor]], step: int) -> float:
    value = 0
    for n in data[step]:
        value += pow(n, 2)
    vectors_per_token = int(len(data[step]) / DIMS_PER_VECTOR) #ie: 1, 3, 10, etc
    value = math.sqrt(value) / vectors_per_token
    return value
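
For anyone following along, here is a minimal standalone sketch of what that division does, using plain Python lists instead of tensors. The value of `DIMS_PER_VECTOR` and the helper names `magnitude`, `magnitude_per_vector` and `mean_of_vector_magnitudes` are assumptions made up for illustration, not names from the script. It also computes the mean of the per-vector norms for comparison, since "average magnitude" could be read either way.

```python
import math

# Hypothetical size for illustration only; the script defines its own constant.
DIMS_PER_VECTOR = 4


def magnitude(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def magnitude_per_vector(values, dims_per_vector=DIMS_PER_VECTOR):
    """Norm over all values, divided by the number of vectors (as proposed above)."""
    vectors_per_token = len(values) // dims_per_vector
    return magnitude(values) / vectors_per_token


def mean_of_vector_magnitudes(values, dims_per_vector=DIMS_PER_VECTOR):
    """Alternative reading: take the norm of each vector, then average the norms."""
    chunks = [values[i:i + dims_per_vector]
              for i in range(0, len(values), dims_per_vector)]
    return sum(magnitude(chunk) for chunk in chunks) / len(chunks)


one_vector = [3.0, 4.0, 0.0, 0.0]   # norm is 5.0
three_vectors = one_vector * 3      # three identical vectors, flattened

print(magnitude(three_vectors))                  # undivided norm over all values
print(magnitude_per_vector(three_vectors))       # about 2.89
print(mean_of_vector_magnitudes(three_vectors))  # 5.0
```

In this toy case the two readings give different numbers, so which one the chart should show depends on what "average vector magnitude" is meant to convey.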

zrichz commented 1 year ago

Yes, I believe so. I was trying to write this change myself but didn't realise you had to explicitly define the vectors per token again. I'm happy to test this tonight and let you know.

zrichz commented 1 year ago

[Image: new20220111H-1050-2300-vector]
[Image: original20220111H-1050-2300-vector]

To update: I have tested your code change on a 3-vector TI, and all looks great. The new average vector magnitude (top image) reports as 4.5.. and not 3x that (13.7..), and the average vector strength is (rightly) unaffected.