ericproffitt / TopicModelsVB.jl

A Julia package for variational Bayesian topic modeling.

Is it possible to use Int32 for docs.terms #33

Closed ValeriiBaidin closed 4 years ago

ValeriiBaidin commented 4 years ago

Is it possible to use Int32 for docs.terms? It would help save memory.

P.S. I think even Int16 would be enough. =))

I am sorry to bother you again. Thank you so much.

ValeriiBaidin commented 4 years ago

I did it myself. I hope it helps save memory.
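To put rough numbers on the saving, here is a minimal sketch that compares the storage of one integer array at each width. The array length `n` is a made-up figure, and the sketch only measures the raw array data, not any per-document overhead in the package.

```julia
# Bytes needed to store n integers at each element width.
n = 1_000_000  # hypothetical number of term entries, chosen only for illustration
bytes = Dict(T => sizeof(zeros(T, n)) for T in (Int64, Int32, Int16))

# Fractional saving relative to Int64 storage.
saving(T) = 1 - bytes[T] / bytes[Int64]

println("Int32 saves ", 100 * saving(Int32), "% of the array storage")
println("Int16 saves ", 100 * saving(Int16), "% of the array storage")
```

Under these assumptions Int32 halves the array storage and Int16 cuts it by three quarters, since the element sizes are 8, 4 and 2 bytes.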

ValeriiBaidin commented 4 years ago

Could you help me change doc.counts to Int16? It would save 75% of the space.

Where do I have to change the code? I have a problem with the buffer: in keyword argument hostbuf, expected Union{Nothing, Array{Int64,N} where N}, got Array{Int16,1}

Thank you so much.
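The error quoted above is Julia's type check on a keyword argument: the buffer is declared with one element type while the host array passed as `hostbuf` has another. The sketch below reproduces the same kind of failure in plain Julia. `make_buffer` is a hypothetical stand-in written for this example; it is not OpenCL.jl's constructor and does not touch a GPU.

```julia
# Hypothetical stand-in for a typed buffer constructor (NOT the OpenCL.jl API).
# The hostbuf keyword must be an array whose element type matches T.
function make_buffer(::Type{T}; hostbuf::Union{Nothing, Array{T}}=nothing) where T
    return hostbuf === nothing ? T[] : copy(hostbuf)
end

counts = Int16[3, 1, 4]

# Mismatch: the buffer is declared as Int64 but the host array holds Int16.
err = try
    make_buffer(Int64, hostbuf=counts)
    nothing
catch e
    e
end
println(err isa TypeError)

# Match: declaring the buffer as Int16 accepts the same host array.
buf = make_buffer(Int16, hostbuf=counts)
println(eltype(buf))
```

The point of the sketch is only that the declared element type and the host array's element type have to agree; which lines to change in the package itself is a separate question.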

ericproffitt commented 4 years ago

Hi Valerii,

So for the gpuLDA model, if you're changing both doc.terms and doc.counts to Int16, then you need to change lines 388 and 390 in modelutils.jl to,

model.terms_buffer = cl.Buffer(Int16, model.context, (:r, :copy), hostbuf=terms)
model.counts_buffer = cl.Buffer(Int16, model.context, (:r, :copy), hostbuf=counts)

You also then need to change lines 31 and 33 in gpuLDA.jl to,

terms_buffer::cl.Buffer{Int16}
counts_buffer::cl.Buffer{Int16}

And then also lines 282 and 312 in gpuLDA.jl to,

const global short *counts,
const global short *terms,

I think this is everything, but I haven't tested it out myself, so I can't guarantee that it will work.
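The three changes above all express the same constraint: the host array's element type, the `cl.Buffer{T}` field type, and the pointer type in the OpenCL kernel source should describe integers of the same width. In OpenCL C, `short` is 16 bits, `int` is 32 bits and `long` is 64 bits, so a different Julia width would presumably need a different kernel type. A small sketch of that mapping follows; `opencl_ctype` and `kernel_param` are hypothetical helpers, not part of TopicModelsVB.jl or OpenCL.jl.

```julia
# Hypothetical helper: OpenCL C type name with the same width as a Julia integer type.
opencl_ctype(::Type{Int16}) = "short"  # 16-bit
opencl_ctype(::Type{Int32}) = "int"    # 32-bit
opencl_ctype(::Type{Int64}) = "long"   # 64-bit

# Build a kernel parameter declaration that matches a host array's element type.
# The parameter names used below are illustrative only.
kernel_param(name::AbstractString, host::Array) =
    "const global $(opencl_ctype(eltype(host))) *$name"

println(kernel_param("counts", Int16[1, 2]))
println(kernel_param("terms", Int32[7, 9]))
```

Like the suggestion above, this is untested against the package and only illustrates how the type names line up.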

ericproffitt commented 4 years ago

On a more general note,

I may look into defining Int32 and Int16 constructors for the Document and Corpus types. However, I'll need to think about exactly how this would work, and about any potential side effects, before I actually make the changes.

ValeriiBaidin commented 4 years ago

I've done it for my own use for now (GPU):

Terms: Int32, counts: Int16

It saves memory, and I hope it also increases speed.
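One thing worth checking before narrowing is that every value fits the smaller type: Int16 tops out at 32767, so term indices from a larger vocabulary would not fit, while small per-document counts usually would. The following is a minimal sketch of such a check. `narrow` is a hypothetical helper and the sample vectors are made up; neither comes from the package.

```julia
# Hypothetical helper: convert to a narrower integer type, failing loudly
# if any value does not fit. Assumes v is non-empty.
function narrow(::Type{T}, v::Vector{<:Integer}) where T<:Integer
    lo, hi = extrema(v)
    (typemin(T) <= lo && hi <= typemax(T)) ||
        throw(ArgumentError("values span $lo:$hi, outside the range of $T"))
    return convert(Vector{T}, v)
end

terms = [1, 40_000, 250_000]  # made-up term indices, above typemax(Int16) == 32767
counts = [2, 1, 5]            # made-up counts

terms32 = narrow(Int32, terms)
counts16 = narrow(Int16, counts)

# Trying to squeeze the same term indices into Int16 is rejected.
fits16 = try
    narrow(Int16, terms); true
catch e
    e isa ArgumentError ? false : rethrow()
end
println(fits16)
```

Julia's own `convert` would also throw an `InexactError` on an out-of-range value; the explicit range check just gives a clearer message up front.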