JuliaString / InternedStrings.jl

Fully transparent string interning functionality, with proper garbage collection

A fast way to sample from InternedStrings.jl #8

Closed xiaodaigh closed 6 years ago

xiaodaigh commented 6 years ago

Sampling random strings from a sample space of Strings is quite fast:

# Julia 0.6 syntax; N and K are assumed to be defined already
srand(1);
@time samplespace1 = "id".*dec.(1:N÷K, 10);
@time svec1 = rand(samplespace1,N);

but sampling random interned strings from a sample space of InternedStrings is slow:

srand(1);
@time samplespace = InternedString.("id".*dec.(1:N÷K, 10));
function make_svec(N, samplespace)
    rand(samplespace, N)
end

@time make_svec(N, samplespace);

In general, making a long Vector{InternedString} is quite slow. Not sure what the fix is.

oxinabox commented 6 years ago

Ideally, the Julia optimizer would detect the lock and unlock and hoist them outside the broadcast loop. Otherwise, maybe we could overload broadcast (you can't really do that in 0.6, but you can in 0.5, and you might be able to in 0.7?) and do that manually.
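
To illustrate what hoisting the lock would mean, here is a minimal sketch written against current Julia (1.x) rather than 0.6. `ToyPool`, `intern_each!` and `intern_all!` are hypothetical names made up for this example; they are not the InternedStrings.jl implementation, and the real pool also has to deal with garbage collection, which this toy ignores. The first function takes the lock once per string, the second takes it once around the whole loop.

```julia
# Hypothetical toy intern pool; NOT the InternedStrings.jl implementation.
struct ToyPool
    lock::ReentrantLock
    table::Dict{String,String}
end
ToyPool() = ToyPool(ReentrantLock(), Dict{String,String}())

# Per-element locking: one lock/unlock pair for every string.
function intern_each!(pool::ToyPool, strs::Vector{String})
    out = Vector{String}(undef, length(strs))
    for (i, s) in enumerate(strs)
        lock(pool.lock)
        try
            # Return the pooled copy, inserting s if it is not there yet.
            out[i] = get!(pool.table, s, s)
        finally
            unlock(pool.lock)
        end
    end
    return out
end

# Hoisted locking: take the lock once, then intern every string.
function intern_all!(pool::ToyPool, strs::Vector{String})
    out = Vector{String}(undef, length(strs))
    lock(pool.lock) do
        for (i, s) in enumerate(strs)
            out[i] = get!(pool.table, s, s)
        end
    end
    return out
end

pool = ToyPool()
strs = ["id" * string(i % 10) for i in 1:1000]
a = intern_each!(pool, strs)
b = intern_all!(pool, strs)
```

Both functions return the same result; the difference is only how often the lock is acquired. Whether that accounts for much of the slowdown reported here would need to be measured.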

Other than that, it is just a matter of generally making it faster, which at this point means making a faster way of hashing strings. That is totally possible, but I don't care to do it right now. (I would accept a PR, or speak in favor of a PR to Base.)

oxinabox commented 6 years ago

It is, in general, faster now.