computationalprivacy / CorrectMatch.jl

Source code for https://nature.com/articles/s41467-019-10933-3
GNU General Public License v3.0

Could Not Load Library: The specified module could not be found. #6

Open abesolberg opened 2 years ago

abesolberg commented 2 years ago

I am unable to run the individual_uniqueness function, and am getting the error: Could not load library "C:\Users\myuser\.julia\packages\CorrectMatch\3WzMC\src\..\deps\builds\mvndst". The specified module could not be found.

Yet isfile("C:\Users\myuser\.julia\packages\CorrectMatch\3WzMC\src\..\deps\builds\mvndst") is true, so the module should be there. I am using Julia Version 1.7.3 and gfortran (GCC) version 11.3.0.
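A quick way to separate "the file exists" from "the library can be loaded" is to try opening it with Julia's Libdl standard library. The sketch below is only a diagnostic idea: can_load is a hypothetical helper, not part of CorrectMatch, and the path is the one quoted above. On Windows this error can also be reported when a DLL that the library depends on (for example a compiler runtime) is not on the search path, so a file that exists but does not load would point in that direction.

```julia
using Libdl

# Hypothetical helper (not part of CorrectMatch): report whether a shared
# library can actually be loaded, rather than merely whether the file exists.
function can_load(path::AbstractString)
    handle = Libdl.dlopen(path; throw_error=false)  # `nothing` on failure
    handle === nothing && return false
    Libdl.dlclose(handle)
    return true
end

# Path taken from the error message above; adjust to your installation.
lib = raw"C:\Users\myuser\.julia\packages\CorrectMatch\3WzMC\src\..\deps\builds\mvndst"
println("file exists: ", isfile(lib))
println("loadable:    ", can_load(lib))
```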

Thanks in advance

cynddl commented 2 years ago

Thank you for reporting this issue. I'm happy to announce that it should soon work.

I have a working update in https://github.com/computationalprivacy/CorrectMatch.jl/tree/upgrading-deps that now passes all the tests on Julia 1.7 and macOS (https://github.com/computationalprivacy/CorrectMatch.jl/runs/7675628441)

cynddl commented 2 years ago

Hi @abesolberg, would you be able to test with the new release? This should be fixed now.

abesolberg commented 2 years ago

Thanks for looking into this, @cynddl. I am no longer getting the error message, but the individual_uniqueness function is only outputting 0.0. I'm on Windows 10.

cynddl commented 2 years ago

Could this be the normal output to expect for your datasets? The tests pass as usual for individual_uniqueness, so I'm hoping the code works well.

abesolberg commented 2 years ago

@cynddl I don't believe so. It's outputting zero for the readme example, and for a number of other datasets I've tested it with.

cynddl commented 2 years ago

We're getting there, I just pushed a new version on the master branch. :) Seems like it now works on some architectures and not others: https://github.com/computationalprivacy/CorrectMatch.jl/actions/runs/2805373612

abesolberg commented 2 years ago

That worked! Thank you so much for sorting this out. Really appreciate it. Thanks again.

abesolberg commented 2 years ago

Hi @cynddl, sorry to reopen this, but it looks like the new functions are still a little funky. I have been exploring further using the example notebook at examples/demonstration-notebook.ipynb. Version 1.1 of CorrectMatch appears to be overestimating the individual_uniqueness of the non-unique individual.

indiv = data[12,:]
6-element Vector{Int64}: 30 1 7 0 2 1

shifted_indiv = indiv - minimum(data , dims = 1)[:] .+1
6-element Vector{Int64}: 14 , 2 , 8 , 1 , 3 , 2

individual_uniqueness(G , shifted_indiv , N)
0.9999982449923798

I'll add, I'm not 100% sure if I'm doing the shift correctly. Julia syntax appears to have changed a bit since the example was written, so I had to add dims = 1 and .+ to the shift, and this might be the cause, but I don't think it is. I've been exploring with some other datasets, and it does seem like the individual uniqueness is being significantly overestimated for relatively non-distinct observations.
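For reference, the shift itself can be checked in isolation with plain Julia, without CorrectMatch. The sketch below uses a small made-up matrix (the values are illustrative, not the real dataset) whose first three column minima are chosen so that the first three shifted values match the ones above. Since Julia 1.0, minimum takes the dimension as the dims keyword, and adding a scalar to a vector requires the broadcast form .+, so both changes are expected syntax updates rather than a change in meaning.

```julia
# Toy data: rows are individuals, columns are attributes (hypothetical values).
data = [30 1 7;
        17 0 0;
        45 1 3]

# Column-wise minima, flattened from a 1x3 matrix into a Vector.
mins = vec(minimum(data, dims=1))

indiv = data[1, :]

# Shift each attribute so that its smallest observed value maps to 1.
shifted = indiv .- mins .+ 1

println(mins)     # [17, 0, 0]
println(shifted)  # [14, 2, 8]
```

Here vec(...) plays the same role as the [:] indexing used above.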

cynddl commented 2 years ago

Thanks for the report @abesolberg! I'll try to have a look soon.

NikolaiKorti commented 2 months ago

Hello @cynddl and @abesolberg. I'm trying to get this to run myself, and I'm encountering the same behavior mentioned above: individual_uniqueness is overestimated for individuals that should not be unique. Did either of you find a solution to the problem, by chance?

Thanks in advance :)

cynddl commented 2 months ago

Hi @NikolaiKorti, thanks for your message. Could you please share a small reproducible code example for me to check and understand what you were expecting from individual_uniqueness(·)?

NikolaiKorti commented 2 months ago

Hey, thanks for the quick reply @cynddl. I've been trying multiple things. First, I re-created the notebook, specifically the section "Unlikely unique individual". Just like @abesolberg, I had to change the line with the shift, adding dims = 1 and .+ to it. I get a number in the 0.99 region, where your example gave 0.0002859441553556916. I also ran this small example:

using CorrectMatch

A = [1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 2 3 4]
println(uniqueness(A)) #0.14285714285714285

G = fit_mle(GaussianCopula, A)
println(individual_uniqueness(G, [1, 1, 1], 7)) #0.9948205521670765
println(individual_uniqueness(G, [5, 6, 7], 7)) #1.0
println(individual_uniqueness(G, [2, 3, 4], 7)) #0.999999879258077

The individual_uniqueness it outputs for [1, 1, 1] is 0.99, which seems incorrect. This should be a very non-unique individual.
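As a sanity check on what to expect, the records can simply be counted in the sample with plain Julia. This is only an empirical count on the seven rows, not the copula-based estimate that individual_uniqueness computes, and n_matches is a hypothetical helper introduced here for illustration.

```julia
A = [1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 2 3 4]
rows = [A[i, :] for i in 1:size(A, 1)]

# Number of rows in the data identical to a given record.
n_matches(record) = count(r -> r == record, rows)

# Share of rows that occur exactly once in the data.
empirical_uniqueness = count(r -> n_matches(r) == 1, rows) / length(rows)

println(n_matches([1, 1, 1]))   # 6
println(n_matches([2, 3, 4]))   # 1
println(n_matches([5, 6, 7]))   # 0
println(empirical_uniqueness)   # 0.14285714285714285
```

Six of the seven rows share [1, 1, 1], which is why a score close to 1 for that record looks wrong, while the sample-level value of 1/7 agrees with what uniqueness(A) prints.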

Using julia version 1.6.7

NikolaiKorti commented 2 months ago

Another little note: the tests are running fine. When the data contains only [1 1 1] entries, the value of individual_uniqueness is correctly 0.0:

using CorrectMatch

A = [1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1; 1 1 1]
println(uniqueness(A)) #0.0
G = fit_mle(GaussianCopula, A)
println(individual_uniqueness(G, [1, 1, 1], 6)) #0.0

I tried the same using julia version 1.7.0 resulting in the same behavior.

cynddl commented 2 months ago

You will get the result you expect using:

G = fit_mle(GaussianCopula, A; exact_marginal=true)

Without setting exact_marginal, the code tries to fit whatever distribution works best for each marginal, which is bound to fail on such a small dataset.

NikolaiKorti commented 2 months ago

Thanks for the quick reply again, @cynddl. This did indeed work for the example. However, I am still having problems re-creating the example in the demonstration notebook.

I needed to make some adjustments to the code to be able to run it at all. This is what I execute, translated into a script file:

using CorrectMatch
using StatsBase
using CSV
using DataFrames
using Distributions

df = CSV.read(open("adults.csv"), DataFrame)
df_sub = df[:,[:age, :sex, :workclass, :relationship, Symbol("marital-status"), :race]];
data = Array{Int}(df_sub)
N, M = size(data)

function extract_marginal_ordered(row::AbstractVector)
  cm = collect(values(countmap(row; alg=:dict)))
  Categorical(cm / sum(cm))
end

marginals = [extract_marginal_ordered(data[:, i]) for i=1:M];

G = fit_mle(GaussianCopula, marginals, data);
indiv = data[1, :] # 39 years old male with non Asian/Black/White race
print("Likely unique individual: ")
println(indiv)

shifted_indiv = indiv - minimum(data, dims=1)[:] .+ 1
print("Likely unique individual uniqueness-score: ")
println(individual_uniqueness(G, shifted_indiv, N))

indiv = data[12, :] # 30-year-old individual, expected to be non-unique
print("Unlikely unique individual: ")
println(indiv)

shifted_indiv = indiv - minimum(data, dims=1)[:] .+ 1
print("Unlikely unique individual uniqueness-score: ")
println(individual_uniqueness(G, shifted_indiv, N))

And this is the output I get:

Likely unique individual: [39, 1, 7, 1, 4, 4]
Likely unique individual uniqueness-score: 0.9999922379260358
Unlikely unique individual: [30, 1, 7, 0, 2, 1]
Unlikely unique individual uniqueness-score: 0.9991543217638105

In your notebook, the unlikely unique individual at the bottom receives a uniqueness score of 0.0002859441553556916.

I also tried setting exact_marginal=false, fitting on data instead of marginals, and using the unshifted indiv. I always end up with a high score for the unlikely unique individual.

NikolaiKorti commented 2 months ago

Just leaving this here in case anyone else runs into the same problem: downgrading to Julia 1.3.1 solved the above-mentioned problem for me. I was able to run the example notebook and obtain the expected individual uniqueness scores.