cbg-ethz / BnpC

Bayesian non-parametric clustering (BnpC) of binary data with missing values and uneven error rates
MIT License

IndexError with Default Parameter Settings #31

Closed qmuuu closed 3 months ago

qmuuu commented 4 months ago

Hi,

I encountered some index errors when testing BnpC with the default parameter settings. Here is the command I used to run the program:

```
python run_BnpC.py example_data/data.csv
```

And here is the error I received in `get_mean_hierarchy_assignment`:

```
params[i] += params_full[step][cl_ids[step]]
IndexError: index 67 is out of bounds for axis 0 with size 7
```

Please let me know if additional information is required to troubleshoot this issue.

Thank you!

NBMueller commented 3 months ago

Hey there,

I ran `python run_BnpC.py example_data/data.csv` several times and did not run into this error. Could you provide me with the `chain_seeds`, which should be listed in the `<OUT_DIR>/args.txt` file, so I can investigate this further?

Thanks

qmuuu commented 3 months ago

Hi, I ran the command `python run_BnpC.py example_data/data.csv -s 1000 --debug` 100 times, and it failed with the error in about 40 of the runs. Here is one of the debug logs:

```
DPMM with: 100 cells 100 mutations learning errors

Priors:
    params.: Beta(0.25,0.25)
    CRP a_0: Gamma(10.00,1)
    FP: trunc norm(0.01,0.01)
    FN: trunc norm(0.2,0.1)

Move probabilities:
    Split/merge: 0.33 split/merge ratio: [0.75, 0.25] intermediate Gibbs: 3
    CRP a_0 update: 0.25
    Errors update: 0.25

Run MCMC with (1 chains for 1000 steps):

Seed set to: 4006553414

Chain: 01 step:  100 / 1000 mean MH accept. ratio: parameters: 0.18 splits: 0.04 merges: 0.00 FP: 0.12 FN: 0.25
Chain: 01 step:  200 / 1000 mean MH accept. ratio: parameters: 0.16 splits: 0.24 merges: 0.00 FP: 0.04 FN: 0.11
Chain: 01 step:  300 / 1000 mean MH accept. ratio: parameters: 0.16 splits: 0.29 merges: 0.00 FP: 0.00 FN: 0.04
Chain: 01 step:  400 / 1000 mean MH accept. ratio: parameters: 0.16 splits: 0.25 merges: 0.00 FP: 0.05 FN: 0.14
Chain: 01 step:  500 / 1000 mean MH accept. ratio: parameters: 0.16 splits: 0.15 merges: 0.00 FP: 0.00 FN: 0.04
Chain: 01 step:  600 / 1000 mean MH accept. ratio: parameters: 0.17 splits: 0.24 merges: 0.00 FP: 0.05 FN: 0.21
Chain: 01 step:  700 / 1000 mean MH accept. ratio: parameters: 0.16 splits: 0.25 merges: 0.00 FP: 0.00 FN: 0.11
Chain: 01 step:  800 / 1000 mean MH accept. ratio: parameters: 0.16 splits: 0.25 merges: 0.00 FP: 0.00 FN: 0.17
Chain: 01 step:  900 / 1000 mean MH accept. ratio: parameters: 0.17 splits: 0.30 merges: 0.00 FP: 0.04 FN: 0.11
Chain: 01 step: 1000 / 1000 mean MH accept. ratio: parameters: 0.17 splits: 0.48 merges: 0.00 FP: 0.00 FN: 0.03

Traceback (most recent call last):
  File "/gpfs/research/fangroup/lz20w/software/BnpC/run_BnpC.py", line 295, in <module>
    main(args)
  File "/gpfs/research/fangroup/lz20w/software/BnpC/run_BnpC.py", line 290, in main
    generate_output(args, results, data, data_names)
  File "/gpfs/research/fangroup/lz20w/software/BnpC/run_BnpC.py", line 205, in generate_output
    inferred = io._infer_results(args, results, data_raw)
  File "/gpfs/research/fangroup/lz20w/software/BnpC/libs/dpmmIO.py", line 215, in _infer_results
    inf_est = ut.get_latents_posterior(results, data, args.single_chains)
  File "/gpfs/research/fangroup/lz20w/software/BnpC/libs/utils.py", line 200, in get_latents_posterior
    latents.append(_get_latents_posterior_chain(result, data))
  File "/gpfs/research/fangroup/lz20w/software/BnpC/libs/utils.py", line 226, in _get_latents_posterior_chain
    assign, geno = get_mean_hierarchy_assignment(
  File "/gpfs/research/fangroup/lz20w/software/BnpC/libs/utils.py", line 178, in get_mean_hierarchy_assignment
    params[i] += params_full[step][cl_ids[step]]
IndexError: index 68 is out of bounds for axis 0 with size 8
```

Thank you for your help!

NBMueller commented 3 months ago

There was a bug related to the cluster label problem: the genotype parameters were stored with indices 0 to max(#clusters), but the estimator then tried to access the actual cluster label id, which could be larger than max(#clusters). I pushed a fix to the master branch; let me know if the problem still occurs.

Cheers
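For illustration, the kind of mismatch described above, and the usual remapping fix, can be sketched as follows. This is a minimal standalone example, not the actual BnpC code; the array names and sizes are hypothetical:

```python
import numpy as np

# Cluster *labels* assigned to cells can be arbitrary ids (e.g. 67 after
# many split/merge moves), while the stored parameter array has only one
# row per *active* cluster. Indexing that array by the raw label fails:
assignments = np.array([0, 3, 67, 3, 0])   # raw cluster labels per cell
params_full = np.random.rand(3, 100)       # one row per active cluster

# params_full[67]  ->  IndexError: index 67 is out of bounds for axis 0

# Fix: remap each distinct label to a contiguous index 0..k-1 first.
labels, cl_ids = np.unique(assignments, return_inverse=True)
# labels -> [ 0  3 67],  cl_ids -> [0 1 2 1 0]

cell_params = params_full[cl_ids]          # safe: indices are 0..k-1
print(cell_params.shape)                   # (5, 100)
```

The `np.unique(..., return_inverse=True)` call does the relabeling in one step: `labels` records which original id each contiguous index corresponds to, so results can still be reported under the original labels.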