Memory Allocation Error in impute.py

AnnabelPerry commented 1 year ago

Hello, I am attempting to run impute.py in a conda environment with Python version 3.9.16, pandas version 1.1.4. I am encountering the following error:

2023-06-27 19:22:09,875 INFO impute - main: creating pedigree ...
2023-06-27 19:22:09,981 INFO preprocess_data - create_pedigree: loaded kinship file
2023-06-27 19:22:10,063 INFO preprocess_data - create_pedigree: loaded agesex file
2023-06-27 19:22:10,129 INFO preprocess_data - create_pedigree: creating age and sex dictionaries
2023-06-27 19:22:10,192 INFO preprocess_data - create_pedigree: dictionaries created
2023-06-27 19:22:10,192 INFO preprocess_data - create_pedigree: creating pedigree objects
2023-06-27 19:22:10,261 INFO impute - main: pedigree loaded.
2023-06-27 19:22:10,265 INFO impute - run_imputation: processing /n/groups/reich/anp9168/VCFs/chr1,None
2023-06-27 19:22:10,265 INFO preprocess_data - prepare_data: For file /n/groups/reich/anp9168/VCFs/chr1;None: Finding which chromosomes
2023-06-27 19:22:27,153 INFO preprocess_data - prepare_data: with chromosomes [1] initializing non_gts data
2023-06-27 19:22:27,154 INFO preprocess_data - prepare_data: with chromosomes [1] loading and filtering pedigree file ...
2023-06-27 19:22:27,985 INFO preprocess_data - prepare_data: Adding control to the pedigree ...
2023-06-27 19:22:28,008 INFO preprocess_data - prepare_data: Control Added.
2023-06-27 19:22:28,363 INFO preprocess_data - prepare_data: with chromosomes [1] loading bim file ...
2023-06-27 19:22:28,363 INFO preprocess_data - prepare_data: with chromosomes [1] loading and transforming ibd file ...
2023-06-27 19:22:31,564 INFO preprocess_data - prepare_data: ibd loaded.
2023-06-27 19:22:31,564 INFO preprocess_data - prepare_data: with chromosomes ['1'] initializing non_gts data done ...
2023-06-27 19:22:31,733 INFO preprocess_data - prepare_gts: with chromosomes ['1'] initializing gts data with start=0 end=58745
Traceback (most recent call last):
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 432, in <module>
    main(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 326, in main
    run_imputation(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 208, in run_imputation
    phased_gts, unphased_gts, iid_to_bed_index, pos, freqs, hdf5_output_dict = prepare_gts(phased_address, unphased_address, bim, pedigree_output, ped_ids, chromosomes, start, end, pcs, pc_ids, find_optimal_pc)
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/snipar/imputation/preprocess_data.py", line 713, in prepare_gts
    probs= bgen.read((slice(0, len(bgen.samples)),slice(start, end)))        
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/bgen_reader/_bgen2.py", line 552, in read
    val = np.full(
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/numpy/core/numeric.py", line 343, in full
    a = empty(shape, dtype, order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 853. GiB for an array with shape (487409, 58745, 4) and data type float64

Here is the code I ran:

source activate sniparEnv
unset PYTHONPATH

impute.py -c --ibd IBD_Chr@.ibd --bgen chr@ --out Imputed_Chr@ --king FirstDegreeKING_forImputation.kin0 --agesex FirstDegreeAgeSex_forImputation.txt

Initially, I gave the --ibd flag the IBD_Chr@ prefix without the .ibd suffix, but got the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'IBD_Chr1.segments.gz'

I checked my ibd.py outputs, and they are all named in the format IBD_Chr@.ibd.segments.gz and IBD_Chr@.l2.ldscore.gz, so I added the .ibd suffix to help snipar find the IBD_Chr@.ibd.segments.gz files, but I worry this introduced a new error.

AlexTISYoung commented 1 year ago

There's an argument to change the number of batches you use to run the imputation. If you increase the number of batches, it will decrease the memory usage.
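For reference, the allocation size in the traceback follows directly from the array shape (487409, 58745, 4) and the 8-byte float64 element size. The sketch below reproduces that figure and shows how the per-read size shrinks if the variant range is split evenly. The function name `array_gib` and the even-split assumption are illustrative only and are not taken from snipar's code.

```python
import math

def array_gib(n_samples, n_variants, n_probs=4, itemsize=8):
    """Size in GiB of a dense (n_samples, n_variants, n_probs) float64 array."""
    return n_samples * n_variants * n_probs * itemsize / 2**30

# Shape reported in the traceback: all 58745 variants read in one call.
full = array_gib(487409, 58745)
print(f"all variants at once: {full:.1f} GiB")

# Assumption: the variant range is divided evenly across the batches.
for batches in (1, 10, 50):
    per_read = array_gib(487409, math.ceil(58745 / batches))
    print(f"{batches:>3} batches: {per_read:.1f} GiB per read")
```

This counts only the genotype probability array returned by the bgen read, so actual peak memory will be somewhat higher.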

AnnabelPerry commented 1 year ago

Thank you for your quick response. I tried running with the --batch_size argument (and also with a single hyphen, as in -batch_size) set to 5000, but in both cases I get: impute.py: error: unrecognized arguments: -batch_size 5000

AlexTISYoung commented 1 year ago

Apologies, the argument for the impute.py script is --chunks. Try using --chunks 10, and increase it if it still causes issues.
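Rather than increasing the value by trial and error, a starting point can be estimated from the array shape in the traceback and the memory available on the machine. This is a minimal sketch that assumes the chunks divide the variant range evenly and that the genotype probability array dominates memory use; `min_chunks` and the 64 GiB budget are hypothetical and not part of snipar.

```python
import math

def min_chunks(n_samples, n_variants, budget_gib, n_probs=4, itemsize=8):
    """Smallest number of even chunks whose probability array fits the budget."""
    bytes_per_variant = n_samples * n_probs * itemsize
    # Largest number of variants whose array fits in the budget.
    max_variants = int(budget_gib * 2**30 // bytes_per_variant)
    if max_variants < 1:
        raise ValueError("budget is too small to hold even a single variant")
    return math.ceil(n_variants / max_variants)

# Sample and variant counts from the traceback; the budget is an assumed value.
print(min_chunks(487409, 58745, budget_gib=64))
```

Because other data structures also occupy memory, it is safer to treat the result as a lower bound and pass a somewhat larger value to --chunks.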

AnnabelPerry commented 1 year ago

That worked, thanks!

AlexTISYoung / snipar, issue #29