Closed AnnabelPerry closed 1 year ago
There's an argument to change the number of batches you use to run the imputation. If you increase the number of batches, it will decrease the memory usage.
On Wed, Jun 28, 2023, 4:09 AM Annabel Perry wrote:
Hello, I am attempting to run impute.py in a conda environment with Python version 3.9.16, pandas version 1.1.4. I am encountering the following error:
2023-06-27 19:22:09,875 INFO impute - main: creating pedigree ...
2023-06-27 19:22:09,981 INFO preprocess_data - create_pedigree: loaded kinship file
2023-06-27 19:22:10,063 INFO preprocess_data - create_pedigree: loaded agesex file
2023-06-27 19:22:10,129 INFO preprocess_data - create_pedigree: creating age and sex dictionaries
2023-06-27 19:22:10,192 INFO preprocess_data - create_pedigree: dictionaries created
2023-06-27 19:22:10,192 INFO preprocess_data - create_pedigree: creating pedigree objects
2023-06-27 19:22:10,261 INFO impute - main: pedigree loaded.
2023-06-27 19:22:10,265 INFO impute - run_imputation: processing /n/groups/reich/anp9168/VCFs/chr1,None
2023-06-27 19:22:10,265 INFO preprocess_data - prepare_data: For file /n/groups/reich/anp9168/VCFs/chr1;None: Finding which chromosomes
2023-06-27 19:22:27,153 INFO preprocess_data - prepare_data: with chromosomes [1] initializing non_gts data
2023-06-27 19:22:27,154 INFO preprocess_data - prepare_data: with chromosomes [1] loading and filtering pedigree file ...
2023-06-27 19:22:27,985 INFO preprocess_data - prepare_data: Adding control to the pedigree ...
2023-06-27 19:22:28,008 INFO preprocess_data - prepare_data: Control Added.
2023-06-27 19:22:28,363 INFO preprocess_data - prepare_data: with chromosomes [1] loading bim file ...
2023-06-27 19:22:28,363 INFO preprocess_data - prepare_data: with chromosomes [1] loading and transforming ibd file ...
2023-06-27 19:22:31,564 INFO preprocess_data - prepare_data: ibd loaded.
2023-06-27 19:22:31,564 INFO preprocess_data - prepare_data: with chromosomes ['1'] initializing non_gts data done ...
2023-06-27 19:22:31,733 INFO preprocess_data - prepare_gts: with chromosomes ['1'] initializing gts data with start=0 end=58745
Traceback (most recent call last):
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 432, in <module>
    main(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 326, in main
    run_imputation(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 208, in run_imputation
    phased_gts, unphased_gts, iid_to_bed_index, pos, freqs, hdf5_output_dict = prepare_gts(phased_address, unphased_address, bim, pedigree_output, ped_ids, chromosomes, start, end, pcs, pc_ids, find_optimal_pc)
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/snipar/imputation/preprocess_data.py", line 713, in prepare_gts
    probs= bgen.read((slice(0, len(bgen.samples)),slice(start, end)))
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/bgen_reader/_bgen2.py", line 552, in read
    val = np.full(
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/numpy/core/numeric.py", line 343, in full
    a = empty(shape, dtype, order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 853. GiB for an array with shape (487409, 58745, 4) and data type float64

Here is the code I ran:
source activate sniparEnv
unset PYTHONPATH
impute.py -c --ibd @.*** --bgen chr@ --out Imputed_Chr@ --king FirstDegreeKING_forImputation.kin0 --agesex FirstDegreeAgeSex_forImputation.txt
Initially, I gave the --ibd flag the IBD_Chr@ prefix without the .ibd suffix, but got the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'IBD_Chr1.segments.gz'
I checked my ibd.py outputs and they are all named in the format IBD_Chr@.ibd.segments.gz and IBD_Chr@.l2.ldscore.gz, so I added the .ibd suffix to help snipar find the IBD_Chr@.ibd.segments.gz files, but I worry this introduced a new error.
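The FileNotFoundError above suggests that the --ibd value is used as a prefix: @ is replaced by the chromosome number and .segments.gz is appended. Here is a minimal sketch of that expansion, which can be used to check which prefix matches the files on disk before starting a run. The helper name expected_segments_files is hypothetical and not part of snipar, and the 1 to 22 chromosome range is an assumption.

```python
from pathlib import Path


def expected_segments_files(prefix, chromosomes, suffix=".segments.gz"):
    """Expand a prefix containing '@' into one expected file path per chromosome.

    Hypothetical helper: it mirrors the naming implied by the error message
    ('IBD_Chr@' -> 'IBD_Chr1.segments.gz'), not snipar's actual code.
    """
    return [Path(prefix.replace("@", str(c)) + suffix) for c in chromosomes]


# Compare the two prefixes mentioned in this thread (chromosome range assumed).
for prefix in ("IBD_Chr@", "IBD_Chr@.ibd"):
    paths = expected_segments_files(prefix, range(1, 23))
    missing = [p for p in paths if not p.exists()]
    print(f"{prefix}: {len(paths) - len(missing)} found, {len(missing)} missing")
```

If the prefix with the .ibd suffix reports no missing files, the file naming is probably not the cause of the later error.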
Thank you for your quick response. I've tried running with the --batch_size argument (and also with a single hyphen, as in -batch_size) set to 5000, but in both cases I get impute.py: error: unrecognized arguments: -batch_size 5000
Apologies, the argument for the impute.py script is --chunks. Try using --chunks 10 and increase it if it still causes issues.
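To get a rough idea of how many chunks might be needed, the allocation in the traceback can be reproduced by hand. The sketch below uses the array shape from the error message and assumes that each chunk loads roughly an equal share of the SNPs; that is an assumption about how --chunks splits the work, not something confirmed from the snipar source.

```python
import math

BYTES_PER_FLOAT64 = 8
GIB = 2 ** 30


def array_gib(n_samples, n_snps, n_probs=4):
    """Size in GiB of a float64 array with shape (n_samples, n_snps, n_probs)."""
    return n_samples * n_snps * n_probs * BYTES_PER_FLOAT64 / GIB


# Shape taken from the _ArrayMemoryError in the original post.
n_samples, n_snps = 487409, 58745
print(f"all SNPs at once: {array_gib(n_samples, n_snps):.1f} GiB")

# Assumption: --chunks N means about n_snps / N SNPs are loaded at a time.
for chunks in (10, 100, 1000):
    snps_per_chunk = math.ceil(n_snps / chunks)
    size = array_gib(n_samples, snps_per_chunk)
    print(f"--chunks {chunks}: ~{snps_per_chunk} SNPs per chunk, ~{size:.1f} GiB")
```

This only counts the genotype probability array itself, so actual peak memory will be higher.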
That worked, thanks!