FinucaneLab / pops

GNU General Public License v3.0

ValueError: could not broadcast input array from shape (28,) into shape (0,) #16

Open rbutleriii opened 4 months ago

rbutleriii commented 4 months ago

I downloaded the feature training set from the paper (per #7) and munged it successfully. I then filtered my MAGMA output (which uses ENSG IDs, as in the example) down to the genes present in both my output and gene_annot_jun10.txt, so that the two gene sets match. When I run step 2, I get the following error:

INFO: Verbose output enabled.
INFO: Config dict = {'gene_annot_path': 'gene_annot.txt', 'feature_mat_prefix': 'features_munged/pops', 'num_feature_chunks': 12, 'magma_prefix': 'features_magma/striatum_4x_AdultBrain.pops', 'use_magma_covariates': True, 'use_magma_error_cov': True, 'y_path': None, 'y_covariates_path': None, 'y_error_cov_path': None, 'project_out_covariates_chromosomes': None, 'project_out_covariates_remove_hla': True, 'subset_features_path': None, 'control_features_path': 'control.features', 'feature_selection_chromosomes': None, 'feature_selection_p_cutoff': 0.05, 'feature_selection_max_num': None, 'feature_selection_fss_num_features': None, 'feature_selection_remove_hla': True, 'training_chromosomes': None, 'training_remove_hla': True, 'method': 'ridge', 'out_prefix': '/labs/flongo/PREDICT_HD/PoPS/striatum_4x', 'save_matrix_files': False, 'random_seed': 42, 'verbose': True}
INFO: --project_out_covariates_chromosomes is None, defaulting to all chromosomes
INFO: --feature_selection_chromosomes is None, defaulting to all chromosomes
INFO: --training_chromosomes is None, defaulting to all chromosomes
INFO: MAGMA scores provided, loading MAGMA.
Traceback (most recent call last):
  File "/oak/stanford/scg/lab_flongo/PREDICT_HD/PoPS/pops/pops.py", line 912, in <module>
    main(config_dict)
  File "/oak/stanford/scg/lab_flongo/PREDICT_HD/PoPS/pops/pops.py", line 727, in main
    Y, covariates, error_cov, Y_ids = read_magma(config_dict["magma_prefix"],
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/oak/stanford/scg/lab_flongo/PREDICT_HD/PoPS/pops/pops.py", line 106, in read_magma
    sigmas, gene_metadata = munge_magma_covariance_metadata(magma_prefix + ".genes.raw")
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/oak/stanford/scg/lab_flongo/PREDICT_HD/PoPS/pops/pops.py", line 153, in munge_magma_covariance_metadata
    curr_sigma[curr_ind, curr_ind - gene_corrs.shape[0]:curr_ind] = gene_corrs
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not broadcast input array from shape (28,) into shape (0,)

The problem appears to be in the genes.raw file, but mine has the same format and number of columns as the example. As far as I can tell, the failing line takes the tenth column onward and uses it to build a covariance matrix. Is that right? I wanted to check whether this is a simple mistake that someone else has already run into.
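For reference, the error message itself can be reproduced in isolation with NumPy. The sketch below reuses only the assignment shown in the traceback; the matrix size (`n_genes = 100`) and the value `curr_ind = 10` are made-up assumptions, not values taken from pops.py. It shows that when `gene_corrs` is longer than `curr_ind`, the slice start goes negative, the target slice is empty, and the assignment fails with this kind of ValueError.

```python
import numpy as np

n_genes = 100                           # hypothetical block size, not from pops.py
curr_sigma = np.zeros((n_genes, n_genes))
gene_corrs = np.ones(28)                # 28 values, as in the reported error
curr_ind = 10                           # hypothetical: fewer than 28 preceding genes

# Same slice expression as in the traceback line.
start = curr_ind - gene_corrs.shape[0]  # 10 - 28 = -18, a negative start
target = curr_sigma[curr_ind, start:curr_ind]
print(target.shape)                     # the slice wraps around and comes out empty

try:
    curr_sigma[curr_ind, start:curr_ind] = gene_corrs
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

So the error would be consistent with a row in genes.raw carrying more trailing values than there are genes before it in the matrix being filled, though that is an inference from the traceback alone.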

Head of genes.raw (the same error occurs with four different files):

# VERSION = 110
# COVAR = NSAMP MAC
ENSG00000187634 1 861118 879961 98 20 929 101.745 -0.852099
ENSG00000188976 1 879584 894670 19 7 929 130.579 -1.2617 0.0570194
ENSG00000187961 1 895967 901095 126 29 929 131.659 -0.820827 0.0971049 0.0985293
ENSG00000187583 1 901877 910488 17 9 929 181.235 0.208456 0.256158 0.115332 0.103815
ENSG00000187642 1 910584 917473 19 5 929 247.474 -0.570091 0.105407 0.0691051 0.0827898 0.192258
ENSG00000188290 1 934342 935552 9 4 929 207.556 0.567864 0.101056 0.0721552 0.0978206 0.117395 0.410979
ENSG00000187608 1 936518 949921 61 17 929 219.164 0.531577 0.218639 0.0599179 0.160689 0.121905 0.371993 0.563819
ENSG00000188157 1 955503 991498 149 33 929 157.403 0.667515 0.317578 0.210609 0.587742 0.233102 0.0996136 0.107593 0.22348
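In the head above, each row has one more trailing value than the row before it, which suggests (this is an assumption, not something confirmed from the MAGMA or PoPS documentation here) that columns ten onward hold correlations with the immediately preceding genes on the same chromosome. If so, removing rows from genes.raw would leave later rows with more trailing values than preceding genes. A rough check for that situation is sketched below; `find_overlong_rows` is a hypothetical helper, and `n_meta=9` is assumed from the nine leading columns visible in this file.

```python
def find_overlong_rows(lines, n_meta=9):
    """Return (gene, n_corrs, n_preceding) for rows that have more trailing
    values than preceding genes on the same chromosome.

    n_meta is the assumed number of leading metadata columns.
    """
    problems = []
    curr_chrom = None
    n_preceding = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                    # skip header and blank lines
        fields = line.split()
        gene, chrom = fields[0], fields[1]
        if chrom != curr_chrom:         # new chromosome: restart the count
            curr_chrom = chrom
            n_preceding = 0
        n_corrs = len(fields) - n_meta
        if n_corrs > n_preceding:
            problems.append((gene, n_corrs, n_preceding))
        n_preceding += 1
    return problems


# First four rows from the head above.
head = [
    "# VERSION = 110",
    "# COVAR = NSAMP MAC",
    "ENSG00000187634 1 861118 879961 98 20 929 101.745 -0.852099",
    "ENSG00000188976 1 879584 894670 19 7 929 130.579 -1.2617 0.0570194",
    "ENSG00000187961 1 895967 901095 126 29 929 131.659 -0.820827 0.0971049 0.0985293",
    "ENSG00000187583 1 901877 910488 17 9 929 181.235 0.208456 0.256158 0.115332 0.103815",
]

# Simulate filtering out one gene (the second data row).
filtered = [row for row in head if not row.startswith("ENSG00000188976")]

print(find_overlong_rows(head))      # no rows flagged in the unfiltered head
print(find_overlong_rows(filtered))  # rows after the removed gene are flagged
```

To run it on a real file, pass the open file handle, e.g. `find_overlong_rows(open(path))`, where `path` points at the genes.raw file. An empty result on the original MAGMA output and a non-empty one on the filtered copy would point to the filtering step rather than to the file format.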

Python version 3.11.1