Fedja closed this issue 3 years ago
It fails in the part where we join the credible variants to the summary stat dataframe, which happens before we group the data.
I tested locally, and the bug seems to be that concatenating an empty dataframe to a non-empty dataframe fails when there are duplicated column names. Here FINNGEN_AF.Controls is duplicated because there's no generic FG AF in the data.
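For reference, a minimal sketch of one way to guard the concat, assuming the duplicates are exact repeats of the same column so that keeping the first occurrence loses nothing. The helper names `drop_duplicate_columns` and `safe_concat` and the `cs_id` column are made up for illustration and are not part of gws_fetch.py; the `pd.concat` arguments mirror the call in `merge_credset`.

```python
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first occurrence of each column label."""
    return df.loc[:, ~df.columns.duplicated()]


def safe_concat(gws_df: pd.DataFrame, cred_df: pd.DataFrame) -> pd.DataFrame:
    """Row-wise concat after removing repeated column labels from both frames."""
    frames = [drop_duplicate_columns(frame) for frame in (gws_df, cred_df)]
    return pd.concat(frames, axis="index", ignore_index=True, sort=False)


# Summary-stat rows where FINNGEN_AF.Controls was selected twice.
gws_df = pd.DataFrame(
    [[1, 100, 0.2, 0.2], [1, 200, 0.3, 0.3]],
    columns=["#CHR", "POS", "FINNGEN_AF.Controls", "FINNGEN_AF.Controls"],
)
# Empty credible-set frame (hypothetical columns, no credible variants found).
cred_df = pd.DataFrame(columns=["#CHR", "POS", "FINNGEN_AF.Controls", "cs_id"])

merged = safe_concat(gws_df, cred_df)
print(list(merged.columns))
```

This only hides the symptom at the concat; if the two repeated columns could ever hold different values, dropping one silently would be wrong and the column list should be fixed upstream instead.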
What I think needs changing:
Yep, do 1 for sure. As for 2: in meta I put these kinds of extra columns in --extra-cols, which takes a comma-separated list of columns to output.
Relevant PRs: #111 #112
As of 9ad5bc5, I did successfully run this after removing the duplicate columns in the call. Can you try on eu.gcr.io/finngen-refinery-dev/autorep:9ad5bc5 and check if it works? The wdl and json files for docker should be up to date. Specifically, I ran:
main.py RHEUMA_SEROPOS_meta_out.gz \
--sign-treshold 5e-08 --alt-sign-treshold 0.01 \
--group --grouping-method ld --locus-width-kb 1500 \
--ld-panel-path wgs_all --ld-r2 0.4 --plink-memory 21000 \
--ld-api online --include-batch-freq --finngen-path R4_annotated_variants_v1.gz \
--functional-path fin_enriched_genomes_select_columns.txt.gz \
--gnomad-genome-path gnomad.genomes.r2.1.sites.liftover.b38.finngen.r2pos.af.ac.an.tsv.gz \
--gnomad-exome-path gnomad.exomes.r2.1.sites.liftover.b38.finngen.r2pos.af.ac.an.tsv.gz \
--finngen-annotation-version r4 --use-gwascatalog --ld-treshold 0.7 --ldstore-threads 4 \
--gwascatalog-threads 8 --strict-group-r2 0.5 --gwascatalog-pval 5e-08 \
--gwascatalog-width-kb 25 --db gwas --column-labels "#CHR" POS REF ALT all_inv_var_meta_p \
--extra-cols all_inv_var_meta_beta FINNGEN_AF.Controls FINNGEN_AF.Cases \
--efo-codes EFO_1001999 EFO_0002609 EFO_0000685 --ignore-region 6:23000000-38000000 \
--fetch-out RHEUMA_SEROPOS.fetch.out --annotate-out RHEUMA_SEROPOS.annotate.out \
--report-out RHEUMA_SEROPOS.report.out --top-report-out RHEUMA_SEROPOS.top.out \
--ld-report-out RHEUMA_SEROPOS.ld.out
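Removing the duplicates could also happen before the labels ever reach pandas. A rough sketch, assuming the column labels and extra columns arrive as plain lists of strings; `unique_labels` is a hypothetical helper, not something that exists in main.py:

```python
def unique_labels(labels):
    """Drop repeated labels while keeping first-seen order."""
    # dict preserves insertion order, so this keeps the first occurrence of each label.
    return list(dict.fromkeys(labels))


# Column list taken from a call where FINNGEN_AF.Controls is passed twice.
requested = [
    "#CHR", "POS", "REF", "ALT", "all_inv_var_meta_p",
    "all_inv_var_meta_beta", "FINNGEN_AF.Controls",
    "FINNGEN_AF.Cases", "FINNGEN_AF.Controls",
]
columns = unique_labels(requested)
print(columns)
```

Doing it at this level means the dataframe is never read with a repeated label in the first place, so later joins and concats do not need their own guards.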
Using the latest docker image in the container registry, eu.gcr.io/finngen-refinery-dev/autorep:4d40b2b:
main.py /cromwell_root/finngen_commons/est_ukb_meta/RHEUMA_SEROPOS_meta_out.gz --sign-treshold 5e-08 --alt-sign-treshold 0.01 --group --grouping-method ld --locus-width-kb 1500 --ld-panel-path /cromwell_root/finngen-imputation-panel/sisu3/wgs_all --ld-r2 0.4 --plink-memory 17000 --include-batch-freq --finngen-path /cromwell_root/r4_data_west1/annotations/R4_annotated_variants_v1.gz --functional-path /cromwell_root/r4_data_west1/gnomad_functional_variants/fin_enriched_genomes_select_columns.txt.gz --gnomad-genome-path /cromwell_root/fg-datateam-analysisteam-share/gnomad/2.1/genomes/gnomad.genomes.r2.1.sites.liftover.b38.finngen.r2pos.af.ac.an.tsv.gz --gnomad-exome-path /cromwell_root/fg-datateam-analysisteam-share/gnomad/2.1/exomes/gnomad.exomes.r2.1.sites.liftover.b38.finngen.r2pos.af.ac.an.tsv.gz --finngen-annotation-version r4 --use-gwascatalog --ld-treshold 0.7 --ldstore-threads 4 --gwascatalog-threads 8 --strict-group-r2 0.5 --gwascatalog-pval 5e-08 --gwascatalog-width-kb 25 --db local --column-labels #CHR POS REF ALT all_inv_var_meta_p all_inv_var_meta_beta FINNGEN_AF.Controls FINNGEN_AF.Cases FINNGEN_AF.Controls --local-gwascatalog /cromwell_root/r4_data_west1/autoreporting/gwas-catalog-associations_ontology-annotated-191007.tsv --efo-codes EFO_1001999 EFO_0002609 EFO_0000685 --ignore-region 6:23000000-38000000 --custom-dataresource /cromwell_root/r4_data_west1/autoreporting/custom_dataresource_r4_2020_03_25.tsv --fetch-out RHEUMA_SEROPOS.fetch.out --annotate-out RHEUMA_SEROPOS.annotate.out --report-out RHEUMA_SEROPOS.report.out --top-report-out RHEUMA_SEROPOS.top.out --ld-report-out RHEUMA_SEROPOS.ld.out
--- phenotype RHEUMA_SEROPOS RETURN CODE: 1 ---
--- phenotype RHEUMA_SEROPOS STDOUT ---
input file: /cromwell_root/finngen_commons/est_ukb_meta/RHEUMA_SEROPOS_meta_out.gz
filter & group SNPs
Traceback (most recent call last):
File "/usr/local/bin/main.py", line 135, in <module>
main(args)
File "/usr/local/bin/main.py", line 46, in main
ignore_region=args.ignore_region, cred_set_file=args.cred_set_file,ld_api=ld_api)
File "/usr/local/bin/gws_fetch.py", line 231, in fetch_gws
temp_df = merge_credset(temp_df,cs_df,gws_fpath,columns)
File "/usr/local/bin/gws_fetch.py", line 191, in merge_credset
df = pd.concat( [gws_df,cred_row_df], axis="index", ignore_index=True, sort=False).drop_duplicates(subset=list( join_cols ) )
File "/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/concat.py", line 258, in concat
return op.get_result()
File "/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/concat.py", line 473, in get_result
mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 2059, in concatenate_block_managers
return BlockManager(blocks, axes)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 143, in __init__
self._verify_integrity()
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 350, in _verify_integrity
"tot_items: {1}".format(len(self.items), tot_items)
AssertionError: Number of manager items must equal union of block items
# manager items: 8, # tot_items: 9