gkichaev / PAINTOR_V3.0

Fast, integrative fine mapping with functional data
55 stars 21 forks source link

CalcLD script flips Z statistic but not the allele designation #17

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hello Gleb,

As mentioned in reference to another issue, I am making some modifications to the CalcLD PAINTOR companion script.

In that script, in the Match_Ref_Panel_Alleles() function, there is a block of code that flips the Z-statistic provided that the A1 designation from the 1000 genomes VCF does not match the effect allele in the study data:

                elif vcf_A1 == data_A0 and vcf_A0 == data_A1:
                        out_vcf.append(rows)
                        flip_line = input_data[curr_snp+1]
                        zflip = -1*float(flip_line[z_index])
                        flip_line[z_index] = str(zflip)
                        final_locus.append(flip_line)

I expected to find functionality like this when I began editing the script, but I also thought it would contain a line that would flip the effect and alternate allele designations themselves and then make the direction of the Z-statistic fit to that rather than flipping the Z-stat only.

Perhaps I am not thinking about this correctly, but if you were to flip the Z-stat only and then generate the LD matrix, THOSE two things would correspond correctly. However, it seems to me that if only the Z-statistic is flipped, then algorithm will still function properly, but the results that people report will miss the direction of effect of the variant reported provided it was flipped by this script. Do you agree with this assessment?

VL

gkichaev commented 6 years ago

Hi Vincent,

I think you're right. The function only flips the Z-score and not the resulting A1/A0 designation in the output file. This would probably be useful to add in.

--Gleb

ghost commented 6 years ago

@gkichaev Thanks very much for responding so quickly.

The following lines of code can be added to emend the script as written if you like.

Before:

        if vcf_A1 == data_A1 and vcf_A0 == data_A0:`
            out_vcf.append(rows)
            flip_line = input_data[curr_snp+1]
            zflip = -1*float(flip_line[z_index])
            flip_line[z_index] = str(zflip)
            final_locus.append(flip_line)

After:

        if vcf_A1 == data_A1 and vcf_A0 == data_A0:
               out_vcf.append(rows)
               flip_line = input_data[curr_snp+1]
               EA=flip_line[3]
               AA=flip_line[4]
               flip_line[3]=AA
               flip_line[4]=EA
               zflip = -1*float(flip_line[z_index])
               flip_line[z_index] = str(zflip)
               final_locus.append(flip_line)

I am working on a version that does this as well as other things like run the data for AT/GC SNPs and run the algorithm for all populations in one run. Will send that along when its finished.

pwh124 commented 5 years ago

This probably goes without saying for most people but...

If you implement the code above for flipping the alleles, the columns must be in the same order as they appear in the wiki, namely: chr bp rsid Effect Alt Zscore

Any number of columns can follow after that