Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/
GNU General Public License v3.0

SNPID takes value from RSID and this is the cause of many styling errors #5

Open Delvalle-beep opened 1 year ago

Delvalle-beep commented 1 year ago

I'm not having problems, but I'd like to post this to help people who run into the same issue. I couldn't get the "highlight" argument to work in my plots, because it asked me to set a seaborn "hue" argument. I tried everything but couldn't fix it, until I realized that the SNPID in gwaslab was taking the RSID value, and this was the cause of the error. For the layman: the SNPID has a format like chr:pos:ea:nea (or something like that), with values like "10:94263:A:C", while the RSID has a format like "rs1234", always starting with "rs". Anyway, I hope this helps.
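A quick way to see which kind of identifier a column actually holds is to test a few values against both formats. The sketch below is plain Python and not part of gwaslab: the function name classify_variant_id and both regular expressions are assumptions based only on the two examples above, so real data may need looser patterns.

```python
import re

# Assumed patterns, derived only from the examples in this thread:
# rsID  -> "rs" followed by digits, e.g. "rs1234"
# SNPID -> chr:pos:ea:nea, e.g. "10:94263:A:C" (an optional "chr" prefix is tolerated)
RSID_PATTERN = re.compile(r"^rs\d+$")
SNPID_PATTERN = re.compile(r"^(?:chr)?[0-9XYMT]+:\d+:[ACGT]+:[ACGT]+$", re.IGNORECASE)


def classify_variant_id(value):
    """Return 'rsid', 'snpid', or 'unknown' for a single identifier."""
    text = str(value).strip()  # stray whitespace such as " rs1234" is ignored
    if RSID_PATTERN.match(text):
        return "rsid"
    if SNPID_PATTERN.match(text):
        return "snpid"
    return "unknown"


# Illustrative values only
ids = ["10:94263:A:C", "rs1234", " rs1234", "chr1:12345:G:T", "."]
labels = [classify_variant_id(v) for v in ids]
print(labels)
```

If most values in the column you expect to be SNPID come back as 'rsid', the columns were probably swapped at load time.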

Cloufield commented 1 year ago

Hi Regina, thanks for your helpful comment! I just want to clarify that for the mqq plot, gwaslab uses SNPID first if it is available in the sumstats. If SNPID is not available, rsID is used instead. The logic is that SNPIDs are mostly unique, whereas rsIDs are often duplicated.

You can also use rsid or snpid to specify which column to load as rsID or SNPID in gl.Sumstats(). (When loading with predefined formats, rsID is sometimes loaded as SNPID, but you can always specify rsid or snpid manually to correct this.)

Delvalle-beep commented 1 year ago

In this case, I am generating my SNPID in the chr:pos:ea:nea format with pandas before passing the data to Sumstats, because my study's data did not come with an SNPID. However, I'm having trouble generating the plot correctly, and I'm also having trouble using 'random_variants()' to generate variables for my study.
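For anyone attempting the same thing, here is a minimal sketch of building a chr:pos:ea:nea identifier. It is plain Python rather than the pandas code actually used in the study, and the helper name make_snpid, the column names CHR, POS, EA and NEA, and the second example row are all assumptions made for illustration.

```python
def make_snpid(chrom, pos, ea, nea):
    """Build a 'chr:pos:ea:nea' string, e.g. '10:94263:A:C'."""
    chrom_text = str(chrom).strip()
    # Drop an optional "chr" prefix so "chr1" and "1" yield the same ID
    if chrom_text.lower().startswith("chr"):
        chrom_text = chrom_text[3:]
    ea_text = str(ea).strip().upper()
    nea_text = str(nea).strip().upper()
    return f"{chrom_text}:{int(pos)}:{ea_text}:{nea_text}"


# Made-up rows with assumed column names; the first matches the example in this thread
rows = [
    {"CHR": 10, "POS": 94263, "EA": "A", "NEA": "C"},
    {"CHR": "chr1", "POS": "12345", "EA": "g", "NEA": "t"},
]
for row in rows:
    row["SNPID"] = make_snpid(row["CHR"], row["POS"], row["EA"], row["NEA"])

snpids = [row["SNPID"] for row in rows]
# Duplicate IDs can make it ambiguous which variant a plot should highlight
assert len(set(snpids)) == len(snpids)
print(snpids)
```

Once such a column exists, it can be named explicitly through the snpid argument of gl.Sumstats(), as described above, so that the rsID column is not picked up in its place.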