Bobsimonoff closed this issue 3 months ago.
Hello @Bobsimonoff,
Thanks for your query and for providing feedback on the GWAS plugin!
Yes, the plugin currently filters out variants that do not have variant accession IDs. In the future, we can look into annotating other variant entries that do not have rsIDs but do at least have a risk allele.
Nonetheless, there are a lot of warning messages, and that does not look right. I will add a quiet option to turn the warnings off. In the meantime, you can just comment out the line that emits the warning.
Best regards, Nakib
Thanks, I will watch for the update. However, if performance can't be improved dramatically, I may just do the annotation in Python using multithreading, since that way I can process the entire GWAS file in about 1-2 hours.
Hello @Bobsimonoff,
We have added a verbose option to the plugin, since it can be quite noisy. If verbose=1, you will see the warning messages; otherwise they are suppressed.
This update will be available in the next Ensembl release (112). I will close this issue; if you run into any further problems, feel free to open a new one.
Best regards, Nakib
I think the regex in the plugin assumes that the values in the SNPS column have the form rs[0-9]+. However, not all of them do. Running:
cut -d $'\t' -f 22 gwas_catalog_v1.0.2-associations_e110_r2023-12-20.tsv | grep -v -E 'rs[0-9]+'
shows a wide variety of values in this column, including ones in the following formats, to list a few: chr12:59581708, chr19:19393890:I, chr19:19393890:D, exm474728, kgp21281797, HLA-A*02:01, X:146986184:A_AAA, 7:120812727_G_C.
On the current version of the file, this results in almost 40k warnings of the following form:
WARNING: Could not parse any rsIds from string 'chrX:66510909'
I am not sure whether anything can be done about this on the plugin side, but I thought I'd report it in case there is.
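To see which kinds of non-rsID values make up those ~40k warnings (and which might be worth handling), the cut/grep check can be extended to bucket values by format. This is only a rough sketch: classify_snps is a hypothetical helper, not part of the VEP GWAS plugin, and its format buckets are heuristics based solely on the example values listed above, not on the GWAS Catalog specification or the plugin's own parsing code. It assumes the first input line is the header row and that SNPS is column 22.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the VEP GWAS plugin): reads the GWAS Catalog
# associations TSV on stdin, skips the header row, and counts SNPS values that
# contain no rsID, grouped into rough, heuristic format buckets.
# The column number defaults to 22 (SNPS) and can be overridden with $1.
classify_snps() {
  awk -F'\t' -v col="${1:-22}" '
    NR == 1 { next }                 # skip the header row
    $col ~ /rs[0-9]+/ { next }       # has at least one rsID, so an rs[0-9]+ regex matches
    {
      v = $col
      if (v ~ /^(chr)?[0-9XYMT]+:[0-9]+$/)                          k = "chrom:pos"
      else if (v ~ /^(chr)?[0-9XYMT]+:[0-9]+:[ID]$/)                k = "chrom:pos:indel"
      else if (v ~ /^(chr)?[0-9XYMT]+:[0-9]+[:_][ACGT]+_[ACGT]+$/)  k = "chrom:pos:ref_alt"
      else if (v ~ /^HLA-/)                                         k = "HLA allele"
      else if (v ~ /^(exm|kgp)[0-9]+$/)                             k = "array probe id"
      else                                                          k = "other"
      count[k]++
    }
    # for-in iteration order is unspecified, so callers should sort the output
    END { for (k in count) print k "\t" count[k] }
  '
}
```

Usage, with the file name from the report: classify_snps < gwas_catalog_v1.0.2-associations_e110_r2023-12-20.tsv | LC_ALL=C sort. Like the original grep -v check, a value counts as parsable if it contains any rsID, so mixed entries such as an rsID plus a chr:pos token are not reported.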