fakedrtom / SVAFotate

MIT License

gnomad-SV v4 #20

Open prasundutta87 opened 7 months ago

prasundutta87 commented 7 months ago

Hi, are there any plans to update the gnomAD frequencies to v4 (https://gnomad.broadinstitute.org/news/2023-11-v4-structural-variants)?

Regards, Prasun

fakedrtom commented 7 months ago

I have generated a bed file using the new gnomAD SVs and attempted to upload here. It was usable for a period of time, but the file is large enough that it began to cause storage issues that I need to resolve. I am working on finding a better alternative to host this data and then I will more fully integrate the new gnomAD into SVAFotate. My apologies for the delay/problem.

prasundutta87 commented 7 months ago

Thanks a lot for this update @fakedrtom and thanks for looking for a solution!

prasundutta87 commented 3 months ago

Hi @fakedrtom, thanks for providing the gnomAD v4 SV data here: https://zenodo.org/records/10734967

On checking the file, I saw that the IDs have v3 written in them, for example: gnomAD-SV_v3_DUP_chr1_01c2781c

I was wondering what the reason for this is and which gnomAD file was used to create this SVAFotate SV file.

Regards, Prasun

fakedrtom commented 3 months ago

Those IDs are lifted directly from the gnomAD files, which largely kept the same IDs as previous versions. I suspect this is because v4 has only a small increase in the total number of genomes over v3 (76,215 v4 genomes vs 76,156 v3 genomes). The link above is for gnomADv4 only. I have updated the main SVAFotate core file to include gnomADv4.1 and that can be downloaded here. Please note that v4.1 still has the v3 ID names for gnomAD entries.
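As a rough illustration of why the version tag inside an ID should not be read as the release of the data, here is a minimal sketch that splits an ID into its parts. The ID layout is an assumption inferred only from the single example quoted in this thread, and `parse_gnomad_sv_id` is a hypothetical helper, not part of SVAFotate or gnomAD.

```python
import re

# Assumed layout: gnomAD-SV_<tag>_<SVTYPE>_<chrom>_<hex suffix>.
# Inferred from one example ID in this thread; real files may contain other shapes.
ID_PATTERN = re.compile(r"^gnomAD-SV_(v\d+)_([A-Z]+)_(chr[0-9A-Za-z]+)_([0-9a-f]+)$")


def parse_gnomad_sv_id(sv_id):
    """Split an ID into its parts, or return None if it does not fit the assumed layout."""
    match = ID_PATTERN.match(sv_id)
    if match is None:
        return None
    tag, svtype, chrom, suffix = match.groups()
    # The tag is only a label carried in the ID; per the comment above,
    # v4 and v4.1 entries still carry "v3" here.
    return {"id_version_tag": tag, "svtype": svtype, "chrom": chrom, "suffix": suffix}


parsed = parse_gnomad_sv_id("gnomAD-SV_v3_DUP_chr1_01c2781c")
print(parsed)
```

The release a record came from is better taken from the file it was downloaded in than from this tag.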

prasundutta87 commented 3 months ago

Oh great! That's very helpful. Are TOPMed SVs added as well?

fakedrtom commented 3 months ago

Yes! There should be SVs from CCDG, gnomAD, 1000G, and TOPMed in this file.

prasundutta87 commented 3 months ago

Oh great! I spent some time to make one with everything, but it was with gnomad v4. Let me update the file with your new file.

cathaloruaidh commented 3 weeks ago

Hi,

In the BED files from both zenodo links above, the gnomAD EUR allele frequencies are all NA. Is this by design or an error?

Thanks, Cathal

fakedrtom commented 3 weeks ago

When gnomAD updated to v4, they created a new set of populations/ancestries. It appears that they dropped EUR at that point and now include FIN and NFE (non-Finnish European) instead. This is reflected in the gnomAD browser, which now lists European (Finnish) and European (non-Finnish). I thought about assigning NFE to EUR to be consistent with other datasets, but decided to leave things as they are in their respective datasets, so EUR for gnomAD entries is now listed as NA. If you want to use gnomAD European frequencies, I suggest you select for them with FIN or NFE or both.
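A minimal sketch of one way to select a European frequency when EUR is NA, as suggested above. The field names `EUR_AF`, `FIN_AF`, and `NFE_AF` are assumptions for illustration and may not match the actual column or annotation names in the SVAFotate file. Taking the larger of FIN and NFE is also just one possible choice (a conservative one for rare-variant filtering), not something SVAFotate does itself.

```python
def parse_af(value):
    """Convert a frequency field to float; treat None, '', 'NA', or '.' as missing."""
    if value is None or value.strip() in {"", "NA", "."}:
        return None
    return float(value)


def european_af(record, eur_key="EUR_AF", fin_key="FIN_AF", nfe_key="NFE_AF"):
    """Return EUR if present, otherwise the larger of FIN and NFE, otherwise None.

    The key names are hypothetical; adjust them to the real field names.
    """
    eur = parse_af(record.get(eur_key))
    if eur is not None:
        return eur
    candidates = [parse_af(record.get(key)) for key in (fin_key, nfe_key)]
    candidates = [af for af in candidates if af is not None]
    return max(candidates) if candidates else None


# A gnomAD-style entry: EUR is NA, so fall back to FIN/NFE.
gnomad_row = {"EUR_AF": "NA", "FIN_AF": "0.01", "NFE_AF": "0.03"}
# An entry from a dataset that still reports EUR directly.
other_row = {"EUR_AF": "0.2", "FIN_AF": "NA", "NFE_AF": "NA"}

print(european_af(gnomad_row))
print(european_af(other_row))
```

Requiring both FIN and NFE to pass a threshold separately would be a stricter alternative to taking the maximum.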

cathaloruaidh commented 2 weeks ago

Great, I had thought that was the case, but just wanted to double-check. Thanks for clarifying!