DyogenIBENS / FINSURF

FINSURF is a tool designed to analyse lists of sequence variants in the human genome.
https://www.finsurf.bio.ens.psl.eu/

Build #1

Open mfazel opened 2 years ago

mfazel commented 2 years ago

Hi,

I'm trying to run FINSURF on a list of SNPs and have a few questions.

1- I could not find anywhere in the GitHub repo which genome build I should use for my variants. My guess is hg19, but I wanted to confirm with you, since the tool works on positions rather than rsIDs.
2- I ran 1,000 SNPs on the website, but more than half of them returned no results. What does this mean, and what could be wrong?
3- How are the Ref/Alt alleles used in the calculation?

Thanks, Mehdi

lambosaur commented 2 years ago

Hi Mehdi,

1- Yes, these are indeed hg19 coordinates.
2- About 400M genomic positions have FINSURF scores. They are aggregated from resources corresponding to putative regulatory regions with predicted associations to genes. Provided your 1,000 SNPs are formatted correctly, the absence of results suggests they fall outside this 400M-position "regulatory genome" that we defined for FINSURF.
3- The alleles are used to annotate each mutation as a transition, transversion, or indel, and the model takes this status into account when scoring. We pre-calculated scores for transitions and transversions (scores for indels are retrieved from transversion scores), so when you input a mutation with its REF and ALT, the web server queries the relevant resource of precomputed scores.
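
To make point 3 concrete, here is a minimal sketch of how a REF/ALT pair can be labelled as a transition, transversion, or indel. It is an illustration only, not FINSURF's actual code: the function name `classify_variant` is hypothetical, and treating every allele pair that is not a single-base substitution as an indel is a simplifying assumption.

```python
# Illustrative sketch only; not taken from the FINSURF code base.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_variant(ref: str, alt: str) -> str:
    """Label a REF/ALT pair as 'transition', 'transversion', or 'indel'."""
    ref, alt = ref.upper(), alt.upper()

    # Assumption: anything that is not a single-base change counts as an indel.
    if len(ref) != 1 or len(alt) != 1:
        return "indel"

    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")

    # Transition: purine <-> purine (A/G) or pyrimidine <-> pyrimidine (C/T).
    # Transversion: purine <-> pyrimidine.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

With this sketch, `classify_variant("G", "A")` gives a transition, `classify_variant("A", "C")` a transversion, and `classify_variant("AT", "A")` an indel.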

Best,

Lambert

mfazel commented 2 years ago

Hi Lambert, thanks a lot for the clarification. I suggest including this in the GitHub README and on the website (renaming "pos" to "pos_hg19" in the example header would probably be the easiest way). I also suggest replacing your examples, on GitHub and the website, with clearer ones that include rsIDs, or at least a mix, although I believe the ID only serves to match the results and is not used in the analysis at all.

I was also wondering why the following example, which is a very common SNP, does not exist in your 400M SNP database, or what I am doing wrong:

chr1 1156655 rs75998592 G A

I get "Your variants yeild no result" after submitting it to FINSURF.
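
For anyone who wants to rule out formatting problems before submitting, here is a minimal sketch of a check for lines like the example above. The column order (chromosome, hg19 position, ID, REF, ALT) is inferred from that example line alone, and `Variant` and `parse_variant_line` are hypothetical names; the exact format the web server expects may differ.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int      # assumed to be an hg19 coordinate
    var_id: str   # e.g. an rsID; assumed to be used only for matching results
    ref: str
    alt: str


def parse_variant_line(line: str) -> Variant:
    """Split one whitespace-separated line into its five assumed fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")

    chrom, pos, var_id, ref, alt = fields
    if not pos.isdigit() or int(pos) < 1:
        raise ValueError(f"position must be a positive integer: {pos!r}")

    return Variant(chrom, int(pos), var_id, ref.upper(), alt.upper())
```

A line that parses cleanly here can still return no score if its position falls outside the scored regions.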

Best, Mehdi