ChristopherWilks / snaptron

Fast web-services-based query tool for large sets of genomic features

Query on the difference between snaptron and snaptron-experiments in terms of PSI calculation #20

Closed: yuzhong1997 closed this issue 4 months ago

yuzhong1997 commented 4 months ago

To whom it may concern,

Hi! I'm trying to run PSI estimation on GTEx v8 and TCGA using your resources. I saw an example template with a PSI run in another of your repositories, "snaptron-experiments", but it seems there is a symbolic link to this repository, so I have the following questions:

  1. I initially referred to the ASCOT paper, which ran its PSI calculation with query_snaptron.py under "snaptron-experiments/client/". At the time it was built, I assume it was based on the recount2 pipeline. I saw in your commit history that the PSI calculation in your script mimics the way ASCOT calculates PSI. Are there any differences, or is it just a wrapper function doing the same thing as ASCOT?

  2. If I want to use query_snaptron.py under "snaptron-experiments/client/" to calculate PSI from the recount3 compilation, how should I specify the data sources? Your clsnapconf.py under "snaptron-experiments/client" does not include any new name aliases for the recount3-based resources, i.e. gtexv2 or tcgav2. I'm not sure whether it would cause errors if I just edit or add those name aliases in clsnapconf.py.

I'd appreciate any help here. BTW, these are valuable resources for the community. Thanks.

Yu

ChristopherWilks commented 4 months ago

Thanks for the feedback @yuzhong1997!

I've made a couple of updates: one fixes the "HTTP Error 308: Permanent Redirect" errors, and the other adds recount3-based data source strings to client/clsnapconf.py:

DS_SRAV1_MOUSE='srav1m'
DS_SRAV3='srav3h'
DS_GTEXV2='gtexv2'
DS_TCGAV2='tcgav2'

In any case, those are just a guide; you can always pass the actual data source string on the command line via --datasrc (even if it's not in that file), e.g.:

python client/query_snaptron.py --query-file data/test_psi_snapcount.snap.tsv --function psi --datasrc gtexv2 > gtexv2.psi
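As a rough illustration of why an unlisted data source string still works, here is a minimal sketch that treats the clsnapconf.py names as optional shortcuts and passes any other string straight through into the query URL. The `https://snaptron.cs.jhu.edu/<datasrc>/snaptron?regions=...` URL layout and the helper names `resolve_datasrc` and `snaptron_region_url` are assumptions for illustration, not the client's actual code.

```python
# Hypothetical mirror of the data source constants listed in client/clsnapconf.py.
KNOWN_DATASRCS = {
    "DS_SRAV1_MOUSE": "srav1m",
    "DS_SRAV3": "srav3h",
    "DS_GTEXV2": "gtexv2",
    "DS_TCGAV2": "tcgav2",
}

# Assumed base URL for the public Snaptron service (https scheme).
SNAPTRON_BASE = "https://snaptron.cs.jhu.edu"


def resolve_datasrc(name: str) -> str:
    """Map a known constant name to its string; pass any other string through unchanged."""
    return KNOWN_DATASRCS.get(name, name)


def snaptron_region_url(datasrc: str, region: str) -> str:
    """Build a junction query URL for one region (the URL layout is an assumption)."""
    return f"{SNAPTRON_BASE}/{resolve_datasrc(datasrc)}/snaptron?regions={region}"


if __name__ == "__main__":
    # A raw string such as "gtexv2" needs no entry in the table to be usable.
    print(snaptron_region_url("gtexv2", "chr1:1-100000"))
```

The lookup falls back to the raw string, so adding a name to the table is a convenience rather than a requirement, which matches the point above about passing the string directly.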
ChristopherWilks commented 4 months ago

As for your original question (1), it's been a long while since I looked at the PSI code, but I think you're correct, at least for the single alternative cassette exon case.

Note that the PSI implementation is very basic: it assumes only 3 groups (https://github.com/ChristopherWilks/snaptron-experiments/blob/55dcaf1fc064c23dbc6667825a6e8874e0c5bef0/client/clsnapfunc.py#L224):

inclusion1 (alt. exon's left jxn)
inclusion2 (alt. exon's right jxn)
exclusion (jxn which excludes exon)
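For the single cassette exon case, a common way to turn those three groups into a PSI is to average the two inclusion junction counts and divide by that average plus the exclusion count. The sketch below assumes that formulation and a hypothetical `min_total` coverage cutoff; it is not a copy of clsnapfunc.py, whose exact formula and thresholds may differ.

```python
from typing import Optional


def cassette_exon_psi(inclusion1: int, inclusion2: int, exclusion: int,
                      min_total: float = 1) -> Optional[float]:
    """Estimate PSI for one sample from the three junction groups.

    PSI = mean(inclusion1, inclusion2) / (mean(inclusion1, inclusion2) + exclusion)
    Returns None when coverage is zero or below the hypothetical min_total cutoff.
    """
    inclusion = (inclusion1 + inclusion2) / 2.0
    total = inclusion + exclusion
    if total == 0 or total < min_total:
        return None
    return inclusion / total
```

For example, `cassette_exon_psi(30, 10, 20)` gives 0.5. Averaging the two inclusion junctions keeps an exon-including transcript, which can contribute a read to each side of the exon, from being counted twice against the single exclusion junction.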

You could theoretically add more jxns to one or more of those 3 categories/groups to get more sophisticated PSIs, but it's not something I've tested extensively, so use at your own risk.
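One way to read that suggestion is to pool the read counts of every junction assigned to a group before computing PSI. The sketch below assumes per-sample counts keyed by a junction ID and a hand-made group assignment; the IDs, the `pool_group_counts` name, and the simple summing rule are all hypothetical and, as noted above, untested against the real client.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

GROUPS = ("inclusion1", "inclusion2", "exclusion")


def pool_group_counts(junction_counts: Dict[str, int],
                      assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Sum one sample's junction read counts into the three PSI groups.

    junction_counts: junction ID -> read count in this sample (missing IDs count as 0).
    assignments: (junction ID, group name) pairs; each pair adds that junction's
    count to the named group.
    """
    totals: Dict[str, int] = defaultdict(int)
    for jx_id, group in assignments:
        if group not in GROUPS:
            raise ValueError(f"unknown group: {group}")
        totals[group] += junction_counts.get(jx_id, 0)
    # Always report all three groups, even if no junction was assigned to one.
    return {g: totals[g] for g in GROUPS}


if __name__ == "__main__":
    counts = {"jx1": 12, "jx2": 3, "jx3": 9, "jx4": 7}
    assign = [("jx1", "inclusion1"), ("jx2", "inclusion1"),
              ("jx3", "inclusion2"), ("jx4", "exclusion"), ("jx5", "exclusion")]
    print(pool_group_counts(counts, assign))
```

The pooled totals can then feed whatever three-group PSI formula is in use; summing is the simplest choice, and weighting or deduplicating overlapping junctions would be a separate design decision.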

yuzhong1997 commented 4 months ago

Thanks, man. Crystal clear.