ChristopherWilks / snaptron

Fast web-services-based query tool for large sets of genomic features

Create a snaptron db with custom data #19

pabloacera closed this issue 8 months ago

pabloacera commented 9 months ago

Hi, I was wondering whether there are scripts available to create a Snaptron database from a custom dataset. For example, let's say I have 100 different RNA-seq samples and I would like to process and query them the same way Snaptron has done with GTEx. Is there a pipeline/script to do this? Thanks so much in advance for your time! Pablo.

ChristopherWilks commented 8 months ago

Hi @pabloacera,

Yes, there is. It's certainly not a turn-key solution, but everything needed to produce your own version of Snaptron (and recount3) from custom FASTQs is publicly available! Have a look at this repo (including the README and David McGaughey's additional explanations), and please post any questions to its issues page.

https://github.com/langmead-lab/monorail-external

Chris

ChristopherWilks commented 8 months ago

I should also mention, as an addendum, that it is specifically version 1.1.1 of recount-unify (the Unifier), which aggregates the per-sample outputs of recount-pump, that produces the Snaptron backing files (junctions.bgz, junctions.bgz.tbi, junctions.sqlite, samples.tsv, lucene_full_standard, lucene_full_ws, lucene_indexed_numeric_types.tsv).
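Before pointing Snaptron at a Unifier output directory, it can help to confirm that every backing file listed above is actually there. The sketch below is a hypothetical pre-flight check, not part of Snaptron or recount-unify: it assumes all seven outputs sit directly in one directory under exactly these names, and the function name `missing_backing_files` is illustrative only.

```python
"""Hypothetical pre-flight check for a Snaptron data directory.

Assumption: the recount-unify (Unifier) outputs were written into a single
directory using the file names listed in the thread. Nothing here is an
official Snaptron or recount-unify API.
"""
import sys
from pathlib import Path

# Backing files named in the thread as outputs of recount-unify 1.1.1.
SNAPTRON_BACKING_FILES = (
    "junctions.bgz",
    "junctions.bgz.tbi",
    "junctions.sqlite",
    "samples.tsv",
    "lucene_full_standard",
    "lucene_full_ws",
    "lucene_indexed_numeric_types.tsv",
)


def missing_backing_files(data_dir: str) -> list:
    """Return the expected backing files that are absent from data_dir, in listed order."""
    root = Path(data_dir)
    # Path.exists() rather than Path.is_file(): the lucene_full_* entries may be
    # Lucene index directories rather than plain files (an assumption).
    return [name for name in SNAPTRON_BACKING_FILES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_backing_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing backing files:", ", ".join(missing))
        sys.exit(1)
    print("All expected Snaptron backing files are present.")
```

Run it as `python check_snaptron_dir.py /path/to/unifier_output` (the script name and path are placeholders); a non-zero exit status makes it easy to use as a guard in a shell script before starting the Snaptron container.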

You'll still need the Snaptron Docker image to actually run Snaptron on those files; that is a separate process, covered by this repo.