frattalab / PAPA

PAPA (Pipeline-Alternative Polyadenylation) - Snakemake pipeline for analysis of APA from short-read RNA-seq data
GNU General Public License v3.0

Add synthetic test data that can be packaged with the repo #33

Closed SamBryce-Smith closed 1 year ago

SamBryce-Smith commented 1 year ago

I have subsetted BAMs locally, but they are probably too big for the repo. Small synthetic data packaged with the repo would make testing much quicker.

SamBryce-Smith commented 1 year ago
  1. Recycle the test GTFs from this script, making sure they cover all possible new permutations (need to make a list of these)
    • Probably important to expand the ranges a fair bit so that e.g. exons cover 100s of bp, transcripts 100s
    • Will also need to make a fake PolyASite atlas file
    • For some of the sequences without an atlas site, add a PAS motif in the last n nt?
  2. Generate a random sequence (& FASTA) for the two regions, e.g. using random.choice('ATCG'), and then use BioPython to output a FASTA
  3. Decide on the relative expression of each transcript ('fold changes'). Make sure 1 of each event type is differentially used?
  4. Use polyester to simulate reads from these transcripts. Don't need anything fancy with the actual data, just enough
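
A rough sketch of step 2 (plus the PAS motif idea from step 1), assuming nothing about how the final script ended up looking. The function names, region names, sequence lengths and the 20 nt offset are all made up for illustration, and AATAAA is used as the canonical PAS hexamer. It sticks to the standard library so it runs without extra dependencies; BioPython's SeqIO could write the FASTA instead.

```python
import random


def random_sequence(length, seed=None):
    """Return a random DNA string of `length` nt (seeded for reproducibility)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ATCG") for _ in range(length))


def plant_motif(seq, motif="AATAAA", offset_from_end=20):
    """Overwrite part of `seq` so `motif` starts `offset_from_end` nt before the 3' end.

    Hypothetical helper: the offset is an arbitrary choice, not a PAPA setting.
    """
    start = len(seq) - offset_from_end
    if start < 0 or offset_from_end < len(motif):
        raise ValueError("motif does not fit at the requested offset")
    return seq[:start] + motif + seq[start + len(motif):]


def to_fasta(records, width=60):
    """Format a {name: sequence} dict as FASTA text with fixed-width lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two fake regions; names and lengths are placeholders.
    regions = {
        "region_1": plant_motif(random_sequence(500, seed=1)),
        "region_2": random_sequence(500, seed=2),
    }
    with open("test_regions.fa", "w") as handle:
        handle.write(to_fasta(regions))
```

Fixing the seed keeps the test FASTA identical between runs, which matters if the GTF coordinates and expected outputs are committed alongside it.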
SamBryce-Smith commented 1 year ago

Completed with 93928632c13e10295f2a8525801108588d344ce4. I may come back and add more permutations, but this should cover the bare bones.