dieterich-lab / nmd-wf

MIT License

ORFanage #2

Closed tbrittoborges closed 6 months ago

tbrittoborges commented 1 year ago

I am running ORFanage on a few selected genes (FUCA2, SRSF2, BAG3, ZFAS1):

Subset the CCDS source for these genes:

grep -e FUCA2 -e SRSF2 -e BAG3 -e ZFAS1 /biodb/genomes/homo_sapiens/GRCh38_102/GRCh38.102.SIRV.gtf > ref.gtf
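A plain `grep -e BAG3` also keeps any line where the symbol appears as a substring (for example, in a longer gene name or another attribute). A minimal sketch of a stricter subset, assuming the GTF carries Ensembl-style `gene_name "SYMBOL";` attributes; the helper name `subset_by_gene_name` is hypothetical:

```shell
# Hypothetical helper: keep only GTF records whose gene_name attribute
# exactly matches one of the requested symbols.
# Assumes Ensembl-style attributes of the form: gene_name "SYMBOL";
# Symbols are inserted into a regex as-is, so they should not contain
# regex metacharacters.
subset_by_gene_name() {
    gtf="$1"
    shift
    # Join the symbols into an alternation, e.g. FUCA2|SRSF2|BAG3|ZFAS1
    genes=$(printf '%s\n' "$@" | paste -sd'|' -)
    grep -E "gene_name \"(${genes})\";" "$gtf"
}

# Example call (same inputs as the grep above):
#   subset_by_gene_name /biodb/genomes/homo_sapiens/GRCh38_102/GRCh38.102.SIRV.gtf \
#       FUCA2 SRSF2 BAG3 ZFAS1 > ref.gtf
```

Matching on the quoted attribute value trades a little verbosity for not pulling in unrelated records.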

Subset the novel transcripts for these genes:

grep -e MSTRG.46702 -e MSTRG.23821 -e MSTRG.7239 -e MSTRG.33012 ../phaseFinal/stringtie_merge/merged_each.fix.gtf > query.gtf

intersectBed -a ../riboseq_orfs.gtf -b query.gtf -wa -s > riboseq_orfs.gtf
intersectBed -a ../human-openprot.gtf -b query.gtf -wa -s > human-openprot.gtf
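With `-wa`, an ORF record is written once per overlapping feature in `query.gtf`, so if that file holds both transcript and exon records, the same ORF line can appear several times. A minimal sketch for dropping exact duplicates while preserving order; the helper name `dedup_records` is hypothetical, and whether duplicates matter to ORFanage is not something this thread establishes:

```shell
# Hypothetical helper: drop exact duplicate lines, keeping the first
# occurrence and the original order. Reads the named files, or stdin
# when no file is given.
dedup_records() {
    awk '!seen[$0]++' "$@"
}

# Example call:
#   intersectBed -a ../riboseq_orfs.gtf -b query.gtf -wa -s | dedup_records > riboseq_orfs.gtf
```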

Running orfanage (from /prj/Niels_Gehring/nmd_transcriptome/orfanage):

sbatch -c20 --mem 64GB --wrap="~/repos/ORFanage/orfanage --non_aug --query query.gtf --stats all.stats --mode ALL --output all.nopi.gtf --threads 20 --reference /biodb/genomes/homo_sapiens/GRCh38_102/GRCh38_102_SIRVomeERCCome_oneCol.fa ref.gtf human-openprot.gtf riboseq_orfs.gtf"

ORFanage v1.2.0 (installed with conda).

tbrittoborges commented 1 year ago

Uploading results and stats of the run:

orfanage_results_ZFAS1_SRSF2_BAG3_FUCA2.zip

It seems the ZFAS1 ORF on the first exon is only captured by --mode ALL.

--mode

Possible values are:
START_MATCH - Selects the ORF candidate which matches the reference START codon.
LONGEST_MATCH - Selects the ORF candidate which maximizes the number of positions shared between reference and query in the same frame. If alignment mode is enabled via --pi, this mode will be superseded by the number of aligned positions instead.
BEST - Default. Selects the ORF candidate which maximizes the ILPI between reference and query. If alignment mode is enabled via --pi, this mode will be superseded by the highest % identity instead.
LONGEST - Selects the longest ORF candidate.
ALL - Reports all available ORF candidates.
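To check which modes recover the ZFAS1 first-exon ORF, the same run can be repeated once per `--mode` value. The sketch below only prints the commands (a dry run) and reuses the flags and inputs from the run above. The `ORFANAGE` and `REFERENCE_FA` variables, the helper name `orfanage_cmd_for_mode`, and the per-mode output names (`<MODE>.gtf`, `<MODE>.stats`) are assumptions for illustration:

```shell
# Dry-run sketch: print one orfanage command per --mode value.
# ORFANAGE, REFERENCE_FA and the <MODE>.gtf / <MODE>.stats naming are
# assumptions, not part of the original run.
ORFANAGE="${ORFANAGE:-orfanage}"
REFERENCE_FA="${REFERENCE_FA:-/biodb/genomes/homo_sapiens/GRCh38_102/GRCh38_102_SIRVomeERCCome_oneCol.fa}"

orfanage_cmd_for_mode() {
    mode="$1"
    printf '%s --non_aug --query query.gtf --stats %s.stats --mode %s --output %s.gtf --threads 20 --reference %s ref.gtf human-openprot.gtf riboseq_orfs.gtf\n' \
        "$ORFANAGE" "$mode" "$mode" "$mode" "$REFERENCE_FA"
}

# Mode names as listed in the help text above.
for mode in START_MATCH LONGEST_MATCH BEST LONGEST ALL; do
    orfanage_cmd_for_mode "$mode"
done
```

Each printed line can then be passed to `sbatch --wrap` as in the original submission, and the resulting GTF files compared for the ZFAS1 transcript.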