eturro / mmseq

Haplotype, isoform and gene level expression analysis using multi-mapping RNA-seq reads
GNU General Public License v2.0

Aggregate transcripts into groups by UTR #50

Closed taigalokhid closed 3 years ago

taigalokhid commented 3 years ago

Hi Ernest,

I'm trying to aggregate transcripts by certain features with mmseq. How can I aggregate transcripts that share a common 5' UTR, as suggested by the statement "Other aggregations (e.g. of transcripts sharing the same UTRs) are also possible." at https://github.com/eturro/mmseq?

Best regards, Elena

eturro commented 3 years ago

Hi Elena,

You need to modify the header in the hits file.

If you run hitstools inspect hits_file you'll see that the hits file header contains a block of lines beginning with @GeneIsoforms. E.g.:

@GeneIsoforms ENSG00000000003 ENST00000494424 ENST00000373020 ENST00000496771

This tells mmseq that the aggregated expression estimate for "ENSG00000000003" should be obtained by summing over ENST00000494424, ENST00000373020 and ENST00000496771.

If you modify the @GeneIsoforms block in the header so that you have additional lines each with an arbitrary ID corresponding to a 5'UTR followed by the IDs of all the transcripts sharing that UTR, you should be in business.
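As a rough sketch of how such additional lines could be generated, the snippet below groups transcripts by UTR ID with awk. The input file tx2utr.tsv, its two-column layout, the UTR IDs, and the space separator are all assumptions made for illustration; the separator should match whatever the existing @GeneIsoforms lines in the header use.

```shell
# Hypothetical input: one "transcript_id<TAB>utr_id" pair per line.
# The file name, layout and UTR IDs are made up for this example.
printf 'ENST00000494424\tUTR5_A\nENST00000373020\tUTR5_A\nENST00000496771\tUTR5_B\n' > tx2utr.tsv

# Collect the transcripts belonging to each UTR ID, then print one
# @GeneIsoforms line per UTR: the arbitrary UTR ID followed by its transcripts.
# SEP is an assumption; set it to the separator used in the real header.
awk -F'\t' -v SEP=' ' '
  {
    if ($2 in members) members[$2] = members[$2] SEP $1
    else               members[$2] = $1
  }
  END {
    for (utr in members) print "@GeneIsoforms" SEP utr SEP members[utr]
  }
' tx2utr.tsv | sort > new_GeneIsoforms_block

cat new_GeneIsoforms_block
```

This writes only the new UTR-level lines. If the original gene-level lines should be kept too, they would presumably have to be placed in the same file alongside them.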

Let's assume you have the new @GeneIsoforms block in a file called "new_GeneIsoforms_block". You could run something like this to generate a new hits file:

hitstools header hits_file | grep TranscriptMetaData > new_hits_file_t
cat new_GeneIsoforms_block >> new_hits_file_t
hitstools header hits_file | grep IdenticalTranscripts >> new_hits_file_t
hitstools t hits_file | grep -v @ >> new_hits_file_t
hitstools b new_hits_file_t > new_hits_file

After this you may run mmseq.

best wishes, Ernest


taigalokhid commented 3 years ago

Thanks a lot. I'll try it.

Best regards, Elena