comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Using SUPPA2 with PacBio sequencing #129

Open aa9gj opened 3 years ago

aa9gj commented 3 years ago

Hi, thanks in advance for taking the time to answer this question. What are the constraints, if any, of using SUPPA with PacBio sequencing data? Thanks again

EduEyras commented 3 years ago

Hi,

In principle there should be no constraints. There are multiple ways in which you could use PacBio data:

1) Map PacBio reads to the genome and use the mappings to reconstruct transcripts and calculate their abundances. Then get the GTF for those reconstructed transcripts and run SUPPA on it. Calculate PSIs using the transcript abundances...

2) Map PacBio reads to the genome or transcriptome and use the mappings to assign abundances to already annotated transcripts. Run SUPPA on the annotation GTF, and use the calculated abundances to calculate PSIs, ...

One potential issue with 1) is that PacBio reads may give you lots of novel transcripts. I would not trust those unless they have enough supporting reads.

If you're working with a species with good annotation, 2) could be effective and you can rely on the annotation for the events. Otherwise, it may be better to try 1), considering the potential issue I mentioned.

Either way, it is feasible and will mostly depend on having enough reads for the quantification, so that the PSI estimates are reliable.
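To make the role of the abundances concrete: SUPPA computes the PSI of an event as the summed abundance of the transcripts carrying one form of the event, divided by the summed abundance of all transcripts that define the event. Below is a minimal sketch of that ratio in Python. The function name, transcript names, and TPM values are made up for illustration, and this is not SUPPA's own code.

```python
import math

def event_psi(alternative, total, tpm):
    """Ratio of summed abundance of 'alternative' transcripts to all event transcripts.

    Transcripts missing from the abundance table are treated as 0 here,
    which is a simplification chosen for this sketch.
    """
    numerator = sum(tpm.get(t, 0.0) for t in alternative)
    denominator = sum(tpm.get(t, 0.0) for t in total)
    if denominator == 0:
        return math.nan  # no expression at all: PSI is undefined
    return numerator / denominator

# Hypothetical abundances (TPM) for three transcripts of one gene.
tpm = {"tx_A": 30.0, "tx_B": 10.0, "tx_C": 5.0}

# tx_A carries the alternative form; tx_A and tx_B together define the event.
psi = event_psi(["tx_A"], ["tx_A", "tx_B"], tpm)
print(psi)  # 0.75
```

With few supporting reads both sums are small, so a handful of reads can move the ratio a lot, which is why the estimate depends on sequencing depth.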

I hope this helps

Eduardo

-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

splicingrats commented 2 years ago

Hi Eduardo,

I tried your suggestions from issue #121, and they didn't work so well.

https://github.com/comprna/SUPPA/issues/121

Here's what I did:

PacBio reads --> map to the genome --> BAM --> StringTie2/Sqanti3 --> GTF

For generation of events, I tried two options:

  1. I used the GTF above to generate events with generateEvents (with the --pool-genes option)
  2. I used an annotation GTF (human) to generateEvents.

I got stuck at the next step, psiPerEvent, since I am not sure what the expression file input should be. Some people said that you can use the expression file from short-read sequencing for this step together with the ioe files generated from my GTF; others said the input could be a calculated abundance file. I failed with both options.

Code for the first option:
suppa.py psiPerEvent -i events.ioe -e /home/shared.../_formatted.txt (from short read sequencing) -o

Errors for the first option:
...
ERROR:psiCalculator:Duplicated event STRG.9601;SE:GL383574.1:63402-67133:67315-67448:+. Skipping line...
ERROR:psiCalculator:Duplicated event STRG.9601;SE:GL383574.1:69480-74152:74208-75109:+. Skipping line...
ERROR:psiCalculator:Duplicated event STRG.9611;SE:GL383519.1:65682-66680:66757-67109:-. Skipping line...
ERROR:psiCalculator:Duplicated event STRG.9616;SE:KQ458383.1:108587-110831:110999-114190:-. Skipping line...
ERROR:psiCalculator:Duplicated event STRG.9644;SE:KI270897.1:296093-301862:301915-302360:+. Skipping line...
INFO:lib.tools:File /home/shared/24T/.../events.ioe closed.

I guess the problem comes from generating the GTF with StringTie.

For the second option, I tried the abundance file from cDNA_Cupcake, and it failed again.

The problem with the GTF files is that they contain PacBio IDs (PB1.1, PB2.1) or StringTie-specific IDs. Should I remove these IDs before generating the events? If so, how? Or is there a way to generate a GTF with these IDs? On the other hand, is there another way to generate an abundance file from PacBio long-read sequencing data?

Thank you

Best, Yanshan

EduEyras commented 2 years ago

Dear Yanshan,

thanks for your email. And thanks a lot for testing the various options. I think you're operating correctly, but the problem may be in the transcript IDs.

The transcript IDs in the GTF must coincide with the transcript IDs in the abundance file. That's the only way for SUPPA to link the events with the abundances to calculate the PSIs.

Since each method (StringTie2, Cupcake, ...) may handle the IDs differently, you may have to modify the IDs (after quantification or after GTF generation) to make them match.
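One way to check whether the IDs match is to compare the set of `transcript_id` values in the GTF with the first column of the abundance file. The sketch below does this in Python on small made-up inputs. The helper names are hypothetical, and it assumes a tab-separated abundance table with a single header line.

```python
import re

def gtf_transcript_ids(gtf_lines):
    """Collect transcript_id values from the attribute column of GTF lines."""
    ids = set()
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and malformed lines
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def abundance_ids(table_lines):
    """Collect IDs from the first column of a tab-separated table with one header line."""
    return {line.split("\t")[0].strip() for line in table_lines[1:] if line.strip()}

# Made-up example data standing in for the real files.
gtf = [
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";\n',
    'chr1\tStringTie\ttranscript\t100\t950\t.\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.2";\n',
]
table = ["sample1\n", "STRG.1.1\t12.5\n", "PB.1.1\t3.0\n"]

in_gtf, in_table = gtf_transcript_ids(gtf), abundance_ids(table)
print("only in GTF:", sorted(in_gtf - in_table))
print("only in abundance file:", sorted(in_table - in_gtf))
```

If either list is long, the two files were probably produced with different naming schemes and the IDs need to be renamed on one side before running psiPerEvent.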

Regarding how you quantify transcript abundances, there are various options. You could also map directly to the transcripts and use Salmon (either minimap2 + Salmon, or Salmon with its own mapping).
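If Salmon is used, its `quant.sf` output (columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) has to be reduced to a transcript-by-sample table before it can serve as an expression file. The Python sketch below shows one way to do that. It assumes the expression file layout described in the SUPPA README as I understand it (a header holding only the sample name, then `transcript_id<TAB>TPM` rows), and it optionally trims GENCODE-style IDs at the first `|`. The function name and the example values are made up.

```python
def salmon_to_expression(quant_lines, sample_name, trim_at_pipe=True):
    """Turn the lines of a Salmon quant.sf into 'transcript_id<TAB>TPM' rows.

    The first output line holds only the sample name (assumed layout).
    """
    header = quant_lines[0].rstrip("\n").split("\t")
    name_col, tpm_col = header.index("Name"), header.index("TPM")
    rows = [sample_name]
    for line in quant_lines[1:]:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        transcript_id = fields[name_col]
        if trim_at_pipe:
            # GENCODE FASTA headers look like 'ENST...|ENSG...|...'; keep the first part.
            transcript_id = transcript_id.split("|")[0]
        rows.append(f"{transcript_id}\t{fields[tpm_col]}")
    return rows

# Made-up quant.sf content with hypothetical IDs.
quant = [
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n",
    "TX1.1|GENE1.1|\t1500\t1300.0\t2.5\t10\n",
    "TX2.1|GENE1.1|\t900\t700.0\t0.0\t0\n",
]
for row in salmon_to_expression(quant, "sample1"):
    print(row)
```

Trimming at `|` is only appropriate when the GTF uses the bare transcript ID. The point is that whatever ends up in the first column must be identical to the `transcript_id` in the GTF used for generateEvents.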

Your error indicates a duplicated event. This is strange, as one would not expect such duplications. Could it be that some ID contains characters that have a specific meaning in the GTF (e.g. ";" or '"') and this is causing trouble?
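To see where the duplicates come from, one could list the event IDs that occur more than once in the ioe file. The Python sketch below assumes the ioe file is tab-separated with one header line and the event ID in the third column (`seqname`, `gene_id`, `event_id`, `alternative_transcripts`, `total_transcripts`). The function name and the transcript columns in the example are made up; the event IDs are taken from the error log above.

```python
from collections import Counter

def duplicated_event_ids(ioe_lines):
    """Return {event_id: count} for event IDs appearing more than once."""
    counts = Counter()
    for line in ioe_lines[1:]:  # skip the header line
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[2]] += 1  # third column: event_id (assumed layout)
    return {event: n for event, n in counts.items() if n > 1}

# Made-up ioe content; transcript names are hypothetical.
ioe = [
    "seqname\tgene_id\tevent_id\talternative_transcripts\ttotal_transcripts\n",
    "GL383574.1\tSTRG.9601\tSTRG.9601;SE:GL383574.1:63402-67133:67315-67448:+\ttx1\ttx1,tx2\n",
    "GL383574.1\tSTRG.9601\tSTRG.9601;SE:GL383574.1:63402-67133:67315-67448:+\ttx3\ttx3,tx4\n",
    "GL383519.1\tSTRG.9611\tSTRG.9611;SE:GL383519.1:65682-66680:66757-67109:-\ttx5\ttx5,tx6\n",
]
for event, n in duplicated_event_ids(ioe).items():
    print(n, event)
```

Comparing the transcript columns of the duplicated rows shows whether the same event was emitted twice with different transcript sets, which would point back to how the GTF groups transcripts into genes.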

Please let me know if any of this makes it work.

best

Eduardo

splicingrats commented 2 years ago

Dear Eduardo,

thank you for the quick reply. That really helps.

I tried using Salmon with its own mapping to generate the abundance files, and used the ioe file from the annotated GTF as the index. psiPerEvent ran successfully and I made it to the final diffSplice step, where I got another error.

suppa.py diffSplice -m empirical -gc -i /home/shared//gencode.v38.all.events.ioe -p /_events.psi /_events.psi -e /iso_tpm_formatted.txt /iso_tpm_formatted.txt -o v*

Calculating differential analysis between conditions: _events and _events
ERROR:main:Unknown error: (<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f1058c8f880>)

I guess the problem is that I used one sample for each condition, am I right? If so, is there a way for SUPPA to handle samples without replicates? If not, please let me know where the problem is.

Thank you so much.

Best, Yanshan

EduEyras commented 2 years ago

Hi Yanshan,

Yep, SUPPA diffSplice needs at least 2 replicates per condition to run the test.

Do you have replicates for this?

If you only have one replicate per condition, there are still analyses you could do. E.g., you could use deltaPSI or |deltaPSI| to rank events and test for an association with gene sets, pathways, etc., similarly to a gene-set enrichment analysis.
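The ranking step could look like the following Python sketch. It assumes the PSI values of the two conditions have already been loaded into dictionaries keyed by event ID. The function name, event names, and values are made up, and this is not a SUPPA command.

```python
import math

def rank_by_delta_psi(psi_a, psi_b):
    """Rank events shared by both conditions by |deltaPSI|, largest first."""
    ranked = []
    for event in psi_a.keys() & psi_b.keys():
        a, b = psi_a[event], psi_b[event]
        if math.isnan(a) or math.isnan(b):
            continue  # PSI undefined in at least one condition
        ranked.append((event, b - a))
    # Sort by descending |deltaPSI|; ties are broken by event name for stable output.
    ranked.sort(key=lambda item: (-abs(item[1]), item[0]))
    return ranked

# Hypothetical PSI values for one sample per condition.
psi_condition1 = {"event_1": 0.2, "event_2": 0.9, "event_3": math.nan}
psi_condition2 = {"event_1": 0.8, "event_2": 0.85, "event_3": 0.5}

for event, delta in rank_by_delta_psi(psi_condition1, psi_condition2):
    print(f"{event}\t{delta:+.2f}")
```

The resulting ranked list carries no significance estimate on its own. It is only the input for a downstream enrichment-style test of the kind mentioned above.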

E.

splicingrats commented 2 years ago

Dear Eduardo,

thank you for your quick and thorough response. That helps a lot.

Best, Yanshan

