Closed AndrewLangvt closed 6 years ago
Hi Andy,
Sorry for the delay in replying. I usually get emails telling me there is an issue, but I did not get this one.
The -p option should redefine genes, but that should not affect the link between events <--> isoforms or between isoforms <--> TPM values.
Your error says that for some of the XM_... transcripts you don't have data in the TPM file. We generally see this when we have TPM data for only some chromosomes (e.g. in human, when we removed unassembled chromosomes or haplotypes), or for only some genes (e.g. if we removed pseudogenes). It could be that your original GTF had more annotations than you have TPM values for.
Please let me know if this helps
Best
Eduardo
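To see which transcripts trigger the "not found" message, one option is to compare the IDs in the .ioe file against the first column of the TPM file. The sketch below is not part of SUPPA, and its function names are made up for illustration. It assumes the .ioe file is tab-separated with one header line and a comma-separated transcript list in its last column, and that the expression file has one header line followed by rows that start with the transcript ID.

```python
import csv


def ioe_transcripts(ioe_path):
    """Collect every transcript ID listed in the last column of an .ioe file.

    Assumption: tab-separated, one header line, last column holds a
    comma-separated list of all transcripts for the event.
    """
    ids = set()
    with open(ioe_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line
        for row in reader:
            if row:
                ids.update(t for t in row[-1].split(",") if t)
    return ids


def expression_transcripts(tpm_path):
    """Collect the transcript IDs found in the first column of the TPM file.

    Assumption: one header line, then one row per transcript.
    """
    ids = set()
    with open(tpm_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line
        for row in reader:
            if row:
                ids.add(row[0])
    return ids


def missing_from_expression(ioe_path, tpm_path):
    """Return the sorted transcript IDs used by events but absent from the TPM file."""
    return sorted(ioe_transcripts(ioe_path) - expression_transcripts(tpm_path))
```

Calling missing_from_expression("ColLiv.allevents.ioe", "v1_iso_TPM.txt") would then list the IDs that have no TPM row, which can be checked against the chromosomes or gene types that were dropped before quantification.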
Thanks for the reply Eduardo. Is there any way to have SUPPA ignore those annotations for which there are no TPM values? Or do I have to parse through the annotations file and remove all of those annotations?
I'm afraid you'll have to remove them from the annotation file. The warning is there to let you know that there is a discrepancy between the annotation and the expression file. You can simply ignore the warnings, unless you are redirecting STDERR to log your runs on a cluster, you are running thousands of them, and you don't want to end up filling the disk. In that case it is best to remove those cases from the annotation. I hope it works for you.
Best
Eduardo
-- Dr E Eyras
ICREA Research Professor Universitat Pompeu Fabra PRBB, Dr Aiguader 88 Tel: +34 93 316 0502 (ext 1502) E08003 Barcelona, Spain Fax: +34 93 316 0550
http://scholar.google.com/citations?user=LiojlGoAAAAJ http://www.researcherid.com/rid/L-1053-2014 http://regulatorygenomics.upf.edu/
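If the warnings do need to go away, removing the unquantified transcripts from the annotation could look roughly like the sketch below. It is not a SUPPA feature, and the function names are hypothetical. It assumes a standard GTF in which transcript-level lines carry a transcript_id "..." attribute, and a TPM file with one header line and the transcript ID in the first column. Lines without a transcript_id (comments, gene lines) are kept as they are.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_expression_ids(tpm_path):
    """Return the set of transcript IDs in the first column of the TPM file.

    Assumption: the first line is a header and is skipped.
    """
    ids = set()
    with open(tpm_path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if fields and fields[0]:
                ids.add(fields[0])
    return ids


def filter_gtf(gtf_in, gtf_out, keep_ids):
    """Copy gtf_in to gtf_out, dropping lines whose transcript_id is not in keep_ids.

    Returns a (kept, dropped) tuple of line counts.
    """
    kept = dropped = 0
    with open(gtf_in) as src, open(gtf_out, "w") as dst:
        for line in src:
            match = TRANSCRIPT_ID.search(line)
            if match and match.group(1) not in keep_ids:
                dropped += 1
                continue
            dst.write(line)
            kept += 1
    return kept, dropped
```

The events would then have to be regenerated from the filtered GTF so that the .ioe file only refers to transcripts that have TPM values.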
Will the analysis run unhindered by the warnings if I were to simply ignore them? I'm running AS analyses with a few different programs, so I would like to keep things as consistent as possible across all analyses. If removing the annotations from the GTF is the only way to successfully complete SUPPA's analysis, that would be the only course of action.
Oh I see. No, it won't influence the results. It's just a warning. The results should be ok for the cases where it found the data.
That's interesting. We've also been comparing various methods: rMATS, MAJIQ and DEXSeq. I'd be happy to share experiences. I'd be curious to know if you're testing others and what your results show (if you can share them, of course).
best
Eduardo
Great! That will save me some time, and also maintain consistency. Thanks, again, for addressing this!
Yea, I'll keep you guys posted as things unfold a bit more. Currently I'm just running a subset of my data (36 rnaseq libraries, full dataset is 152) through SUPPA, JunctionSeq (heavily based on DEXSeq), and Whippet. Once I have a clearer picture of the agreement/disagreements between the three, I'd be happy to chat further and discuss!
Cheers
I'd be curious how to compare Whippet with SUPPA. When comparing with MAJIQ, we selected junctions, taking the junction with the highest posterior (the most confident) and assigning it to the event (for SE events it would be the inclusion junction).
For JunctionSeq I'd imagine it would be something similar. For Whippet I don't know if it'll be simple to do.
best
E.
Hello @EduEyras, can you please help me with this issue? SUPPA reports the following for all transcripts and gives NA in the psi file:
ERROR:psiCalculator:transcript ENSMUSG00000051285 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSMUSG00000094337;SE:GL456354.1:83560-84521:85111-85765:-.
here is my expression file in tpm
my ioe file
when I grep ENSMUSG00000051285 in expression file
I have checked whether there are hidden spaces in the tab-separated file, but it seems okay.
Thank you for your help
Hi,
These are gene ids, but you need to quantify the transcript ids.
Best
E
Thank you for your reply @EduEyras. Actually, these are transcript ids; I just removed some extensions. This is what the original file looks like:
If you have an example of what you mean by transcript ids, please show me
Thank you!
Hi, sorry, I am confused. Your expression file seemed to use gene ids (ENSMUSG....) rather than transcript ids (ENSMUST....).
E.
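A quick way to tell the two ID types apart is to look at the letter that follows the species prefix: Ensembl gene IDs have a G there (ENSMUSG...) and transcript IDs have a T (ENSMUST...). The helper below is only an illustrative sketch with hypothetical function names. It classifies IDs by that convention and labels anything that does not look like an Ensembl ID (for example RefSeq XM_... accessions) as "other".

```python
import re
from collections import Counter

# "ENS", an optional species prefix of up to three letters (e.g. MUS for mouse),
# then G for gene or T for transcript, then the numeric part.
ENSEMBL_ID = re.compile(r"^ENS[A-Z]{0,3}([GT])\d+")


def classify_id(identifier):
    """Return "gene", "transcript", or "other" for a single identifier."""
    match = ENSEMBL_ID.match(identifier)
    if not match:
        return "other"
    return "gene" if match.group(1) == "G" else "transcript"


def summarize_ids(identifiers):
    """Count how many identifiers fall into each class."""
    return Counter(classify_id(i) for i in identifiers)
```

Running summarize_ids over the first column of the expression file should give only "transcript" entries if the quantification was done at the transcript level; any "gene" entries point to the mix-up described above.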
Yeah, it's my mistake. I used ENSMUST.... ids but still got the same errors, for instance:
ERROR:psiCalculator:transcript ENSMUST00000236635 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSMUSG00000033768;SE:19:6505310-6513854:6513877-6522288:+.
Thank you for dedicating your time to this issue
In this case, it could be that ENSMUST00000236635 is not expressed. All the transcripts that are in the annotation, and hence in the ioe file, but are not expressed will give you that error. That should be normal. Do you get PSIs for the other events?
Best
Eduardo
That is the issue: I got NA for all the transcripts, with no psi values.
Best
One possibility is a formatting problem in the input files, like hidden characters at the end of the line, or a missing header. E.
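Both suspects (hidden characters and the header) can be checked mechanically. The function below is a rough sketch, not a SUPPA utility, and its name is hypothetical. It reads the file as bytes and reports a byte-order mark, Windows carriage returns, trailing whitespace, and rows whose field count does not match the header. It assumes the layout described in the SUPPA documentation as I understand it: the header holds only the sample names, so each data row should have exactly one more tab-separated field than the header. If your version expects a different layout, adjust that check.

```python
def check_expression_file(path, max_report=5):
    """Return up to max_report formatting problems found in a TPM file."""
    with open(path, "rb") as handle:
        raw_lines = handle.read().split(b"\n")
    if raw_lines and raw_lines[-1] == b"":
        raw_lines.pop()  # drop the empty piece after the final newline
    if not raw_lines:
        return ["file is empty"]

    problems = []
    if raw_lines[0].startswith(b"\xef\xbb\xbf"):
        problems.append("byte-order mark at start of file")

    header_fields = raw_lines[0].rstrip(b"\r").split(b"\t")
    expected = len(header_fields) + 1  # assumption: header lacks an ID column

    for number, line in enumerate(raw_lines, start=1):
        if line.endswith(b"\r"):
            problems.append(f"line {number}: carriage return (Windows line ending)")
        elif line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
        if number > 1:
            found = len(line.rstrip(b"\r").split(b"\t"))
            if found != expected:
                problems.append(f"line {number}: {found} fields, expected {expected}")
        if len(problems) >= max_report:
            break
    return problems[:max_report]
```

An empty list from check_expression_file("spc.tsv") would suggest the formatting is not the cause, and attention can move to whether the IDs in the .ioe and expression files match exactly (for example, version suffixes removed from one file but not the other).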
The header is not missing, and I don't think there are hidden characters at the end of the file. Please find attached the two expression files I am using for my analysis, if you don't mind.
Thank you expression files.zip
Could you please send me your command line and a sample of your .ioe file
thanks
E.
Yes sure, thank you mm10_SE_strict.zip
And here is the command line:
suppa.py psiPerEvent --ioe-file mm10_SE_strict.ioe --expression-file spc.tsv --output-file ./spc_SE
Hello @EduEyras, Just checking if you have had a little time to look at my request, please?
Thank you
Hello SUPPA crew!
I've been having some issues with getting the IDs to match up between the abundance file and my ioe file. Here's what I'm working with...
GCF_000337935.1_Cliv_1.0_genomic.gtf
ColLiv.allevents.ioe
v1_iso_TPM.txt
When I run the psiPerEvent function of SUPPA, I get numerous errors, all indicating a transcript ID was not found in the "expression file". I did invoke the -p flag when generating events, as these are RefSeq annotations. Is there something funky going on here? Or is it as simple, yet perhaps bizarre, as a subset of transcripts (~3.5k of ~20K) are present in the gtf, yet absent from the transcriptome? Any thoughts?