This is worrisome; I will try to track this down. It might be better to improve the modularity of the code so that it is easier to test.
So far we have only tested on two machines and got different results. You could try to reproduce the error from above on your machine. Maybe that will give us a clue.
@hansenp I tested the above examples on my M1 Mac. Not sure if this is a mistake, but you are showing different genes (CS and UQCRQ on top but GK and FGGY on the bottom). In any case, I get the same numbers as shown above.
I tested again with the same file (SRP119676_240.txt) on the same computer (M2) as above, but with the latest version of isopretGUI. Here is the resulting table with the summary statistics:
Only the values for Significant DGE GO Terms and Significant DAS GO Terms changed slightly. The remaining values are unchanged.
@pnrobinson Do you get the same results on your computer?
This is what I see after compiling today with Linux:
Analysis performed on: 2024-07-04T16:36Z
Number of genes with annotated transcripts: 34912
Number of of annotated transcripts: 164776
RNA-Seq analysis method: HBADEALS
Number of annotated genes: 17895
Number of GO terms used to annotate genes: 16710
Number of GO terms used to annotate transcripts: 16710
Number of annotated transcripts: 85504
Number of HGNC gene entries: 26698
Number of of interpro descriptions: 40051
Number of interpro annotations: 19083
Number of significantly differential isoforms: 895
Number of significantly differential genes: 1957
DAS study size: 895
DAS population size: 15792
DGE study size: 1957
DGE population size: 7693
Chosen FDR threshold: 0.05
Probability threshold (expression): 0.25
Probability threshold (splicing): 0.19
[INFO] isoform function file /home/peter/data/isopret/isoform_function_list_mf.txt
[INFO] Interpro domains file /home/peter/data/isopret/interpro_domains.txt
[INFO] Interpro description file /home/peter/data/isopret/interpro_domain_desc.txt
Input file: SRP119676_240.txt
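To make the comparison between machines less error-prone than reading screenshots, the summary statistics could be compared programmatically. The following is only a sketch: `diff_summaries` is a hypothetical helper, not part of isopretGO, and it assumes each run's table has already been copied into a dictionary by hand. The `linux` values come from the table above; the `other` run is made up purely to show the output shape.

```python
def diff_summaries(run_a, run_b):
    """Return {label: (value_a, value_b)} for every label whose values differ.

    A label that is missing from one run is reported with None on that side.
    """
    labels = sorted(set(run_a) | set(run_b))
    return {
        label: (run_a.get(label), run_b.get(label))
        for label in labels
        if run_a.get(label) != run_b.get(label)
    }


# Three rows taken from the Linux run reported above.
linux = {
    "Number of significantly differential isoforms": 895,
    "Number of significantly differential genes": 1957,
    "DGE population size": 7693,
}

# Hypothetical second run (NOT real data), differing in one row.
other = {**linux, "Number of significantly differential genes": 1960}

for label, (a, b) in diff_summaries(linux, other).items():
    print(f"{label}: {a} vs {b}")
```

Only rows that differ are printed, so identical runs produce no output.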
I am getting different numbers for the genes (fold changes). Could we do this together to check, @hansenp?
This was a different version:
md5sum SRP119676_240.txt
da1cf681165cfb4aae2b7a4dedfb3e77  SRP119676_240.txt
HBA-DEALS files downloaded from here: https://zenodo.org/records/6483996
MD5 (hbadeals_output/SRP119676_240.txt) = 987bbc84fc2595a137f7eae2ae314013
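The two checksums above differ (da1cf681... vs. 987bbc84...), so the runs were reading different versions of the input file. A small check like the following could be run before comparing results. It is a minimal sketch using only Python's standard `hashlib`; the function names `md5_of_file` and `matches_expected` are hypothetical, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For the file discussed here, the call would be `matches_expected("hbadeals_output/SRP119676_240.txt", "987bbc84fc2595a137f7eae2ae314013")`, assuming the Zenodo download sits at that path.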
This is what I get on M3:
I used the jar file from the latest release:
https://github.com/TheJacksonLaboratory/isopretGO/releases/tag/1.3.2
and default parameters.
I also tested with SRP119676_240.txt using the latest release. It's almost the same as yours, @hansenp: one significant DGE GO term is missing.
Checking again with exactly the same freshly downloaded files, we have found the same results. This was most likely a version issue but seems to be working now.
We noticed that isopretGUI reports slightly different numbers of differentially expressed genes, differentially spliced genes, etc. for the same HBA-DEALS input file on different machines (table at the top of the analysis screen).
For file SRP119676_240.txt from the Zenodo repository with MD5 checksum 987bbc84fc2595a137f7eae2ae314013, the following numbers were reported on a Mac M2:
And on a Core i7 machine, the following numbers were reported:
In both cases the dmg file for isopretGUI was created using bash package.sh.
To narrow down the error further, it would be good if others could create such screenshots using the same file and post them here.