TheJacksonLaboratory / isopretGO

Isoform function prediction and interpretation
https://thejacksonlaboratory.github.io/isopretGO/
GNU General Public License v3.0

isopretGUI reports slightly different numbers for the same input on different machines #159

Closed hansenp closed 3 months ago

hansenp commented 1 year ago

We noticed that isopretGUI reports slightly different numbers of differentially expressed genes, differentially spliced genes, etc. for the same HBA-DEALS input file on different machines (see the table at the top of the analysis screen).

For file SRP119676_240.txt from the Zenodo repository with MD5 checksum 987bbc84fc2595a137f7eae2ae314013, the following numbers were reported on a Mac M2:

image

And on a Core i7 machine, the following numbers were reported:

image

In both cases the dmg file for isopretGUI was created using bash package.sh.

To narrow down the error further, it would be good if others could create such screenshots using the same file and post them here.
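Before comparing screenshots, it may help to confirm that every machine is reading a byte-identical input file. The following is a minimal sketch in Python, not part of isopretGO; the helper names `md5_of` and `is_expected_input` are hypothetical, and the expected checksum is the one quoted above for SRP119676_240.txt.

```python
import hashlib

# MD5 checksum quoted above for the Zenodo copy of SRP119676_240.txt
EXPECTED_MD5 = "987bbc84fc2595a137f7eae2ae314013"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_input(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

Calling `is_expected_input("SRP119676_240.txt")` on each machine (assuming the file is in the working directory) would show whether any difference in the summary table could come from the input file itself rather than from isopretGUI.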

pnrobinson commented 10 months ago

This is worrisome, I will try to track this down. I am thinking it might be better to try and improve the modularity of the code for better testing.

hansenp commented 10 months ago

So far we have only tested on two machines and got different results. You could try to reproduce the error from above on your machine. Maybe that will give us a clue.

pnrobinson commented 10 months ago

@hansenp I tested the above examples on my M1 Mac. Not sure if this is a mistake, but you are showing different genes (CS and UQCRQ on top but GK and FGGY on the bottom). In any case, I get the same numbers as shown above.

hansenp commented 10 months ago

I tested again with the same file (SRP119676_240.txt) on the same computer (M2) as above, but with the latest version of isopretGUI. Here is the resulting table with the summary statistics:

image

Only the values for Significant DGE GO Terms and Significant DAS GO Terms changed slightly. The remaining values are unchanged.

@pnrobinson Do you get the same results on your computer?

pnrobinson commented 3 months ago

This is what I see after compiling today with Linux:

image

Analysis performed on 2024-07-04T16:36Z
Number of genes with annotated transcripts 34912
Number of of annotated transcripts 164776
RNA-Seq analysis method HBADEALS
Number of annotated genes 17895
Number of GO terms used to annotate genes 16710
Number of GO terms used to annotate transcripts 16710
Number of annotated transcripts 85504
Number of HGNC gene entries 26698
Number of of interpro descriptions 40051
Number of interpro annotations 19083
Number of significantly differential isoforms 895
Number of significantly differential genes 1957
DAS study size 895
DAS population size 15792
DGE study size 1957
DGE population size 7693
Chosen FDR threshold 0.05
Probability threshold (expression) 0.25
Probability threshold (splicing) 0.19
[INFO] isoform function file /home/peter/data/isopret/isoform_function_list_mf.txt
[INFO] Interpro domains file /home/peter/data/isopret/interpro_domains.txt
[INFO] Interpro description file /home/peter/data/isopret/interpro_domain_desc.txt
Input file SRP119676_240.txt
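A text summary like this one is easier to compare across machines than screenshots. As a rough sketch (not an isopretGO feature), the Python below parses such a summary and reports only the entries that differ between two runs. It assumes the summary has been saved with one entry per line and the value as the last whitespace-separated token; `parse_summary` and `diff_summaries` are hypothetical helper names.

```python
def parse_summary(text):
    """Parse lines such as 'DGE study size 1957' into {'DGE study size': '1957'}."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and [INFO] lines, which hold machine-specific paths
        if not line or line.startswith("[INFO]"):
            continue
        # Assumption: the value is the last token, everything before it is the label
        key, _, value = line.rpartition(" ")
        if key:
            stats[key] = value
    return stats


def diff_summaries(first, second):
    """Return {label: (value_in_first, value_in_second)} for entries that differ."""
    labels = sorted(set(first) | set(second))
    return {
        label: (first.get(label), second.get(label))
        for label in labels
        if first.get(label) != second.get(label)
    }
```

An empty result from `diff_summaries` would mean the two runs agree on every reported entry; a label missing from one run shows up with `None` on that side. Values are kept as strings so that timestamps and file names are compared verbatim.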

pnrobinson commented 3 months ago

I am getting different numbers for the genes (fold changes). Could we do this together to check @hansenp ?

pnrobinson commented 3 months ago

This was a different version of the file:

md5sum SRP119676_240.txt
da1cf681165cfb4aae2b7a4dedfb3e77 SRP119676_240.txt

hansenp commented 3 months ago

HBA-DEALS files downloaded from here: https://zenodo.org/records/6483996

MD5 (hbadeals_output/SRP119676_240.txt) = 987bbc84fc2595a137f7eae2ae314013

This is what I get on M3: image

hansenp commented 3 months ago

I used the jar file from the latest release:

https://github.com/TheJacksonLaboratory/isopretGO/releases/tag/1.3.2

and default parameters.

koehlek99 commented 3 months ago

I also tested with SRP119676_240.txt using the latest release. The result is almost the same as yours, @hansenp; only one significant DGE GO term is missing.

image

pnrobinson commented 3 months ago

Checking again with exactly the same freshly downloaded files, we got the same results. This was most likely a version issue, but it seems to be working now.