AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Updated analysis: Investigate handling of t-SNE in `transcriptomic-dimension-reduction` module #716

Open cbethell opened 4 years ago

cbethell commented 4 years ago

What analysis module should be updated and why?

As discovered in PR #704, the t-SNE results in analyses/transcriptomic-dimension-reduction/results do not appear to be reproducible: they change across operating systems (when run in Docker with a seed set) but remain the same on any individual OS.

What changes need to be made? Please provide enough detail for another participant to make the update.

As noted on PR #704, the following has been determined:

That being said, when looking into this issue, one might look into:

What input data should be used? Which data were used in the version being updated?

No additional input data should be needed.

When do you expect the revised analysis will be completed?

~ 2 days

Who will complete the updated analysis?

Likely someone at the CCDL

sjspielman commented 4 years ago

The OS's in question are presumably mac vs linux, yes? A possible culprit could be clang vs gcc, or similar, based on personal experience only. Will start some sleuthing over here.

jashapiro commented 4 years ago

> The OS's in question are presumably mac vs linux, yes? A possible culprit could be clang vs gcc, or similar, based on personal experience only. Will start some sleuthing over here.

Not quite so simple. As far as I am aware, the differences are all occurring on Macs, but more to the point, it is all within Docker images, so compiler versions should be the same. Should. It's a mystery.

sjspielman commented 4 years ago

> It's a mystery.

Until we solve it, I will blame Bioconductor, solely for personal comfort.

sjspielman commented 4 years ago

I don't think the Docker version itself should make a difference, but for what it's worth I'm rebuilding my image now on this Docker version:

[Screenshot of the Docker Desktop version, taken 2020-06-11; image not preserved]

Edit: As needed, here are Docker's release notes. I am using the most up-to-date version of Docker Desktop, released on 5/27/20. The score files in master were merged in on 4/4/20, so they must have been generated with a different Docker release. It would be extra bad if Docker itself is leading to this discrepancy, and I have to assume this is not the cause (otherwise too depressing). Still worth ruling out.

sjspielman commented 4 years ago

Approach: two Docker images built with the aforementioned Docker Desktop version, as:

docker build -t pbtarocker-cache --pull .
docker build -t pbtarocker-nocache --pull --no-cache .

They give the exact same values as one another, but their PCA and tSNE scores differ from master. UMAP is the same as master.
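For anyone repeating this comparison, it may help to quantify how far two score files diverge rather than only checking whether they are identical. Below is a minimal sketch, not the project's own tooling: it assumes the score files are tab-separated with one identifier column, and the function name `max_abs_diff` and the default column name `sample_id` are hypothetical placeholders that would need to match the real files.

```python
import csv


def max_abs_diff(path_a, path_b, id_column="sample_id"):
    """Compare two TSV score files row by row.

    Returns (largest absolute difference, number of differing numeric cells).
    The id_column default is a hypothetical name, not taken from the repository.
    """

    def load(path):
        # Index each row by its identifier so row order does not matter.
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            return {row[id_column]: row for row in reader}

    rows_a, rows_b = load(path_a), load(path_b)
    if rows_a.keys() != rows_b.keys():
        raise ValueError("the two files do not contain the same identifiers")

    worst = 0.0
    n_diff = 0
    for key, row_a in rows_a.items():
        row_b = rows_b[key]
        for column, value_a in row_a.items():
            if column == id_column:
                continue
            try:
                a, b = float(value_a), float(row_b[column])
            except (KeyError, TypeError, ValueError):
                continue  # skip columns that are missing or not numeric
            diff = abs(a - b)
            if diff > 0:
                n_diff += 1
            worst = max(worst, diff)
    return worst, n_diff
```

Calling `max_abs_diff("scores_master.tsv", "scores_rebuilt.tsv")` (hypothetical file names) on the master and rebuilt outputs would show whether the PCA and t-SNE differences are tiny floating-point discrepancies or substantially different embeddings, which might narrow down where to look next.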

Conclusion: Likely unrelated to Docker caching. This is expected and good. Next up: is it related to the Docker version? Will go download a slightly older Docker Desktop and rebuild (from cache!) and check it out there. Goal: show that this also isn't the culprit. EDIT: My computer will not allow me to download any older version from March or earlier due to security risks. I can't investigate this one.