Rerun analysis: comparative-RNASeq-analysis

jashapiro commented 3 years ago

What analysis module should be updated and why?

The addition of samples in v18 will require a rerun of the scripts in comparative-RNASeq-analysis to generate new data.

What changes need to be made? Please provide enough detail for another participant to make the update.

No code changes should be required (beyond those already made in #892), but results/rsem-tpm-stranded-gene_expression_outliers.tsv.gz will need to be updated.

This will require running on a machine with >16GB memory. I am not sure of the exact requirements, but my local machine was not sufficient.

The full analysis can be performed with the following command from within the OpenPBTA docker image:

bash analyses/comparative-RNASeq-analysis/run-comparative-RNAseq.sh
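
Since the exact memory requirement is unknown, a pre-flight check could save a failed run. The following is only a sketch: the helper names `mem_total_gib` and `has_more_than_gib` are hypothetical (they are not part of the repository), the 16 GiB floor is taken from the note above and may well be too low, and reading `/proc/meminfo` assumes a Linux host or container.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of OpenPBTA-analysis.
# The 16 GiB floor comes from the issue text; the true requirement is unknown.

# Print total memory in whole GiB (truncated), read from a meminfo-style file.
# Defaults to /proc/meminfo, which exists on Linux only.
mem_total_gib() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Succeed only if total memory is strictly greater than the threshold (GiB).
has_more_than_gib() {
  local threshold="$1"
  local meminfo="${2:-/proc/meminfo}"
  local total
  total="$(mem_total_gib "$meminfo")"
  [ -n "$total" ] && [ "$total" -gt "$threshold" ]
}

# Usage, from the repository root inside the OpenPBTA docker image:
#   has_more_than_gib 16 \
#     && bash analyses/comparative-RNASeq-analysis/run-comparative-RNAseq.sh \
#     || echo "Total memory may be insufficient for this analysis" >&2
```

Passing this check does not guarantee the run will finish; it only rules out machines at or below the one bound that is known.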

What input data should be used? Which data were used in the version being updated?

v18 expression matrices

When do you expect the revised analysis will be completed?

After v18 release (or perhaps wait until after v19?)

Who will complete the updated analysis?

A CCDL member, most likely.

jaclyn-taroni commented 3 years ago

Hey @hbeale, we are happy to rerun this but we wanted to ask you if you had an idea of what computational resources (e.g., RAM, cores) were required to do so. Thank you!

cansavvy commented 3 years ago

I'm going to try to get this running on AWS today. I'll try out 128 GB and see how that goes.

jaclyn-taroni commented 3 years ago

We're going to wait until after v19 (#867) because of #862 - removing a sample from the dataset might (slightly) change the results!

hbeale commented 3 years ago

Thanks @cansavvy! If you need to ping us about the performance again, Ellen Kephardt is the most knowledgeable about what resources are needed. I meant to ask her, and then forgot :)

jaclyn-taroni commented 3 years ago

Sounds good, thanks!

sjspielman commented 2 years ago

Noting that v18 was re-run in https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/892, but this has not been re-run with v19 as suggested. At the time of this comment, the current version is v21.
