functional site matrix generation/clustering (2/2)

jharenza commented 4 months ago

Purpose/implementation Section

What scientific question is your analysis addressing?

What was your approach?

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

[ ] The dependencies required to run the code in this pull request have been added to the project Dockerfile.

Documentation Checklist

[ ] This analysis module has a README and it is up to date.
[ ] The analytical code is documented and contains comments.

komalsrathi commented 4 months ago

Hi!

These are the changes that I have made:

1) I fixed `lspline_clustering.R` to reintroduce the `DGCA::filterGenes` calls, because the script was still calling the library anyway. The Dockerfile also appears to install DGCA without issues.

If you want to remove the DGCA implementation, maybe keep the `filter_expr` argument as is but add some other functionality [here](https://github.com/d3b-center/pbta-splicing/blob/new-cluster/analyses/clustering_analysis/util/lspline_clustering.R#L111-L116).

If you don't want to keep the `filter_expr` argument at all, you will just have to make sure the change is propagated here, here and here so that the clustering code runs without issues.

Everything ran smoothly for me, so personally I don't think it is necessary to remove the library and the corresponding function calls.

2) I tested the clustering in Docker on a subset of the functional sites (1000 sites and 100 samples). As with the above, it ran without issues. The rds file is here: pan_cancer_splicing_SE.gene.rds. You won't need to make any changes; just run the bash script `run-functional-clustering.sh` as is.
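
To illustrate the option in point 1 of keeping `filter_expr` while swapping out what sits behind it, here is a minimal sketch. It is written in Python for illustration only, not in the module's R, and it is not the code in `lspline_clustering.R`. The function name `filter_features`, the `min_median` threshold, and the toy matrix are all hypothetical; the median cutoff is a simple stand-in for the kind of central-tendency filtering `DGCA::filterGenes` provides, not a reimplementation of it.

```python
from statistics import median


def filter_features(matrix, filter_expr=True, min_median=0.1):
    """Optionally drop low-signal features from a feature-by-sample matrix.

    matrix maps a feature name to its list of per-sample values.
    When filter_expr is False the matrix is passed through unchanged, so
    callers can keep the argument while the filter behind it is replaced.
    min_median is a hypothetical threshold chosen for this sketch.
    """
    if not filter_expr:
        return dict(matrix)
    return {
        name: values
        for name, values in matrix.items()
        if median(values) >= min_median
    }


# Toy values, made up for illustration.
site_matrix = {
    "site_a": [0.0, 0.0, 0.05],
    "site_b": [0.4, 0.6, 0.5],
    "site_c": [0.9, 0.1, 0.2],
}

kept = filter_features(site_matrix, filter_expr=True)
unfiltered = filter_features(site_matrix, filter_expr=False)
```

Because the flag is the only thing the callers see, the other places that pass `filter_expr` would not need to change under this approach.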

jharenza commented 4 months ago

thanks @komalsrathi - running now

komalsrathi commented 4 months ago

Reduced the features from 68238 to 5630 by restricting to OncoKB oncogenes and TSGs, per the discussion on Slack.
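
A rough sketch of this kind of gene-list restriction, in Python for illustration only (the module itself is in R). The feature ID format (`GENE_coordinates`), the helper name `restrict_to_genes`, and the gene symbols are assumptions made for this example; the actual feature naming and the OncoKB list used in the PR may differ.

```python
def restrict_to_genes(feature_ids, gene_list, sep="_"):
    """Keep only features whose gene prefix is in gene_list.

    Assumes each feature ID starts with a gene symbol followed by sep,
    which is a hypothetical naming scheme for this sketch.
    """
    keep = set(gene_list)
    return [f for f in feature_ids if f.split(sep, 1)[0] in keep]


# Placeholder feature IDs and gene symbols, made up for illustration.
features = [
    "TP53_chr17:100-200",
    "ACTB_chr7:300-400",
    "BRAF_chr7:500-600",
]
cancer_genes = ["TP53", "BRAF"]

reduced = restrict_to_genes(features, cancer_genes)
```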
