Closed jharenza closed 2 months ago
Hi!
These are the changes that I have made:
1) I updated `lspline_clustering.R` to reintroduce the `DGCA::filterGenes` calls, because the script was loading the library anyway. The Dockerfile also appears to install DGCA without issues.
If you want to remove the DGCA implementation, maybe keep the `filter_expr` argument as is but add some other functionality [here](https://github.com/d3b-center/pbta-splicing/blob/new-cluster/analyses/clustering_analysis/util/lspline_clustering.R#L111-L116).
If you don't want to keep the `filter_expr` argument at all, you will just have to make sure that the changes are propagated here, here and here to run the clustering code without issues.
In my case, I was able to run everything smoothly so personally I don't think it is required to remove the library + corresponding function calls.
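If the DGCA call is dropped but `filter_expr` is kept, the "other functionality" could look roughly like the sketch below. This is only an illustration: `filter_by_median`, its `percentile` argument, and the toy matrix are hypothetical names, not code from `lspline_clustering.R`, and the median-based cutoff is an assumed stand-in rather than a reproduction of what `DGCA::filterGenes` does.

```r
# Hypothetical base-R filter that honours a filter_expr flag.
# Assumption: rows are features (e.g. splice sites), columns are samples.
filter_by_median <- function(expr_mat, filter_expr = TRUE, percentile = 0.25) {
  # When filtering is switched off, return the matrix untouched.
  if (!filter_expr) {
    return(expr_mat)
  }
  # Median value of each row across samples.
  row_medians <- apply(expr_mat, 1, median, na.rm = TRUE)
  # Rows whose median is at or below this quantile are dropped.
  cutoff <- quantile(row_medians, probs = percentile, na.rm = TRUE)
  expr_mat[row_medians > cutoff, , drop = FALSE]
}

# Toy data purely for illustration.
set.seed(1)
toy <- matrix(
  rnorm(40),
  nrow = 10,
  dimnames = list(paste0("site", 1:10), paste0("s", 1:4))
)

filtered <- filter_by_median(toy, filter_expr = TRUE)
unfiltered <- filter_by_median(toy, filter_expr = FALSE)
```

Keeping the flag in the signature means the callers that already pass `filter_expr` would not need to change, which is the main appeal of this option.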
2) I tested the clustering on a subset of functional sites, i.e. 1000 sites and 100 samples, using Docker. As mentioned above, it ran smoothly without issues. The rds file is here: pan_cancer_splicing_SE.gene.rds. You won't need to make any changes; just run the bash script `run-functional-clustering.sh` as is.
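For anyone who wants to build a similar test subset, a minimal sketch follows. It assumes the input is a plain matrix with sites in rows and samples in columns and that a random draw is acceptable; how the 1000 x 100 subset above was actually chosen is not stated here, and `subset_matrix` is a hypothetical helper.

```r
# Hypothetical helper: randomly subset a features x samples matrix.
subset_matrix <- function(mat, n_rows = 1000, n_cols = 100, seed = 2023) {
  # Fix the seed so the same subset is drawn on every run.
  set.seed(seed)
  # Never ask for more rows or columns than the matrix has.
  row_idx <- sample(nrow(mat), min(n_rows, nrow(mat)))
  col_idx <- sample(ncol(mat), min(n_cols, ncol(mat)))
  mat[row_idx, col_idx, drop = FALSE]
}

# Toy stand-in for the real matrix; values are random.
toy <- matrix(runif(5000 * 150), nrow = 5000, ncol = 150)
small <- subset_matrix(toy)
dim(small)
```

A subset of this size is mainly useful for checking that the pipeline runs end to end inside Docker, not for interpreting the resulting clusters.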
thanks @komalsrathi - running now
Reduced features from 68238 to 5630 by restricting to OncoKB oncogenes and TSGs, per discussion on Slack.
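The reduction can be pictured as a simple membership filter on gene symbols. The sketch below is an assumption about the general shape of that step, not the code used in this PR: `keep_listed_genes`, the toy matrix, and the three-gene example are hypothetical, and the real gene list would come from OncoKB.

```r
# Hypothetical helper: keep only features whose gene is in a reference list.
# gene_ids gives the gene symbol for each row of mat, in row order.
keep_listed_genes <- function(mat, gene_ids, gene_list) {
  stopifnot(length(gene_ids) == nrow(mat))
  mat[gene_ids %in% gene_list, , drop = FALSE]
}

# Toy data: four splice events across three samples.
toy <- matrix(
  1:12,
  nrow = 4,
  dimnames = list(
    c("TP53_e1", "TP53_e2", "ACTB_e1", "KRAS_e1"),
    c("s1", "s2", "s3")
  )
)
gene_ids <- c("TP53", "TP53", "ACTB", "KRAS")

# Stand-in for the oncogene and TSG list.
cancer_genes <- c("TP53", "KRAS")

reduced <- keep_listed_genes(toy, gene_ids, cancer_genes)
```

Because several splice events can map to the same gene, the number of retained features can exceed the number of genes in the list.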
Purpose/implementation Section
What scientific question is your analysis addressing?
What was your approach?
What GitHub issue does your pull request address?
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Results
What types of results are included (e.g., table, figure)?
What is your summary of the results?
Reproducibility Checklist
Documentation Checklist
README and it is up to date.