fraterenz / ecdna-evo

Evolutionary models of extrachromosomal DNA (ecDNA).

Fit single cell FISH microscopy #6

Closed: fraterenz closed this issue 2 years ago

fraterenz commented 2 years ago

Fit the FISH data, but first implement the other statistics, since "the copy number is a bit higher and we will need larger population sizes for the simulation". Find out whether the cluster can be used with the current version of the code or whether we need to change the ecDNA distribution vector (see #12).

fraterenz commented 2 years ago

The cluster can handle 10^8 cells with 1000 ABC runs using 22 cores and 96G of memory in less than 1h (job id 2135934 in https://stats.hpc.qmul.ac.uk). Above 10^8 cells, the number of cores must be reduced because memory usage becomes very large, see #17.

fraterenz commented 2 years ago

We need to reach a copy number of 49 (the maximum ecDNA copy number in the data, see UCLA_Nathanson_FISH_Quantification.html). The data also give the mean, which is between 10 and 19 (same HTML file).

How can we know the number of nminus cells (cells without ecDNA)? We cannot fit the ecDNA distribution without the nminus cells. The nminus count is also needed for the frequency and the entropy, so for now we can only use the mean.
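A minimal sketch of why the nminus count matters, assuming the ecDNA distribution is stored as a histogram `hist` where `hist[k]` is the number of cells carrying `k` copies (so `hist[0]` is nminus), and assuming "the mean" is taken over ecDNA-positive cells only. The function names are hypothetical, not part of ecdna-evo. Changing `hist[0]` leaves the mean untouched but changes both the frequency and the entropy.

```rust
/// Mean copy number over ecDNA-positive cells only (hist[0], nminus, is skipped).
/// Assumption: this is the "mean" comparable with the FISH data.
fn mean_positive(hist: &[u64]) -> Option<f64> {
    let (cells, copies) = hist
        .iter()
        .enumerate()
        .skip(1)
        .fold((0u64, 0u64), |(c, s), (k, &n)| (c + n, s + k as u64 * n));
    if cells == 0 { None } else { Some(copies as f64 / cells as f64) }
}

/// Fraction of cells carrying at least one ecDNA copy: requires hist[0] (nminus).
fn frequency(hist: &[u64]) -> Option<f64> {
    let total: u64 = hist.iter().sum();
    if total == 0 {
        return None;
    }
    let nminus = hist.first().copied().unwrap_or(0);
    Some((total - nminus) as f64 / total as f64)
}

/// Shannon entropy (in nats) of the copy-number distribution, class 0 included.
fn entropy(hist: &[u64]) -> Option<f64> {
    let total: u64 = hist.iter().sum();
    if total == 0 {
        return None;
    }
    let h: f64 = hist
        .iter()
        .filter(|&&n| n > 0)
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.ln()
        })
        .sum();
    Some(h)
}

fn main() {
    // Same ecDNA-positive cells, two different nminus counts (unknown in FISH).
    let a = [0u64, 2, 3, 5];
    let b = [10u64, 2, 3, 5];
    println!("mean: {:?} vs {:?}", mean_positive(&a), mean_positive(&b));
    println!("freq: {:?} vs {:?}", frequency(&a), frequency(&b));
    println!("H:    {:?} vs {:?}", entropy(&a), entropy(&b));
}
```

With these assumptions, only the positive-cell mean can be compared with the FISH data until the nminus count is known.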

fraterenz commented 2 years ago

Fitting FISH data

Subsample the simulated tumours so that the simulations can be used to infer the most probable fitness coefficients from the FISH data.

Goal

Make the FISH data comparable to the simulations.

Requirements

  1. High copy numbers are required (a copy number of 49 is present in the data): simulate 10^9 cells on the cluster
  2. The resolution of the FISH data is low (for one patient there are only a few cells, fewer than 100): implement the subsampling method

Subsampling implementation

The idea is to use itertools::counts to get the ecDNA copy-number counts (i.e. create the classes and their counts), then zip the classes with their counts to create the input required by the method choose_multiple_weighted from the trait rand::seq::SliceRandom. NOPE: this does not keep the proportions, because choose_multiple_weighted draws distinct elements (here, distinct classes, each at most once) with probabilities given by the weights, instead of drawing individual cells. Should we use the hypergeometric distribution?
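One possible answer to the question above, sketched under assumptions: drawing cells (not classes) without replacement from the copy-number histogram is a multivariate hypergeometric draw, so the subsample keeps the class proportions in expectation. To stay dependency-free, the sketch replaces the rand crate with a small xorshift64* generator and replaces itertools::counts with a std-only `histogram`; `XorShift64`, `histogram` and `subsample` are hypothetical names, not ecdna-evo code. With rand 0.8, calling SliceRandom::choose_multiple on the per-cell vector instead of the classes would give the same kind of draw.

```rust
/// Tiny xorshift64* PRNG (stand-in for the rand crate). The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Integer in [0, n); the modulo bias is negligible for n much smaller than 2^64.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// hist[k] = number of cells with k ecDNA copies (std-only analogue of itertools::counts).
fn histogram(copies: &[u16]) -> Vec<u64> {
    let max = copies.iter().copied().max().unwrap_or(0) as usize;
    let mut hist = vec![0u64; max + 1];
    for &k in copies {
        hist[k as usize] += 1;
    }
    hist
}

/// Draw `sample_size` cells without replacement: one multivariate hypergeometric draw.
fn subsample(hist: &[u64], sample_size: u64, rng: &mut XorShift64) -> Vec<u64> {
    let mut remaining = hist.to_vec();
    let mut left: u64 = remaining.iter().sum();
    assert!(sample_size <= left, "cannot sample more cells than the tumour has");
    let mut sample = vec![0u64; hist.len()];
    for _ in 0..sample_size {
        // Pick one of the `left` remaining cells uniformly, then locate its class.
        let mut r = rng.below(left);
        for (k, n) in remaining.iter_mut().enumerate() {
            if r < *n {
                *n -= 1; // the cell is removed: sampling without replacement
                sample[k] += 1;
                break;
            }
            r -= *n;
        }
        left -= 1;
    }
    sample
}

fn main() {
    // Toy tumour: 10^4 cells with copy numbers 0..=49 (200 cells per class).
    let cells: Vec<u16> = (0..10_000u32).map(|i| (i % 50) as u16).collect();
    let hist = histogram(&cells);
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let sample = subsample(&hist, 100, &mut rng);
    println!("{:?}", sample);
}
```

Each draw scans the classes, so the cost is O(sample size × number of classes), which is small for samples of 10^2 to 10^4 cells and about 50 classes.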

fraterenz commented 2 years ago

Subsampling works for tumours of 10^6 cells subsampled to 10^4 cells, but with a smaller sample size of 100 cells the inference failed for strong fitness (it worked for f=1). Lower the threshold for small ecDNA distributions.