Nik-Zainal-Group / signature.tools.lib

R package containing useful functions for mutational signature analysis

Running the algorithm for a group of samples #5

Closed beginner984 closed 4 years ago

beginner984 commented 4 years ago

Hi

My samples fall into two groups: responders to chemotherapy and non-responders.

Ideally I would like one picture per group so that I can compare the signatures, like below:

[Images: Non_responders, Responders]

Is there a way to get one figure for a group of samples?

I have my samples as .vcf

Thanks in advance

andreadega commented 4 years ago

Hi there,

Currently we do not have plots of the signature bootstrap fit that show multiple samples at the same time, so you will have to generate those plots yourself. For signature fit we only have plots that show the fit for each sample individually.
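One way to prepare data for a per-group figure is to average the per-sample exposure proportions within each group and plot the result with any plotting tool. The sketch below is only an illustration of that idea and is not part of signature.tools.lib; the function name `mean_exposures_by_group`, the sample names, and the exposure values are all hypothetical.

```python
from collections import defaultdict


def mean_exposures_by_group(exposures, groups):
    """Average per-sample signature proportions within each group.

    exposures: {sample: {signature: mutation count}}  (hypothetical input layout)
    groups:    {sample: group label}
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for sample, sig_counts in exposures.items():
        total = sum(sig_counts.values())
        if total == 0:
            continue  # skip samples with no assigned mutations
        group = groups[sample]
        n_samples[group] += 1
        for sig, count in sig_counts.items():
            # Normalise first so high-burden samples do not dominate the mean.
            totals[group][sig] += count / total
    return {
        group: {sig: value / n_samples[group] for sig, value in sigs.items()}
        for group, sigs in totals.items()
    }


# Made-up exposures, for illustration only.
exposures = {
    "s1": {"Sig1": 50, "Sig17": 50},
    "s2": {"Sig1": 100, "Sig17": 0},
    "s3": {"Sig1": 25, "Sig17": 75},
}
groups = {"s1": "responder", "s2": "responder", "s3": "non_responder"}
group_means = mean_exposures_by_group(exposures, groups)
print(group_means)
```

Averaging proportions rather than raw counts is a design choice: it gives each sample equal weight regardless of its mutational burden.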

beginner984 commented 4 years ago

Good afternoon

Now my biggest uncertainty is: how do people decide on the set of signatures fitted to their data? For instance, my samples are oesophageal adenocarcinoma pre-treatment biopsies. I selected three .vcf files at random from my data and ran SIGNAL on each of them individually, choosing Esophagus as the organ, and I saw something different in each sample (please have a look at the figures).

[Images: VCF1, VCF2, VCF3]

I then combined the chromosome, position, reference allele and mutated allele columns from the vcf files of all samples into a single tabular .tsv file and ran SIGNAL on that, where I noticed a gap between what the individual samples show and what the combined data shows.

[Image: all SAMPLES]

Now my question is: if you were in my place, how would you select, let's say, the most relevant signatures describing my samples?

Sorry for the trouble, and thank you in advance

andreadega commented 4 years ago

Hi again,

I see you are using SIGNAL. Currently SIGNAL supports the analysis of a limited number of samples at once, either using a multisample VCF or by uploading sample VCFs one by one. To perform signature fit on a large number of samples, I advise you to use our R package.

You can see Example01 for an example of how to use your vcf files to create mutational catalogues and perform signature fit using the function SignatureFit_withBootstrap_Analysis.

If you have whole genome data, I would not combine multiple samples into one mutational catalogue, because each sample can have different signatures. You can see how your first sample above has predominantly sig 17, while the second sample has sig 18 and the third has 1+5. This indicates that these three samples possibly belong to different phenotypic groups.

In general, looking at your plots, it seems that the mutational burden for the individual samples is quite high, so it is good practice first to make sure that the mutation calls are set to high specificity and that artefacts and germline SNPs are removed.
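A small numeric illustration of why pooling can mislead: when samples are summed into one catalogue, the sample with the highest burden dominates, and the other samples' signatures fade into the background. The counts below are invented for illustration and are not taken from the figures in this thread; the helper names are hypothetical.

```python
def pool(samples):
    """Sum per-signature mutation counts over all samples."""
    pooled = {}
    for counts in samples.values():
        for sig, value in counts.items():
            pooled[sig] = pooled.get(sig, 0) + value
    return pooled


def dominant(counts):
    """Return the signature with the largest count."""
    return max(counts, key=counts.get)


# Made-up exposures: each sample is dominated by a different signature.
samples = {
    "VCF1": {"Sig17": 9000, "Sig18": 500, "Sig1+5": 500},
    "VCF2": {"Sig17": 200, "Sig18": 1600, "Sig1+5": 200},
    "VCF3": {"Sig17": 100, "Sig18": 100, "Sig1+5": 800},
}

for name, counts in samples.items():
    print(name, "dominant signature:", dominant(counts))
print("pooled dominant signature:", dominant(pool(samples)))
```

In this toy case the pooled catalogue looks like a Sig17 sample, even though two of the three samples are dominated by other signatures.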

I hope this was helpful. You can find more details about our signature fit procedure in the methods of our Nature Cancer paper.

beginner984 commented 4 years ago

Thank you so much

Very helpful in clearing up the possible source of the weird signatures I am getting, which don't make biological sense in oesophageal adenocarcinoma (I am getting quite a lot of signatures 8 and 9).

I am working on whole genome sequencing of pre-treatment biopsies.

People say that fitting may give false positives, so I think it is more feasible to first extract signatures, then look at what the extracted signatures are, and finally show the exposure proportions of those signatures across samples.

andreadega commented 4 years ago

I think your approach is a very good one. If you have many samples, performing signature extraction is a good way to find which signatures are present, and then just fit those signatures.

If you want to use our package for signature extraction, I put some advice in the README FAQ.

Also remember to check individual samples, because some signatures are rare and could be present in one or two samples only, and could be missed by the general extraction. You can identify these samples, as they would have a higher error than the others. For this you can use the function unexplainedSamples in our package. Looking at the difference between model and original catalogue can give you a clue of what extra signature may be there, if any.
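The idea of comparing the model with the original catalogue can be sketched generically as follows. This is not the implementation of unexplainedSamples, whose internals are not described in this thread; it is a minimal illustration that assumes a simple relative absolute error and a user-chosen cutoff. All names, the 4-channel toy signatures (real SNV catalogues typically have 96 channels), and the numbers are hypothetical.

```python
def reconstruct(signatures, exposures):
    """Model catalogue = sum over signatures of (exposure * channel probabilities)."""
    n_channels = len(next(iter(signatures.values())))
    model = [0.0] * n_channels
    for name, weight in exposures.items():
        for i, prob in enumerate(signatures[name]):
            model[i] += weight * prob
    return model


def relative_error(catalogue, model):
    """Sum of absolute differences, normalised by the total number of mutations."""
    return sum(abs(c - m) for c, m in zip(catalogue, model)) / sum(catalogue)


def flag_poorly_explained(catalogues, signatures, exposures, cutoff):
    """Return {sample: (error, residual)} for samples whose error exceeds cutoff."""
    flagged = {}
    for sample, catalogue in catalogues.items():
        model = reconstruct(signatures, exposures[sample])
        error = relative_error(catalogue, model)
        if error > cutoff:
            # The residual hints at which channels the fitted signatures miss.
            residual = [c - m for c, m in zip(catalogue, model)]
            flagged[sample] = (error, residual)
    return flagged


# Toy data: two made-up signatures over four channels.
signatures = {"A": [0.5, 0.5, 0.0, 0.0], "B": [0.0, 0.0, 0.5, 0.5]}
catalogues = {"s1": [50, 50, 20, 20], "s2": [10, 10, 10, 70]}
exposures = {"s1": {"A": 100, "B": 40}, "s2": {"A": 20, "B": 20}}

flagged = flag_poorly_explained(catalogues, signatures, exposures, cutoff=0.2)
print(flagged)
```

Here sample s2 is flagged, and its residual is concentrated in the last channel, which is the kind of pattern that could point to a signature missing from the fitted set.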