Closed beginner984 closed 4 years ago
Hi there,
Currently we do not have plots of signature bootstrap fit that show multiple samples at the same time, so you will have to generate those plots yourself. For signature fit, we only have plots that show the fit for each sample individually.
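As a rough starting point for building such a plot yourself, here is a minimal Python sketch that turns per-sample exposures into mean signature proportions per group, which could then feed one stacked bar per group. It is not part of our package: the function name `group_mean_proportions`, the input layout, the group labels and all numbers are assumptions made up for illustration.

```python
def group_mean_proportions(exposures, groups):
    """Average per-sample signature proportions within each group.

    exposures: sample -> {signature name: number of mutations} (hypothetical layout)
    groups:    sample -> group label (hypothetical layout)
    """
    sums = {}   # group -> {signature: summed proportion}
    sizes = {}  # group -> number of samples that contributed
    for sample, exp in exposures.items():
        total = sum(exp.values())
        if total == 0:
            continue  # a sample with no assigned mutations has no proportions
        label = groups[sample]
        sizes[label] = sizes.get(label, 0) + 1
        bucket = sums.setdefault(label, {})
        for sig, value in exp.items():
            bucket[sig] = bucket.get(sig, 0.0) + value / total
    return {
        label: {sig: value / sizes[label] for sig, value in bucket.items()}
        for label, bucket in sums.items()
    }


# Toy exposures, invented for this example.
exposures = {
    "S1": {"Sig17": 80, "Sig1": 20},
    "S2": {"Sig17": 60, "Sig1": 40},
    "S3": {"Sig18": 50, "Sig1": 50},
}
groups = {"S1": "group_A", "S2": "group_A", "S3": "group_B"}

summary = group_mean_proportions(exposures, groups)
for label, proportions in summary.items():
    print(label, proportions)
```

Proportions are averaged rather than raw counts so that one sample with a very high mutation burden does not dominate its group. A signature missing from a sample's dictionary is treated as zero for that sample.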
Good afternoon
My main uncertainty now is how people decide which of the signatures fitted to their data to keep. For instance, my samples are pre-treatment biopsies of oesophageal adenocarcinoma. I selected three .vcf files at random from my data and ran SIGNAL on each one individually, with Esophagus selected as the organ, and I saw something different in each sample (please have a look at the figures).
I then combined the chromosome, position, reference allele and mutated allele columns from the VCF files of all samples into a single tabular .tsv file and ran SIGNAL on that. There I noticed a gap between what the individual samples show and what the combined data shows.
Now my question is: if you were in my place, how would you select, let's say, the most relevant signatures describing my samples?
Sorry to trouble you, and thank you in advance.
Hi again,
I see you are using SIGNAL. Currently SIGNAL supports the analysis of a limited number of samples at once, using either a multisample VCF or single-sample VCFs uploaded one by one. To perform signature fit on a large number of samples I advise you to use our R package.
You can see Example01 for an example of how to use your VCF files to create mutational catalogues and perform signature fit using the function SignatureFit_withBootstrap_Analysis.
If you have whole genome data, I would not combine multiple samples into one mutational catalogue, because each sample can have different signatures. You can see how your first sample above has predominantly sig 17, while the second sample has sig 18 and the third has sigs 1+5. This indicates that these three samples possibly belong to different phenotypic groups.
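To illustrate why keeping one catalogue per sample matters, here is a minimal Python sketch that counts substitutions per sample from a combined table and compares that with the pooled counts. It is a simplification and not how our package builds catalogues: it uses only the six pyrimidine-normalised substitution classes instead of the 96 trinucleotide channels (which need the reference genome), and the record layout, function names and toy records are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Complement map used to fold purine-reference substitutions onto the
# pyrimidine strand, so that e.g. A>C is counted as T>G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the pyrimidine-normalised class (e.g. 'C>T'), or None if not an SNV."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        return None  # indels and non-variant rows are skipped
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        return None  # ambiguous bases such as N are skipped
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def per_sample_counts(records):
    """records: iterable of (sample, chrom, pos, ref, alt) tuples (hypothetical layout)."""
    counts = defaultdict(Counter)
    for sample, _chrom, _pos, ref, alt in records:
        cls = substitution_class(ref, alt)
        if cls is not None:
            counts[sample][cls] += 1
    return counts


# Toy records, invented for this example.
records = [
    ("S1", "chr1", 100, "T", "G"),
    ("S1", "chr1", 250, "A", "C"),   # folds to T>G
    ("S2", "chr2", 300, "C", "A"),
    ("S2", "chr2", 410, "G", "T"),   # folds to C>A
    ("S3", "chr3", 500, "C", "T"),
    ("S3", "chr3", 600, "AT", "A"),  # indel, skipped
]

counts = per_sample_counts(records)
pooled = sum(counts.values(), Counter())

for sample, sample_counts in counts.items():
    print(sample, dict(sample_counts))
print("pooled", dict(pooled))
```

In the per-sample view each toy sample has its own dominant class, while the pooled counts mix them together, so a fit on the pooled catalogue can no longer tell which sample carries which pattern.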
In general, looking at your plots, it seems that the mutational burden for the individual samples is quite high, so it is good practice first to make sure that the mutation calls are set to high specificity and that artefacts and SNPs from germline are removed.
I hope this was helpful. You can find more details about our signature fit procedure in the methods of our Nature Cancer paper.
Thank you so much
This was very helpful in clarifying a possible source of the weird signatures I am getting, which don't make biological sense in oesophageal adenocarcinoma (I am getting quite a lot of signatures 8 and 9).
I am working on whole genome sequencing of pre-treatment biopsies.
People say that fitting may give false positives, so I think it is more feasible to first extract signatures, then look at what the extracted signatures are, and finally show the exposure proportions of those signatures across samples.
I think your approach is a very good one. If you have many samples, performing signature extraction is a good way to find which signatures are present, and then just fit those signatures.
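To make the "fit only the signatures you found" idea concrete, here is a small Python sketch. It is not the fitting procedure of our package: it uses a plain non-negative least squares fit by multiplicative updates, and the function name `fit_exposures`, the four-channel toy signatures A, B and C and the toy catalogue are all assumptions invented for this example.

```python
def fit_exposures(catalogue, signatures, n_iter=2000):
    """Non-negative least squares fit by multiplicative updates.

    catalogue:  list of mutation counts, one per channel
    signatures: name -> list of per-channel probabilities (same channel order)
    Returns name -> estimated number of mutations (never negative).
    """
    names = list(signatures)
    n_channels = len(catalogue)
    # Start from an even split of the mutation burden across candidates.
    exposures = [float(sum(catalogue)) / len(names)] * len(names)
    for _ in range(n_iter):
        # Catalogue reconstructed from the current exposures.
        model = [
            sum(exposures[k] * signatures[name][i] for k, name in enumerate(names))
            for i in range(n_channels)
        ]
        # Simultaneous update: every exposure is rescaled against the same model.
        for k, name in enumerate(names):
            sig = signatures[name]
            numer = sum(sig[i] * catalogue[i] for i in range(n_channels))
            denom = sum(sig[i] * model[i] for i in range(n_channels))
            if denom > 0:
                exposures[k] *= numer / denom
    return dict(zip(names, exposures))


# Toy signatures over four channels, invented for this example.
A = [0.7, 0.1, 0.1, 0.1]
B = [0.1, 0.7, 0.1, 0.1]
C = [0.1, 0.1, 0.7, 0.1]

# Toy catalogue: 100*A + 50*B, with channels 3 and 4 perturbed by +5 and -5
# to mimic noise.
catalogue = [75, 45, 20, 10]

restricted = fit_exposures(catalogue, {"A": A, "B": B})
everything = fit_exposures(catalogue, {"A": A, "B": B, "C": C})

print("restricted:", {k: round(v, 1) for k, v in restricted.items()})
print("everything:", {k: round(v, 1) for k, v in everything.items()})
```

In this toy case the fit restricted to A and B recovers the exposures used to build the catalogue, while offering C as an extra candidate lets it absorb some of the noise and take mutations away from A and B. That is the kind of false positive that a prior extraction step helps to avoid.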
If you want to use our package for signature extraction, I put some advice in the README FAQ.
Also remember to check individual samples, because some signatures are rare and could be present in one or two samples only, and could be missed by the general extraction. You can identify these samples, as they would have a higher error than the others. For this you can use the function unexplainedSamples in our package. Looking at the difference between the model and the original catalogue can give you a clue as to what extra signature may be there, if any.
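The general idea can be sketched in a few lines of Python. This is not the implementation of unexplainedSamples: the error measure (sum of absolute differences divided by the mutation count), the "twice the median" threshold, the function names and the toy numbers are all assumptions chosen for illustration.

```python
import statistics


def reconstruction_error(catalogue, model):
    """Sum of absolute differences, normalised by the sample's mutation count."""
    return sum(abs(c - m) for c, m in zip(catalogue, model)) / sum(catalogue)


def flag_poorly_explained(catalogues, models, factor=2.0):
    """Flag samples whose error exceeds `factor` times the median error."""
    errors = {
        sample: reconstruction_error(catalogues[sample], models[sample])
        for sample in catalogues
    }
    threshold = factor * statistics.median(errors.values())
    flagged = [sample for sample, err in errors.items() if err > threshold]
    return errors, flagged


def residual(catalogue, model):
    """Per-channel difference; positive values are mutations the model misses."""
    return [c - m for c, m in zip(catalogue, model)]


# Toy catalogues and fitted models over four channels, invented for this example.
catalogues = {
    "S1": [50, 30, 10, 10],
    "S2": [40, 40, 10, 10],
    "S3": [20, 20, 50, 10],
    "S4": [60, 20, 10, 10],
}
models = {
    "S1": [49, 31, 10, 10],
    "S2": [41, 39, 11, 9],
    "S3": [30, 30, 25, 15],
    "S4": [58, 21, 11, 10],
}

errors, flagged = flag_poorly_explained(catalogues, models)
for sample in flagged:
    print(sample, round(errors[sample], 2), residual(catalogues[sample], models[sample]))
```

The channels with a large positive residual in a flagged sample are the ones the fitted signatures fail to explain, so comparing that residual profile with known signatures is a reasonable next step.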
Hi
My samples fall into two groups: responders to chemotherapy and non-responders.
Ideally I would need two pictures, one per group, to compare the signatures, like the one below.
Is there a way to get one figure for a group of samples?
I have my samples as .vcf files.
Thanks in advance