lucsnip closed 8 months ago
What is the sequencing coverage, and is it low? Please review the results of the stat command with the following syntax:
cnvpytor -root <pytor file> -stat <bin size>
-Arijit
Hi Arijit,
The coverage should be 30x.
The stat output is quite long. What am I looking for? I notice it is giving these warnings occasionally while the program is running:
cnvpytor.utils - WARNING - Problem with fit: Runtime Error. Using mean and std instead fitting parameters!
cnvpytor.utils - WARNING - Problem with fit: insufficient data points. Using mean and std instead fitting parameters!
Yes, you are correct. The RD fit did not work properly for some of those bins. Please examine the fitting curves in view mode:
cnvpytor -root <pytor file> -view <bin size>
cnvpytor> rdstat
This could assist in explaining the reason behind the misfit.
Here is the output for bin size 100:
cnvpytor -conf mm10_ref_conf.py -root B6MaleKidney_mm10_masked_rd.pytor -view 100
2024-01-08 18:30:58,011 - cnvpytor.genome - INFO - Reading configuration file 'mm10_ref_conf.py'.
2024-01-08 18:30:58,011 - cnvpytor.genome - INFO - Importing reference genome data: 'mm10'.
cnvpytor> rdstat
2024-01-08 18:31:13,604 - cnvpytor.viewer - INFO - RD stat for Autosomes: 1.11 +- 0.34
2024-01-08 18:31:13,629 - cnvpytor.viewer - INFO - RD stat for X/Y: 1.06 +- 0.26
2024-01-08 18:31:13,650 - cnvpytor.viewer - INFO - RD stat for Mitochondria: 137.71 +- 29.64
2024-01-08 18:31:13,650 - cnvpytor.viewer - INFO - RD stat for Mitochondria - number of mitochondria per cell: 249.16 +- 91.84
Additionally, my data is PacBio long-read DNA sequencing data. Is that likely to be a source of issues with the fit?
Could you double-check if the sequencing coverage is 30x? If so, can you confirm whether this coverage applies to the entire genome or a targeted panel?
-Arijit
Yes, the sequencing coverage is 30x, and whole genome, not targeted. Is it possible the default settings are not optimized for long reads?
I've just realized that the bin size needs to be greater than the read length for the read-depth analysis to work to some extent. Since you are using long-read data, the small bin sizes (100, 1k, 10k) are not working properly. Relying solely on the read depth-based cnvpytor approach with large bins may overlook small events and miss many details.
I would recommend incorporating a BAF-based approach by importing variant information. Following this, you can cross-reference the results with the read depth-based data to confirm events.
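To illustrate the bin-size point above, here is a rough back-of-the-envelope sketch in Python. It is not taken from cnvpytor's code. It assumes that each read contributes a single count to one bin and that counts are roughly Poisson distributed; the 15 kb mean read length is a hypothetical value for HiFi data, since the thread does not state it. The function names are made up for this example.

```python
import math


def expected_reads_per_bin(coverage, bin_size, mean_read_length):
    """Expected reads per bin, ASSUMING each read is counted once in a single bin."""
    return coverage * bin_size / mean_read_length


def relative_poisson_noise(expected_count):
    """Relative spread (sigma / mean) of a Poisson-distributed count."""
    return 1.0 / math.sqrt(expected_count)


COVERAGE = 30              # whole-genome coverage reported in this thread
MEAN_READ_LENGTH = 15_000  # ASSUMPTION: hypothetical HiFi read length in bp

for bin_size in (100, 1_000, 10_000, 100_000):
    n = expected_reads_per_bin(COVERAGE, bin_size, MEAN_READ_LENGTH)
    noise = relative_poisson_noise(n)
    print(f"bin {bin_size:>7} bp: ~{n:g} reads per bin, relative noise ~{noise:.2f}")
```

Under these assumptions, a 100 bp bin would collect well under one read on average, a 10 kb bin about 20 reads (relative noise above 20%), and only a 100 kb bin would reach about 200 reads. That would be consistent with the fit struggling at small bin sizes, though the actual counting scheme in cnvpytor may differ.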
That makes sense. I did notice that the fit warnings are still happening for the 100k bin size, however.
I have run an RD analysis on HiFi long-read sequencing data for bin sizes 100bp, 1kb, 10kb, and 100kb. The 100kb plot looks reasonable, but the others look strange. The 10kb plot has a much broader variance than the 100kb plot. The others just look bad. I believe I have followed all the instructions properly, so I am not sure what would cause this. I have included the images below. 100kb:
10kb:
1kb:
100bp: