abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License

histogram command not completing #210

Closed lucsnip closed 7 months ago

lucsnip commented 7 months ago

I have generated GC content, mask, and conf files for my reference genome, and used these to generate an RD file. When I run the -his command, it logs "Calculating global statistics" and then terminates without calculating histograms:

$ cnvpytor -conf mm39_ref_conf.py -root WSB_mm39_rd.pytor -his 100 1000 10000 100000
2024-01-30 18:38:51,768 - cnvpytor.genome - INFO - Reading configuration file 'mm39_ref_conf.py'.
2024-01-30 18:38:51,768 - cnvpytor.genome - INFO - Importing reference genome data: 'mm39'.
2024-01-30 18:38:51,809 - cnvpytor.root - INFO - Calculating global statistics.
2024-01-30 18:38:51,810 - cnvpytor.root - INFO - Calculating global statistics.
2024-01-30 18:38:51,811 - cnvpytor.root - INFO - Calculating global statistics.
2024-01-30 18:38:51,812 - cnvpytor.root - INFO - Calculating global statistics.

I am not sure what is happening here. I've used this reference genome and its accompanying files on another dataset without this issue. Here is the -ls output as well:

Filename 'WSB_mm39_rd.pytor'
----------------------------
File created: 2024-01-30 18:06 using CNVpytor ver 1.3.1

Chromosomes with RD signal: chr1, chr1_GL456210.1_random, chr1_GL456211.1_random, chr1_GL456212.1_random, chr1_GL456221.1_random, chr1_MU069434.1_random, chr1_GL456239.1_random, chr2, chr3, chr4, chr4_JH584295.1_random, chr5, chr5_JH584296.1_random, chr5_JH584297.1_random, chr5_JH584298.1_random, chr5_GL456354.1_random, chr5_JH584299.1_random, chr6, chr7, chr7_GL456219.1_random, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chrX, chrX_GL456233.2_random, chrY, chrY_JH584300.1_random, chrY_JH584301.1_random, chrY_JH584302.1_random, chrY_JH584303.1_random, chrUn_GL456367.1, chrUn_GL456378.1, chrUn_GL456381.1, chrUn_GL456382.1, chrUn_GL456383.1, chrUn_GL456385.1, chrUn_GL456390.1, chrUn_GL456392.1, chrUn_GL456394.1, chrUn_GL456359.1, chrUn_GL456360.1, chrUn_GL456396.1, chrUn_GL456372.1, chrUn_GL456387.1, chrUn_GL456389.1, chrUn_GL456370.1, chrUn_GL456379.1, chrUn_GL456366.1, chrUn_GL456368.1, chrUn_JH584304.1, chrUn_MU069435.1, chrM

Chromosomes with SNP signal: 

Using reference genome: mm39 [ GC: yes, mask: yes ]

Chromosomes with RD histograms [bin sizes]:  []

Chromosomes with SNP histograms [bin sizes]:  []

Chromosome lengths: {'chr1': '195154279', 'chr1_GL456210.1_random': '169725', 'chr1_GL456211.1_random': '241735', 'chr1_GL456212.1_random': '153618', 'chr1_GL456221.1_random': '206961', 'chr1_MU069434.1_random': '8412', 'chr1_GL456239.1_random': '40056', 'chr2': '181755017', 'chr3': '159745316', 'chr4': '156860686', 'chr4_JH584295.1_random': '1976', 'chr5': '151758149', 'chr5_JH584296.1_random': '199368', 'chr5_JH584297.1_random': '205776', 'chr5_JH584298.1_random': '184189', 'chr5_GL456354.1_random': '195993', 'chr5_JH584299.1_random': '953012', 'chr6': '149588044', 'chr7': '144995196', 'chr7_GL456219.1_random': '175968', 'chr8': '130127694', 'chr9': '124359700', 'chr10': '130530862', 'chr11': '121973369', 'chr12': '120092757', 'chr13': '120883175', 'chr14': '125139656', 'chr15': '104073951', 'chr16': '98008968', 'chr17': '95294699', 'chr18': '90720763', 'chr19': '61420004', 'chrX': '169476592', 'chrX_GL456233.2_random': '559103', 'chrY': '91455967', 'chrY_JH584300.1_random': '182347', 'chrY_JH584301.1_random': '259875', 'chrY_JH584302.1_random': '155838', 'chrY_JH584303.1_random': '158099', 'chrUn_GL456367.1': '42057', 'chrUn_GL456378.1': '31602', 'chrUn_GL456381.1': '25871', 'chrUn_GL456382.1': '23158', 'chrUn_GL456383.1': '38659', 'chrUn_GL456385.1': '35240', 'chrUn_GL456390.1': '24668', 'chrUn_GL456392.1': '23629', 'chrUn_GL456394.1': '24323', 'chrUn_GL456359.1': '22974', 'chrUn_GL456360.1': '31704', 'chrUn_GL456396.1': '21240', 'chrUn_GL456372.1': '28664', 'chrUn_GL456387.1': '24685', 'chrUn_GL456389.1': '28772', 'chrUn_GL456370.1': '26764', 'chrUn_GL456379.1': '72385', 'chrUn_GL456366.1': '47073', 'chrUn_GL456368.1': '20208', 'chrUn_JH584304.1': '114452', 'chrUn_MU069435.1': '31129', 'chrM': '16299'}
arpanda commented 7 months ago

Would you mind running the -his step in debug mode, i.e., with -v d? The debug log might help identify the issue.

-Arijit

lucsnip commented 7 months ago

Yes, here is the debug log:

2024-02-06 15:39:46,230 - cnvpytor - DEBUG - Start logging...
2024-02-06 15:39:46,230 - cnvpytor.genome - DEBUG - Checking reference genome resource files.
2024-02-06 15:39:46,230 - cnvpytor.genome - INFO - Reading configuration file 'mm39_ref_conf.py'.
2024-02-06 15:39:46,231 - cnvpytor.genome - INFO - Importing reference genome data: 'mm39'.
2024-02-06 15:39:46,231 - cnvpytor.root - DEBUG - App class init: filename 'WSB_mm39_rd.pytor'; max cores 8.
2024-02-06 15:39:46,231 - cnvpytor.io - DEBUG - Opening h5 file 'WSB_mm39_rd.pytor'
2024-02-06 15:39:46,262 - cnvpytor.io - DEBUG - File 'WSB_mm39_rd.pytor' successfully opened.
2024-02-06 15:39:46,272 - cnvpytor.root - DEBUG - Using GC content from database for reference genome 'mm39'.
2024-02-06 15:39:46,272 - cnvpytor.io - DEBUG - Opening h5 file '/storage/E_drive/Snipes/genome/mm39_gc_ap.pytor'
2024-02-06 15:39:46,386 - cnvpytor.io - DEBUG - File '/storage/E_drive/Snipes/genome/mm39_gc_ap.pytor' successfully opened in read-only mode.
2024-02-06 15:39:46,386 - cnvpytor.root - DEBUG - Using strict mask from database for reference genome 'mm39'.
2024-02-06 15:39:46,386 - cnvpytor.io - DEBUG - Opening h5 file '/storage/E_drive/Snipes/genome/mm39_mask_file.pytor'
2024-02-06 15:39:46,386 - cnvpytor.io - DEBUG - File '/storage/E_drive/Snipes/genome/mm39_mask_file.pytor' successfully opened in read-only mode.
2024-02-06 15:39:46,437 - cnvpytor.root - INFO - Calculating global statistics.
2024-02-06 15:39:46,438 - cnvpytor.root - INFO - Calculating global statistics.
2024-02-06 15:39:46,439 - cnvpytor.root - INFO - Calculating global statistics.
2024-02-06 15:39:46,440 - cnvpytor.root - INFO - Calculating global statistics.
2024-02-06 15:39:46,441 - cnvpytor.io - DEBUG - Closing h5 file 'WSB_mm39_rd.pytor'
2024-02-06 15:39:46,441 - cnvpytor.io - DEBUG - Closing h5 file '/storage/E_drive/Snipes/genome/mm39_gc_ap.pytor'
2024-02-06 15:39:46,441 - cnvpytor.io - DEBUG - Closing h5 file '/storage/E_drive/Snipes/genome/mm39_mask_file.pytor'
arpanda commented 7 months ago

Would you mind sharing a sample and the reference genome related files via email? I will check in detail.

Note: If you would like to avoid typing -conf REL_PATH/example_ref_genome_conf.py each time you run cnvpytor, you can create a bash alias or make the configuration permanent by copying example_ref_genome_conf.py to ~/.cnvpytor/reference_genomes_conf.py.

-Arijit
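
The permanent-configuration option from the note above could be scripted roughly as follows. This is only a sketch: the helper name `install_reference_conf` and its `home` parameter are hypothetical, and the backup step is an added precaution rather than anything CNVpytor requires. Only the destination path ~/.cnvpytor/reference_genomes_conf.py comes from the note.

```python
import shutil
from pathlib import Path


def install_reference_conf(conf_path, home=None):
    """Copy a reference genome conf file to ~/.cnvpytor/reference_genomes_conf.py.

    `home` is a hypothetical override of the home directory, useful for testing.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".cnvpytor"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "reference_genomes_conf.py"
    if target.exists():
        # Keep a backup instead of silently overwriting an existing configuration.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(conf_path, target)
    return target
```

Calling `install_reference_conf("mm39_ref_conf.py")` once would then make the -conf argument unnecessary on later runs, assuming the conf file itself is valid.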

lucsnip commented 7 months ago

Ok, I've figured it out. There was a mismatch between the bam header and the gc content and mask files. I fixed the header and remade those files, and the histogram command now runs as it should.
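
For anyone hitting the same symptom, a quick way to spot this kind of mismatch is to compare the contigs in the BAM header with the ones the reference files describe. The sketch below is an illustration, not part of CNVpytor: the function names are hypothetical, the header text is assumed to come from `samtools view -H`, and the toy data is made up (the thread does not say whether the original mismatch was in names or in lengths).

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE, e.g. SN:chr1.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(bam_contigs, ref_contigs):
    """Report names missing on either side and shared names whose lengths differ."""
    bam_names, ref_names = set(bam_contigs), set(ref_contigs)
    return {
        "only_in_bam": sorted(bam_names - ref_names),
        "only_in_ref": sorted(ref_names - bam_names),
        "length_differs": sorted(
            name for name in bam_names & ref_names
            if int(bam_contigs[name]) != int(ref_contigs[name])
        ),
    }


if __name__ == "__main__":
    # Toy header text in the format printed by `samtools view -H sample.bam`.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:1\tLN:195154279\n"
        "@SQ\tSN:chrM\tLN:16299\n"
    )
    # Reference lengths keyed by name, stored as strings like in the -ls output.
    reference = {"chr1": "195154279", "chrM": "16299"}
    print(compare_contigs(parse_sq_lines(header), reference))
```

An empty report on all three keys means the names and lengths agree; anything else points at the contigs to fix before regenerating the GC and mask files.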