ablab / VerityMap

Plotting functions fail after run on diploid human genome #29

Open jeizenga opened 1 year ago

jeizenga commented 1 year ago

I ran with this command:

python3 main.py --reads alpha_reads.fastq.gz -o verity_map_output_alpha -t 16 -d hifi-diploid assembly.haplotype1.fasta assembly.haplotype2.fasta

According to the veritymap.log the run completed successfully:

02:09:13 59.8Gb  INFO: Finished exporting long (>= 5000 bp) regions without solid k-mers to "/public/groups/vg/jeizenga/centromere/reads/PAN010/verity_map_output_alpha/veritymap/no_solid_kmers.bed"
02:09:13 59.8Gb  INFO: Computing chains and sam records...
65:45:26 59.8Gb  INFO: Finished outputting chains to "/public/groups/vg/jeizenga/centromere/reads/PAN010/verity_map_output_alpha/veritymap/chains.tsv" and sam records to "/public/groups/vg/jeizenga/centromere/reads/PAN010/verity_map_output_alpha/veritymap/alignments.sam"
65:46:31 59.8Gb  INFO: Thank you for using VerityMap!

However, after the logger gave that output, Python threw an error apparently linked to this line in the plotting function: https://github.com/ablab/VerityMap/blob/d24aa797be9c977dbcb9164ecfe18b3af6e4a026/veritymap/py_src/reporting.py#L36

Traceback (most recent call last):
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/main.py", line 61, in <module>
    main()
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/main.py", line 54, in main
    do(assemblies, reads_fname, datatype, out_dir, threads, no_reuse, is_careful)
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/../veritymap/py_src/mapper.py", line 154, in do
    make_plotly_html(assemblies, all_data, out_dir)
  File "/public/groups/vg/jeizenga/GitHub/VerityMap/veritymap/../veritymap/py_src/reporting.py", line 36, in make_plotly_html
    data['coverage'] = coverage[ref_name]
KeyError: 'haplotype1-0000013'
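
The traceback shows a plain dictionary lookup failing for one contig name. If the key can legitimately be absent for some contigs, a guarded lookup would let plotting continue for the rest. The following is only a sketch: the helper `coverage_for`, the logger name, and the choice of default value are hypothetical, and the actual type stored in `coverage` inside `reporting.py` is not known from the traceback.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("veritymap_sketch")


def coverage_for(coverage, ref_name, default=None):
    """Return coverage[ref_name], or `default` when the contig has no entry.

    `coverage` is assumed to be a dict keyed by contig name, as the
    failing line `coverage[ref_name]` suggests.
    """
    if ref_name not in coverage:
        # Report the missing contig instead of raising KeyError.
        logger.warning("No coverage recorded for contig %s", ref_name)
        return default
    return coverage[ref_name]
```

With such a helper, the failing assignment could become `data['coverage'] = coverage_for(coverage, ref_name, default=[])`, assuming an empty list is an acceptable placeholder for the plotting code that follows.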

A feature of this run that might be relevant is that I tried to address the previously reported speed issues by enriching for reads containing alpha satellite sequences. It's conceivable that some contigs had no alpha satellite-containing reads map to them, in which case perhaps they weren't entered into this dictionary?
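
One way to check this hypothesis without re-running the mapping is to compare the contig names in the assembly FASTA files against the reference names that appear in `alignments.sam`. The sketch below uses only the standard library and relies on the FASTA header convention (name is the first token after `>`) and the SAM column layout (reference name in the third tab-separated column, `*` for unmapped). The function names are made up for illustration, and it assumes VerityMap writes the original contig names unchanged into the SAM file.

```python
def fasta_contig_names(lines):
    """Collect contig names (first token after '>') from FASTA lines, in order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def sam_reference_names(lines):
    """Collect the set of reference names that have at least one SAM record."""
    seen = set()
    for line in lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is RNAME; '*' marks an unmapped read.
        if len(fields) > 2 and fields[2] != "*":
            seen.add(fields[2])
    return seen


def contigs_without_alignments(fasta_lines, sam_lines):
    """Return contig names from the FASTA that never appear as a SAM reference."""
    mapped = sam_reference_names(sam_lines)
    return [name for name in fasta_contig_names(fasta_lines) if name not in mapped]


# Example use on real files (paths taken from the command and log above):
# with open("assembly.haplotype1.fasta") as fa, \
#         open("verity_map_output_alpha/veritymap/alignments.sam") as sam:
#     print(contigs_without_alignments(fa, sam))
```

If `haplotype1-0000013` shows up in the returned list, that would support the idea that contigs with no mapped reads are missing from the coverage dictionary.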

For what it's worth, this is a good example of why I suggested in my previous issue that the analysis module be accessible without re-running the mapping code, as the mapping took several days to complete.