fritzsedlazeck / Spectre

Copy number caller for long read data including SNV utilization
MIT License

Spectre error with UNK chromosome #2

Closed tuannguyen8390 closed 1 year ago

tuannguyen8390 commented 1 year ago

This is not a serious bug, but with cattle we currently have a lot of unmapped chromosomes (NKLS02000...), and these cause Spectre to fail (see below). Specifying --only-chr with the numeric chromosome(s) works around the issue, but that workaround could itself be problematic for a genome whose chromosomes are named like "Chr01".


```
spectre::INFO> refining cnv calls
/group/dairy/Tuan/Recessive_lethal/Long_read_seq_analysis/Spectre/Spectre/analysis/analysis.py:384: RuntimeWarning: invalid value encountered in scalar subtract
  scaf_cov - np.nanmedian(list(_cand.cov) + list(_cand_next.cov))
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/group/home/vicsuwd/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/group/home/vicsuwd/anaconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "Spectre/spectre.py", line 117, in outside_spectre_worker
    worker.cnv_call()
  File "/group/dairy/Tuan/Recessive_lethal/Long_read_seq_analysis/Spectre/Spectre/spectreCNV.py", line 104, in cnv_call
    self.cnv_analysis.refine_cnv_calls(self.as_dev)  # set to self.as_dev
  File "/group/dairy/Tuan/Recessive_lethal/Long_read_seq_analysis/Spectre/Spectre/analysis/analysis.py", line 273, in refine_cnv_calls
    final_cnv_candidates = self.merge_candidates(candidates_cnv_list, each_chromosome)
  File "/group/dairy/Tuan/Recessive_lethal/Long_read_seq_analysis/Spectre/Spectre/analysis/analysis.py", line 285, in merge_candidates
    [n_merges, merged_candidates, _] = self.cnv_candidate_merge(candidates_cnv_list)
  File "/group/dairy/Tuan/Recessive_lethal/Long_read_seq_analysis/Spectre/Spectre/analysis/analysis.py", line 333, in cnv_candidate_merge
    any(end0 <= int(val) <= start1 for val in tup) for tup in self.metadata[cnv_cand.chromosome])
KeyError: 'NKLS02000880.1'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "Spectre/spectre.py", line 500, in <module>
    main()
  File "Spectre/spectre.py", line 482, in main
    spectre_run.spectre_exe()
  File "Spectre/spectre.py", line 256, in spectre_exe
    results = pool.map(outside_spectre_worker, tuple(spectre_instructions))
  File "/group/home/vicsuwd/anaconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/group/home/vicsuwd/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'NKLS02000880.1'
```

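
For the --only-chr workaround mentioned above, a minimal sketch of one way to pick out primary chromosome names, including "Chr01"-style ones, from a reference's sequence names. The regular expression and the helper names `primary_chromosomes` and `names_from_fai` are assumptions for illustration and not part of Spectre; whether --only-chr accepts a comma-separated list should be checked against Spectre's own help output.

```python
import re

# Hypothetical pattern for "primary" chromosome names: an optional chr/Chr
# prefix followed by a number or X, Y, M, MT. Adjust it to your reference.
PRIMARY = re.compile(r"^(?:chr)?(?:\d+|X|Y|M|MT)$", re.IGNORECASE)


def primary_chromosomes(names):
    """Return the names that look like primary chromosomes, in input order."""
    return [name for name in names if PRIMARY.match(name)]


def names_from_fai(fai_path):
    """Read sequence names (first tab-separated column) from a .fai index."""
    with open(fai_path) as handle:
        return [
            line.rstrip("\n").split("\t", 1)[0]
            for line in handle
            if line.strip()
        ]


# Example with names like those in this issue; the scaffold is filtered out.
names = ["1", "2", "Chr01", "X", "NKLS02000880.1"]
print(",".join(primary_chromosomes(names)))  # 1,2,Chr01,X
```

The joined string is only a candidate value for --only-chr; names in it must match the reference exactly.
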
philippesanio commented 1 year ago

Hello @tuannguyen8390, this seems to be an issue with either the provided sequence or the .mdr file, as Spectre tries to look up blacklisted regions, e.g. "N" regions in the sequence.
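
To illustrate why the lookup fails, here is a minimal sketch modelled on the line shown in the traceback. It is not Spectre's actual code or fix: the function name `gap_between` is hypothetical, and the layout of `metadata` (chromosome name mapped to a list of position tuples) is an assumption inferred from the traceback. Indexing the dictionary directly raises `KeyError` for a contig that has no entry, whereas `dict.get` with a default treats it as having no blacklisted regions.

```python
def gap_between(metadata, chromosome, end0, start1):
    """Return True if a blacklisted position lies between two candidates."""
    # .get() falls back to an empty list for contigs missing from metadata,
    # so an unlisted contig yields False instead of raising KeyError.
    regions = metadata.get(chromosome, [])
    return any(
        any(end0 <= int(val) <= start1 for val in tup) for tup in regions
    )


# Assumed layout: chromosome name -> list of (start, end) tuples.
metadata = {"1": [(1000, 2000)]}
print(gap_between(metadata, "1", 500, 1500))               # True
print(gap_between(metadata, "NKLS02000880.1", 500, 1500))  # False
```

Whether a missing contig should silently count as "no blacklisted regions" is a design choice; raising a clearer error that names the contig and the .mdr file would be an equally reasonable alternative.
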

Would you mind verifying whether NKLS02000880.1 is present in the FASTA file? If it is, you could run Spectre for now without providing an .mdr file, which is what contains this information. Spectre will then generate a new .mdr file for you, which you can reuse later for other samples that were aligned to the same reference sequence.
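
One way to run that check is to scan the FASTA headers for the contig name. This is a minimal sketch: `fasta_has_sequence` is a hypothetical helper, and the small file written in the demo stands in for the real reference.

```python
import gzip
import os
import tempfile


def fasta_has_sequence(fasta_path, name):
    """Return True if a FASTA header's first word equals `name`."""
    # Plain-text or gzip-compressed files are both handled.
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # Compare the whole identifier, not just a prefix.
                if fields and fields[0] == name:
                    return True
    return False


# Self-contained demo on a tiny stand-in FASTA file.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">1 primary chromosome\nACGT\n")
    tmp.write(">NKLS02000880.1 unplaced scaffold\nNNNN\n")
demo_path = tmp.name

print(fasta_has_sequence(demo_path, "NKLS02000880.1"))  # True
print(fasta_has_sequence(demo_path, "NKLS0200"))        # False
os.remove(demo_path)
```
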

Cheers ~Philippe